Predicting Student Success with Leganto: a “Proof of Concept” machine learning project
Linda Sheedy, Curtin University
Curtin was an early adopter of Leganto as a reading list solution in 2015 and mainstreamed it in 2017. There are now 4,700+ reading lists with 115,300 citations, viewed 1.5 million times by 42,000 students.
Ex Libris proposed a proof of concept project “to use machine learning to investigate the correlation between student success and activity with the Leganto Reading List”. Curtin was already active in learning analytics, so it seemed a good fit.
Business need – early prediction (within 1-6 weeks) of students who’ll most likely struggle with their course.
Data:
- student profile, grade and academic status data from Curtin. Producing this took significant time, effort and inter-departmental work; course structures and demographics are complicated.
- Leganto usage from Ex Libris
Combining the datasets also took a lot of work.
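To give a feel for what combining the datasets involves, here is a minimal sketch that left-joins Leganto usage totals onto anonymised student records by student and course. The field names (`student_id`, `course`, `avg_grade`, `age`, `week`, `views`) and the in-memory records are hypothetical; the project's actual schema was not described.

```python
from collections import defaultdict

# Hypothetical anonymised student records (university side).
students = [
    {"student_id": "S1", "course": "C100", "avg_grade": 72.0, "age": 19},
    {"student_id": "S2", "course": "C100", "avg_grade": 55.5, "age": 24},
    {"student_id": "S3", "course": "C200", "avg_grade": 64.0, "age": 21},
]

# Hypothetical Leganto usage: one row per (student, course, week) view count.
usage = [
    {"student_id": "S1", "course": "C100", "week": 1, "views": 12},
    {"student_id": "S1", "course": "C100", "week": 2, "views": 8},
    {"student_id": "S2", "course": "C100", "week": 1, "views": 2},
]

def combine(students, usage):
    """Left-join total views onto each student record by (student_id, course).

    Students with no usage rows get 0 views instead of being dropped,
    because having no recorded usage may itself be informative.
    """
    totals = defaultdict(int)
    for row in usage:
        totals[(row["student_id"], row["course"])] += row["views"]
    combined = []
    for s in students:
        merged = dict(s)  # copy so the source records are not modified
        merged["total_views"] = totals.get((s["student_id"], s["course"]), 0)
        combined.append(merged)
    return combined

rows = combine(students, usage)
for r in rows:
    print(r["student_id"], r["total_views"])
# prints: S1 20 / S2 2 / S3 0 (one per line)
```

A left join is used here so that students who never opened a reading list stay in the dataset.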
Function: Ex Libris considered a number of possible algorithms and currently seems to be settling on Random Forest, though the final outcome may be a two-stage model.
The data so far covers Semester 2 2016 to Semester 2 2018. The algorithm has found the following features to be most predictive:
- student historical average grades
- historical usage engineered feature
- student age
- student usage in week 1 in relation to class
- weighted student usage per course
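The talk did not define “student usage in week 1 in relation to class” precisely. One plausible reading, sketched below as an assumption, is each student's week-1 view count divided by the class mean for that week; the function name `relative_week1_usage` and the input shape are hypothetical.

```python
from statistics import mean

def relative_week1_usage(week1_views):
    """Hypothetical feature: each student's week-1 views relative to the class mean.

    week1_views: dict of student_id -> week-1 view count for one class.
    A ratio of 1.0 means average usage; 0.5 means half the class average.
    """
    if not week1_views:
        return {}
    class_mean = mean(week1_views.values())
    if class_mean == 0:
        # Nobody in the class used the list in week 1; treat everyone as average.
        return {sid: 1.0 for sid in week1_views}
    return {sid: v / class_mean for sid, v in week1_views.items()}

views = {"S1": 12, "S2": 2, "S3": 4}
ratios = relative_week1_usage(views)
# S1 gets 2.0: twice the class mean of 6 views.
print(ratios)
```

Normalising by the class makes usage comparable across courses whose reading lists differ greatly in size.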
Overall model accuracy: 91.9%
Recall: 18.8% (the model catches 18.8% of at-risk students)
Precision: 69.44% (i.e. of every 10 students predicted to be at risk, about 7 actually are), which is considered high
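High accuracy alongside low recall is typical when at-risk students are a small minority: a model that flags few students is still “right” about most of the cohort. The sketch below computes the three metrics from confusion-matrix counts; the cohort counts are invented for illustration and are not Curtin's figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp: at-risk students correctly flagged
    fp: students flagged as at risk who did not struggle
    fn: at-risk students the model missed
    tn: students correctly left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical cohort of 1,000 students, 100 of whom actually struggle.
acc, prec, rec = classification_metrics(tp=20, fp=9, fn=80, tn=891)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# prints: accuracy=0.911 precision=0.690 recall=0.200
```

Here the model misses 80 of the 100 struggling students yet still scores over 91% accuracy, because the 891 true negatives dominate.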
The model clearly needs more work, but increasing recall shouldn’t come at the expense of precision. More data may help, along with further tuning of the algorithm.
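Classifiers such as Random Forest typically output a score or probability per student, and the at-risk cut-off is a tunable threshold. This sketch, using made-up scores for eight students, shows why recall and precision pull against each other: lowering the threshold flags more students, catching more who struggle but also raising more false alarms.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when flagging students whose score >= threshold.

    scores: predicted probability of struggling, one per student
    labels: True if the student actually struggled
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model scores and actual outcomes.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [True, True, False, True, False, True, False, False]

for t in (0.9, 0.5, 0.3):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.9: precision=1.00 recall=0.25
# threshold=0.5: precision=0.75 recall=0.75
# threshold=0.3: precision=0.67 recall=1.00
```

Moving the threshold alone only trades one metric for the other; improving both at once generally needs better features or more data.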
The project has concluded; it’s not yet clear where Ex Libris will take it next or whether it will become a Leganto offering.
Q: What intervention, if any, did you take?
A: It was a closed project with all data anonymised, run just to see whether it would work, so there was no intervention.
Q: Was demographic data other than age included?
A: The algorithm itself found that age was a major predictor. Other demographic data was included, but the algorithm didn’t find it predictive of success.
Q: How was analysis improved?
A: At the start of the project the team hoped to prove that students would succeed if they read more. As it went on, the focus shifted to identifying what predicts that a student will struggle.