Tutorial 7 Prediction Models

Classification models

Decision Tree for Passing

The result tells us that if, in the first 50 days of the course, you have an average score higher than 67, have visited more than 85 elements in the LMS, and have delivered fewer than 2.5 assessments, you have an 85% chance of passing the course (57% of the population meets these conditions). On the other hand, if your average score is less than 67 in the first 50 days, you only have a 36% chance of passing (28% of the training set was in this situation).
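
The two branches described above can be written as explicit rules. This is a minimal sketch, not the fitted tree itself: the function and argument names (`pass_probability`, `avg_score`, `elements_visited`, `assessments_delivered`) are hypothetical, and any branch the text does not describe returns `None` instead of a guessed probability.

```python
def pass_probability(avg_score, elements_visited, assessments_delivered):
    """Return the passing probability for the two leaves described in the text.

    All inputs refer to the first 50 days of the course.
    Returns None for branches of the tree that the text does not describe.
    """
    if avg_score < 67:
        # Low early scores: 36% chance of passing (28% of the training set).
        return 0.36
    if avg_score > 67 and elements_visited > 85 and assessments_delivered < 2.5:
        # High score, many LMS elements visited, few assessments delivered: 85%.
        return 0.85
    # Remaining branches are not reported in the tutorial text.
    return None


print(pass_probability(avg_score=70, elements_visited=90, assessments_delivered=2))
print(pass_probability(avg_score=50, elements_visited=10, assessments_delivered=0))
```

Reading a tree this way is its main advantage: every prediction can be traced back to a short chain of threshold comparisons.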

We see that in the test set, 21.31% of the students failed the course and the model predicted that they would fail. Also, 53.11% passed the course and the model predicted that they would pass. In total, the model is 21.31 + 53.11 = 74.42% (about 74%) accurate; that is, it fails for around a quarter of the cases. Not good, not bad.
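
The accuracy is simply the sum of the two "correct" cells of the prediction matrix. A short sketch of that arithmetic, using only the two percentages quoted above (the variable names are illustrative; the split of the remaining errors between the two off-diagonal cells is not given in the text, so it is not modeled here):

```python
# Shares of the test set on the diagonal of the prediction matrix (from the text).
correct_fail_pct = 21.31  # failed the course, predicted to fail
correct_pass_pct = 53.11  # passed the course, predicted to pass

# Accuracy is the share of cases where prediction and outcome agree.
accuracy_pct = correct_fail_pct + correct_pass_pct

# Everything else is a misclassification of one kind or the other.
error_pct = 100.0 - accuracy_pct

print(f"accuracy: {accuracy_pct:.2f}%")
print(f"error:    {error_pct:.2f}%")
```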

Here we can see that the performance on the test and training sets is similar, indicating a consistent model that is not overfitted to the training data. The test-set accuracy is the one that we calculated before (74%); the training-set accuracy is 76%.
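
One way to make the "similar performance" check explicit is to look at the gap between training and test accuracy. The sketch below uses the two accuracies quoted above; the helper name and the 5-point tolerance are assumptions chosen for illustration, not values from the tutorial.

```python
def generalization_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy (positive = worse on test)."""
    return train_accuracy - test_accuracy


# Accuracies reported for the decision tree in the text.
train_accuracy = 0.76
test_accuracy = 0.74

gap = generalization_gap(train_accuracy, test_accuracy)

# Hypothetical tolerance: a gap above 5 points would suggest overfitting.
TOLERANCE = 0.05
looks_overfitted = gap > TOLERANCE

print(f"gap: {gap:.2f}, looks overfitted: {looks_overfitted}")
```

A small gap only says the model behaves consistently on unseen data; it says nothing about whether 74% is good enough for the intended use.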

We can see that your score in the first 50 days is highly predictive of passing or failing, while your disability status is much less so.

Decision Tree for Dropout

Now we see that disability, while not important for passing or failing the course, is actually important for being able to finish it.

The prediction matrix tells us that there are errors, but they stay below 30% for the test set.

This model is a little less accurate than the previous one, with 70% accuracy.

Logistic Regression for Passing / Failing

We can see that the performance is similar to the decision tree; however, there are more Type II errors and fewer Type I errors. The summary tells us that the model is 75% accurate, just one percentage point more than the decision tree.
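
To make the two error types concrete, the sketch below counts them from paired actual and predicted labels. It assumes "pass" is the positive class, so a Type I error is a student predicted to pass who actually failed (false positive) and a Type II error is a student predicted to fail who actually passed (false negative). The six labels are made-up toy data, not results from the course data set.

```python
def confusion_counts(actual, predicted, positive="pass"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a Type I error (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a Type II error (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical toy labels, for illustration only.
actual = ["pass", "pass", "fail", "fail", "pass", "fail"]
predicted = ["pass", "fail", "pass", "fail", "pass", "fail"]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

print(counts)
print(f"Type I errors: {counts['FP']}, Type II errors: {counts['FN']}")
```

Two models with the same accuracy can differ in this split, which matters when one kind of mistake is more costly than the other.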

The model also lets us know which variables are important (avg_score and active_days).
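
A logistic regression turns a weighted sum of the variables into a probability through the logistic (sigmoid) function. The sketch below shows that step for the two variables named above. The coefficients and intercept are hypothetical values chosen only to illustrate the mechanics; they are not the fitted values from this tutorial.

```python
import math


def logistic_probability(features, coefficients, intercept):
    """Probability of the positive class: sigmoid of the linear score."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (NOT the fitted model from the tutorial).
coefficients = {"avg_score": 0.05, "active_days": 0.02}
intercept = -4.0

student = {"avg_score": 70, "active_days": 25}
p_pass = logistic_probability(student, coefficients, intercept)

# A higher average score raises the predicted probability when its coefficient is positive.
stronger_student = {"avg_score": 90, "active_days": 25}
p_pass_stronger = logistic_probability(stronger_student, coefficients, intercept)

print(f"{p_pass:.2f} -> {p_pass_stronger:.2f}")
```

The sign and size of each coefficient is what lets the model report which variables matter: a coefficient near zero means the variable barely moves the predicted probability.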

Logistic Regression for Dropout

Now the model behaves much better. Again, disability and avg_score seem to be the most important variables for predicting dropout. This model is a little more accurate (72%) than the decision tree (70%).

Random Forest for Pass/Fail

The results seem better. However, a look at the summary tells us that the accuracy is only marginally better (76%).

Average score and total elements used seem to be the main variables.
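
A random forest combines many decision trees and predicts the class that most of them vote for. The sketch below shows only that voting step, with three hand-written stand-in "trees". The thresholds 67 and 85 echo the decision tree discussed earlier; the third rule and the field name `total_elements` are hypothetical. A real random forest also trains each tree on a bootstrap sample with random feature subsets, which is not shown here.

```python
from collections import Counter


def forest_predict(trees, student):
    """Majority vote over the predictions of the individual trees."""
    votes = [tree(student) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hand-written stand-ins for fitted trees (illustrative rules only).
trees = [
    lambda s: "pass" if s["avg_score"] > 67 else "fail",
    lambda s: "pass" if s["total_elements"] > 85 else "fail",
    lambda s: "pass" if s["avg_score"] > 60 and s["total_elements"] > 50 else "fail",
]

student = {"avg_score": 70, "total_elements": 60}
prediction = forest_predict(trees, student)

print(prediction)
```

Using an odd number of trees avoids ties in a two-class vote.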

Random Forest for Dropout

It seems that we have slightly better performance. The summary says that the model is 73% accurate on the test set.

As we can see, all the models have similar performance. The remaining inaccuracy is due mainly to some missing information.
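
The accuracies quoted in the sections above can be collected in one place to compare the models per task. This is a small sketch using only the figures reported in the text; the dictionary layout and names are illustrative.

```python
# Accuracies as reported in the text for each task and model.
reported_accuracy = {
    "pass/fail": {"decision tree": 0.74, "logistic regression": 0.75, "random forest": 0.76},
    "dropout": {"decision tree": 0.70, "logistic regression": 0.72, "random forest": 0.73},
}

best_model = {}
spread = {}
for task, scores in reported_accuracy.items():
    # Model with the highest reported accuracy for this task.
    best_model[task] = max(scores, key=scores.get)
    # Difference between the best and the worst model for this task.
    spread[task] = max(scores.values()) - min(scores.values())
    print(f"{task}: best = {best_model[task]}, spread = {spread[task]:.2f}")
```

The spread is only two to three percentage points in both tasks, which is why the choice of algorithm matters less here than the information available in the data.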

Regression Models

Linear Regression

We can see that the model is not good at predicting the actual grade. The R squared is low (0.26), which means that the variables explain only about a quarter of the variability in scores. The error is also high: the typical prediction error is about 16.24 points (RMSE).
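
Both metrics can be computed directly from the actual and predicted scores. The sketch below uses the standard definitions of RMSE and R squared; the four score pairs are made-up toy data for illustration, not values from the course data set.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of the prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1.0 - ss_res / ss_tot


# Hypothetical toy scores, for illustration only.
actual = [50, 60, 70, 80]
predicted = [55, 60, 65, 80]

toy_rmse = rmse(actual, predicted)
toy_r2 = r_squared(actual, predicted)

print(f"RMSE: {toy_rmse:.2f}, R squared: {toy_r2:.2f}")
```

RMSE is expressed in the same units as the score, so it can be read as "points off", while R squared is unitless and compares the model against always predicting the mean.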

In the actual vs. predicted graph, the points do not form a clear line.

Random Forests

The R squared is still low (0.26), meaning that the variables are not able to explain most of the variability of the score.

avg_score is the main variable to predict the final score.

Conclusion

The classification models show similar accuracy: around 74–76% for passing and 70–73% for dropout, with random forest performing slightly better in both cases. Average score in the first 50 days is the strongest predictor of passing and, together with disability status, one of the main predictors of dropout. Disability status has little impact on passing but helps predict dropout. Regression models were less effective for predicting final grades (R squared of 0.26). Overall, early performance and engagement are key indicators of student success.
