Introduction

In this report we look at how well several predictive models predicted the final result (pass or fail), dropouts and final scores for a given data set. In all cases the same set of predictor variables was used.

Final Result (Pass/Fail)

The first model used was a decision tree.

As we can see, the decision tree predicted the final result (pass or fail) with a test accuracy of 74.42%.
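The accuracy rates quoted throughout this report are the share of test cases where the predicted label matches the actual label. A minimal sketch of that calculation follows. The `accuracy` helper and the eight labels are hypothetical and are not taken from the report's data set.

```python
def accuracy(actual, predicted):
    """Return the share of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for eight students (illustration only).
actual = ["Pass", "Pass", "Fail", "Pass", "Fail", "Fail", "Pass", "Pass"]
predicted = ["Pass", "Fail", "Fail", "Pass", "Pass", "Fail", "Pass", "Pass"]

# Six of the eight predictions match, so the accuracy is 6 / 8.
print(f"Accuracy: {accuracy(actual, predicted):.2%}")
```

Accuracy treats a wrongly predicted pass and a wrongly predicted fail as equally costly, which is worth keeping in mind when the two classes are not equally common.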

The second model used was logistic regression, with "Pass" set as the target class.

As we can see, the logistic regression model predicted a "Pass" with a test accuracy of 74.42%, the same as the decision tree.

The third predictive model used was the random forest model.

As we can see, the random forest model predicted a pass or fail with an accuracy rate of 75.65%.

Dropout

For predicting dropouts, the first model used was the decision tree model.

As we can see, the model can predict a dropout with an accuracy rate of 70.24%.

The second model used was the logistic regression model.

As we can see, the model can predict a dropout with an accuracy rate of 71.72%.

The third model used was the random forest model.

As we can see, the model can predict a dropout with an accuracy rate of 71.62%.
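The three dropout accuracies reported above can be put side by side to see how small the gaps are. The sketch below uses only the figures from this section. The variable names are illustrative.

```python
# Test-set accuracies for the dropout models, as reported above (in percent).
dropout_accuracy = {
    "Decision tree": 70.24,
    "Logistic regression": 71.72,
    "Random forest": 71.62,
}

# Rank the models from most to least accurate.
ranked = sorted(dropout_accuracy.items(), key=lambda item: item[1], reverse=True)
best_name, best_acc = ranked[0]

for name, acc in ranked:
    gap = best_acc - acc  # difference in percentage points
    print(f"{name:<20} {acc:6.2f}%  ({gap:.2f} points behind the best)")
```

The gap between the top two models is about 0.1 percentage points, which is small enough that a different train/test split could plausibly reverse the order.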

Final Score

The first model used in predicting the final score was linear regression.

As we can see, the model has an R-squared value of 0.252, meaning that only about 25.2% of the variability in the final scores is explained by the model, which is not great at all.
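For reference, R-squared compares the model's squared errors with those of a baseline that always predicts the mean score: R-squared = 1 - SS_res / SS_tot. A minimal sketch follows. The `r_squared` helper and the five scores are hypothetical and are not taken from the report's data set.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two or more paired observations")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values must not all be identical")
    return 1 - ss_res / ss_tot


# Hypothetical final scores (illustration only).
actual_scores = [40, 50, 60, 70, 80]
predicted_scores = [50, 55, 60, 65, 70]

print(f"R-squared: {r_squared(actual_scores, predicted_scores):.3f}")
```

A value of 0 means the model does no better than predicting the mean for every student, so a value around 0.25 is only a modest improvement over that baseline.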

Finally, a random forest model was used to predict the final scores.

As we can see, the R-squared value has improved slightly to 0.273, meaning that 27.3% of the variability in the final scores is explained by the model. Again, not great.

Conclusion

For predicting the final result (expressed as pass/fail) I would choose the random forest model, as it yielded the highest accuracy rate (75.65% against 74.42% for the other two models), although the margin is small.

For predicting dropout I would choose the logistic regression model, as it yielded the highest accuracy rate (71.72%). (The margin is even smaller this time: it was only 0.1 percentage points more accurate than the random forest, at 71.62%, so it does not really matter which one you use.)

For predicting the final score I would not use either of the models (linear regression or random forest), as both are terrible at predicting the final score: neither explains more than about 27% of the variability. And what good would predicting the final score do in any case?

Rather, the smartest thing would be to use the dropout predictive model to identify who, according to the model, is likely to drop out, and to figure out why. The next step is to step in, or to set up an action plan, to prevent that from happening. In the decision tree model for dropout (below) we see that if you have a disability you have about a 30% chance of dropping out. There must be a reason for that.
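A figure like the 30% above is, in essence, the share of students within a subgroup who dropped out. The sketch below shows one way such a rate could be checked directly against the data. The records, the field names `disability` and `dropped_out`, and the `dropout_rate` helper are all hypothetical and do not come from the report's data set.

```python
def dropout_rate(records, **conditions):
    """Return the share of dropouts among records matching all conditions."""
    subset = [
        r for r in records
        if all(r[field] == value for field, value in conditions.items())
    ]
    if not subset:
        return None  # no students in this subgroup
    return sum(1 for r in subset if r["dropped_out"]) / len(subset)


# Hypothetical student records (illustration only).
records = [
    {"disability": True, "dropped_out": True},
    {"disability": True, "dropped_out": False},
    {"disability": True, "dropped_out": False},
    {"disability": True, "dropped_out": False},
    {"disability": False, "dropped_out": True},
    {"disability": False, "dropped_out": False},
    {"disability": False, "dropped_out": False},
    {"disability": False, "dropped_out": False},
    {"disability": False, "dropped_out": False},
]

for flag in (True, False):
    rate = dropout_rate(records, disability=flag)
    print(f"disability={flag}: dropout rate {rate:.0%}")
```

Comparing the subgroup rate with the rate for everyone else helps show whether the split in the tree reflects a real difference or only a handful of students.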

Finally, the final result predictive model can be used to flag, after a certain point in the course, students who are predicted to fail. The predicted outcome can then be changed with appropriate intervention and remediation.