Tutorial 7: Prediction Models

The models below aim to predict student success in the DDD course. The predictors are the number of assessments the student submitted (delivered), the student's average score on those assessments (avg_score), the lateness of their assessment submissions (sum_delays), the total clicks the student made in the course (total_clicks), the average daily clicks the student made in the course (average_daily_clicks), the total elements the student interacted with (total_elements), the average daily elements the student interacted with (average_daily_elements), the total days the student was active in the course (active_days), and the student's gender, geographic region (region), highest level of education (highest_education), Index of Multiple Deprivation band for the place the student lived (imd_band), age range (age_band), and whether they declared a disability (disability).

Classification Models: Pass/Fail

Decision Tree - Pass/Fail

This model estimates the likelihood of a student passing or failing using a decision tree. It is approximately 74% accurate and identifies the most important predictors of a student's success as their average score on assessments, the total elements they interacted with in the course, and the number of assessments they completed.
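
As a rough illustration of how an accuracy figure like this is obtained, the sketch below fits a decision tree and scores it on held-out rows. It assumes scikit-learn and uses synthetic data from make_classification rather than the course data; the tree depth, split size, and random seed are arbitrary choices made for the example, not the settings used in this report.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student table (assumption: 1,000 rows, 8 numeric features).
# The real course data is NOT used here.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)

# Hold out 25% of the rows so accuracy is measured on students the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# max_depth=4 is an illustrative setting that keeps the tree small and readable.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Accuracy = share of held-out rows whose predicted label matches the true label.
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Scoring on held-out rows matters because a tree scored on its own training rows will usually look more accurate than it really is.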


Logistic Regression - Pass/Fail

This model estimates the likelihood of a student passing or failing using logistic regression. It is also approximately 74% accurate and identifies the most important predictors of a student's success as their average score on assessments, the total days active in the course, the total clicks in the course, and the student's highest level of education.
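
One common way to read predictor strength from a logistic regression is to standardize the inputs and compare coefficient magnitudes. The sketch below shows that idea under stated assumptions: it uses scikit-learn, synthetic data, and four column names borrowed from this report purely as labels. It is not the procedure that produced the ranking above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Labels borrowed from the report for readability; the data itself is synthetic.
feature_names = ["avg_score", "active_days", "total_clicks", "delivered"]
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Standardizing puts every feature on the same scale, so coefficient sizes are comparable.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
coefs = model.named_steps["logisticregression"].coef_[0]

# Rank features by absolute coefficient: larger magnitude means a stronger pull on the log-odds.
ranked = sorted(zip(feature_names, coefs), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranked:
    print(f"{name:>14}: {coef:+.3f}")
```

The sign of each coefficient indicates the direction of the effect on the log-odds of the positive class, which a tree-based importance score does not provide.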


Random Forest - Pass/Fail

This model estimates the likelihood of a student passing or failing using a random forest. It is approximately 76% accurate and identifies the most important predictors of a student's success as their average score on assessments, the total elements the student interacted with, the total clicks in the course, and the total days active in the course.
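
A ranking of predictors like the one above is typically read from a random forest's impurity-based feature importances. The following is a minimal sketch of that step, assuming scikit-learn and synthetic data; the column names are reused from this report as labels only, and the number of trees is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels borrowed from the report; the values are synthetic, not the course data.
names = ["avg_score", "total_elements", "total_clicks", "active_days", "delivered"]
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=1, random_state=0
)

# n_estimators=200 is an illustrative setting.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one non-negative score per feature; the scores sum to 1.
importances = forest.feature_importances_
order = importances.argsort()[::-1]  # indices from most to least important

for i in order:
    print(f"{names[i]:>15}: {importances[i]:.3f}")
```

Impurity-based importances describe what the fitted forest relied on, not causal effects, so correlated inputs such as clicks and active days can share or trade importance between them.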


Classification Models: Dropout

Decision Tree - Dropout

This model estimates the likelihood of a student dropping out using a decision tree. It is approximately 70% accurate and identifies the most important predictors of dropout as the student's declared disability status, their average score on assessments, and the total number of assessments they turned in.


Linear Regression - Dropout

This model estimates the likelihood of a student dropping out using linear regression. It is approximately 72% accurate and identifies the most important predictors of dropout as the student's declared disability status and their average score on assessments.


Random Forest - Dropout

This model estimates the likelihood of a student dropping out using a random forest. It is also approximately 72% accurate and again identifies the student's average score on assessments and declared disability status as the most important predictors of dropout, along with their total active days in the course and their total clicks in the course.


Regression Models: Final Grade

Linear Regression - Final Grade

This model attempts to predict a student's final grade from the variables listed at the top of this report using linear regression. The R-squared value for this model is 0.26, which is quite low: the model explains only about 26% of the variance in final grades. Additionally, the Actual vs. Predicted Values chart shows only a weak linear pattern.
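
To make the R-squared figure concrete, the sketch below computes it from first principles as 1 minus the ratio of the residual sum of squares to the total sum of squares. The grade values are made-up numbers chosen only to exercise the formula; they are not taken from the course data, and r_squared is a helper name introduced for this example.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences of numbers."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: how far the predictions are from the actual values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: how far the actual values are from their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up final grades and predictions, for illustration only.
actual = [40.0, 55.0, 60.0, 75.0, 90.0]
predicted = [50.0, 60.0, 65.0, 70.0, 75.0]

print(f"R-squared: {r_squared(actual, predicted):.3f}")

# Always predicting the mean grade scores exactly 0; perfect predictions score 1.
mean_grade = sum(actual) / len(actual)
print(r_squared(actual, [mean_grade] * len(actual)))
print(r_squared(actual, actual))
```

Read this way, a value of 0.26 means the model's predictions are only modestly better than guessing the average grade for every student.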


Random Forest Regression - Final Grade

This model attempts to predict a student's final grade from the variables listed at the top of this report using random forest regression. The R-squared value for this model is 0.27, only very slightly higher than that of the linear regression model, so this model also explains little of the variance in final grades. Additionally, the Actual vs. Predicted Values chart shows only a weak linear pattern (though perhaps slightly stronger than in the linear regression). This model identifies a student's average score on assessments as the most important predictor of their success.


Model Comparison

Comparing all of the models above, the classification models seem to be much stronger at predicting student success than the regression models. This could be because predicting whether a student will pass a class (or whether they will drop out) is a much broader target than predicting a specific grade value at the end of the course.

Within the classification models, the random forest model is the most accurate at predicting student success. The same is true of the regression models, though both differences are quite slight. Therefore I would choose the random forest algorithm, either as a classification model or a regression model. Deciding between the two would depend on my goal in providing intervention. However, I would be reluctant to use the regression model because it is not very accurate.

Instead, I would use the random forest classification model, specifically for the pass/fail prediction. This model has the highest accuracy level (76%) and also provides an illustration of the variables that are critical to student success. Though average score on assessments is the best predictor of success, the model provides other variables that the instructor could act on to intervene between assessments if necessary (primarily interaction with the course, such as clicks and active days).
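
Because the accuracy gaps between the classifiers are small, one way to check whether a difference of a point or two is meaningful is to compare the models with k-fold cross-validation and look at the spread of the scores. The sketch below illustrates that check, assuming scikit-learn and synthetic data; the model settings are arbitrary, and this report does not state how its own accuracy figures were estimated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pass/fail data (assumption: 600 rows, 8 numeric features).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

# Illustrative settings only; not the configurations used in this report.
candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    # Five folds: each model is trained five times and scored on the fold it did not see.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:>20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the standard deviations overlap heavily, the ranking between models should be treated as tentative, and interpretability or ease of intervention can reasonably decide the choice.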

The model is repeated below for reference.

Random Forest - Pass/Fail
