Report: Comparing Prediction Models

Decision Tree for Passing

[Figure: Decision Tree for Passing]

The Decision Tree above shows that a student whose average score in the first 50 days is above 67, who visits more than 85 LMS elements, and who submits fewer than 2.5 assessments has an 85% chance of passing the course. Conversely, an average score below 67 in the first 50 days gives only a 36% chance of passing. The model is easy to interpret and has a 74% accuracy rate. Student test scores were the main predictive variable.
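The path through the tree described above can be written as a plain rule. The sketch below is a minimal illustration that encodes only the two outcomes this paragraph reports; the function and argument names are hypothetical, and any branch the paragraph does not describe returns `None` rather than a guessed value.

```python
from typing import Optional


def estimated_pass_chance(avg_score: float,
                          lms_elements: int,
                          assessments_submitted: float) -> Optional[float]:
    """Return the pass chance for the two tree paths described in the report.

    Hypothetical helper: only the two leaves mentioned in the text are
    encoded. Every other path returns None because its value is not given.
    """
    if avg_score < 67:
        # Average score below 67 in the first 50 days.
        return 0.36
    if avg_score > 67 and lms_elements > 85 and assessments_submitted < 2.5:
        # High average score, more than 85 LMS elements visited,
        # and fewer than 2.5 assessments submitted.
        return 0.85
    return None  # branch not described in the report


print(estimated_pass_chance(72, 90, 2))
print(estimated_pass_chance(55, 120, 1))
```

Writing the rule out this way makes it easy to see why the tree is simple to explain: each prediction is a short chain of threshold checks.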

Decision Tree for Dropout

[Figure: Decision Tree for Dropout]

The Decision Tree above for student drop-outs has a lower accuracy rate of 70%, but it also reveals that other factors, such as disability, play a role, as shown below. Coupled with the pass/fail decision tree, this information can inform learning design that accommodates those students.

[Figure: factors in student dropout, including disability]

Logistic Regression for Passing / Failing

[Figure: Logistic Regression for Passing / Failing]

This model calculates the predicted pass/fail rate for students based on the same variables as the Decision Tree. It is easy to implement when the outcome is simple, because logistic regression models a binary outcome such as pass/fail.
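To illustrate how a logistic model turns student variables into a binary pass/fail prediction, here is a minimal sketch. The coefficients, intercept, and feature names are hypothetical values chosen only to show the calculation; they are not the fitted values from this report.

```python
import math


def pass_probability(features: dict, weights: dict, intercept: float) -> float:
    """Logistic model: the sigmoid of a weighted sum of the features."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
weights = {"avg_score": 0.05, "lms_elements": 0.01}
intercept = -4.0

# Hypothetical student record.
student = {"avg_score": 70.0, "lms_elements": 90.0}

p = pass_probability(student, weights, intercept)
# A probability of 0.5 or more is labelled "pass", otherwise "fail".
predicted_label = "pass" if p >= 0.5 else "fail"
print(f"{p:.2f} -> {predicted_label}")
```

The 0.5 cut-off is a common default; a lower threshold for "fail" could be chosen if missing an at-risk student is more costly than a false alarm.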

Logistic Regression for Dropout

[Figure: Logistic Regression for Dropout]

The Logistic Regression model for student drop-out is 2% less accurate than the Decision Tree. Like the Decision Tree, it still identifies students’ average scores and disability as important factors.

Random Forest for Pass/Fail

[Figure: Random Forest for Pass/Fail]

Like the other models, the Random Forest identifies the total elements and average score as the most important factors for pass/fail. It has the best accuracy rate, at 76%, which makes it the best choice when high prediction accuracy is critical, for example when identifying at-risk students for intervention programs.

Random Forest for Dropout

[Figure: Random Forest for Dropout]

This model has an accuracy rate of 73%, slightly better than Logistic Regression and just below the Decision Tree. The inaccuracy is likely due to missing information. If the dataset is small or the dropout classes are heavily imbalanced, Random Forest might not perform optimally without additional pre-processing.
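One way to see why imbalance matters is to compare a model's accuracy with the accuracy of always predicting the most common class. The sketch below uses made-up labels, not the report's data, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(actual: list, predicted: list) -> float:
    """Share of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def majority_baseline_accuracy(actual: list) -> float:
    """Accuracy of always predicting the most common class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


# Made-up labels for illustration: 8 students stayed, 2 dropped out.
actual = ["stay"] * 8 + ["drop"] * 2
# Made-up predictions: one false alarm and one missed drop-out.
predicted = ["stay"] * 7 + ["drop"] + ["drop", "stay"]

model_acc = accuracy(actual, predicted)
baseline_acc = majority_baseline_accuracy(actual)
print(f"model: {model_acc:.0%}, majority baseline: {baseline_acc:.0%}")
```

In this toy case the model is no more accurate than the baseline, even though it catches one of the two drop-outs. With imbalanced classes, an accuracy figure is easier to interpret next to such a baseline.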

Linear Regression for Student Grade Prediction

[Figure: Linear Regression for Student Grade Prediction]

The Linear Regression model is not very accurate at predicting the actual grade. The points in the actual/predicted graph below do not fall along a clear line. The student data is seemingly too complex to be effectively analyzed by this model.

[Figure: actual vs. predicted grades, Linear Regression]
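How closely an actual/predicted plot follows a line can also be summarized numerically. The sketch below computes two common measures, R² and mean absolute error, on made-up grades; the numbers are illustrative assumptions and are not taken from this report.

```python
def r_squared(actual: list, predicted: list) -> float:
    """Coefficient of determination: 1.0 means predictions match exactly."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual: list, predicted: list) -> float:
    """Average size of the prediction error, in grade points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up grades for illustration only.
actual_grades = [40.0, 55.0, 70.0, 85.0]
predicted_grades = [50.0, 60.0, 65.0, 75.0]

r2 = r_squared(actual_grades, predicted_grades)
mae = mean_absolute_error(actual_grades, predicted_grades)
print(f"R^2 = {r2:.2f}, MAE = {mae:.1f}")
```

An R² close to 1 corresponds to points lying near the diagonal of the plot; lower values correspond to the scattered pattern described above.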

Random Forest for Student Grade Prediction

[Figure: Random Forest for Student Grade Prediction]

The Random Forest model is also not very accurate at predicting the student’s grade. By averaging the results of multiple decision trees, it does a better job than Linear Regression with bigger data sets. The actual/predicted graph below more closely resembles a line, but not by much.

The average score is still the most important factor in this analysis.

[Figure: actual vs. predicted grades, Random Forest]
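The idea of averaging the results of multiple decision trees can be shown with a toy example. The three "trees" below are hypothetical hand-written rules with made-up thresholds and grades; a real Random Forest learns many deeper trees from random subsets of the data.

```python
from statistics import mean


def tree_a(student: dict) -> float:
    # Hypothetical rule based on the average score.
    return 75.0 if student["avg_score"] >= 67 else 50.0


def tree_b(student: dict) -> float:
    # Hypothetical rule based on LMS activity.
    return 70.0 if student["lms_elements"] > 85 else 55.0


def tree_c(student: dict) -> float:
    # Hypothetical rule with a stricter score threshold.
    return 80.0 if student["avg_score"] >= 80 else 60.0


def forest_predict(student: dict, trees: list) -> float:
    """Predict a grade as the average of the individual tree predictions."""
    return mean([tree(student) for tree in trees])


# Hypothetical student record.
student = {"avg_score": 72.0, "lms_elements": 90.0}
prediction = forest_predict(student, [tree_a, tree_b, tree_c])
print(f"{prediction:.1f}")
```

Because each tree errs in a different way, the average tends to be more stable than any single tree, at the cost of being harder to explain than one tree on its own.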

Conclusion

Logistic Regression is efficient for low-dimensional problems, such as predicting pass/fail. The Decision Tree is more accurate for non-linear analysis like the drop-out rate, and does a better job of highlighting important factors like disability.

For interpretability, simpler models like Logistic Regression or Decision Trees would be best, as they are easier to explain to stakeholders. Decision Trees would also be appropriate in this case for exploratory analysis.

The Random Forest model is the best for analyzing bigger data sets or data with missing values. Other models like Logistic Regression, Linear Regression, and Decision Trees are more sensitive to data quality.

Understanding the strengths and limitations of different predictive models is crucial for selecting the right tool to analyze educational data effectively. It allows stakeholders to make informed decisions about interventions, such as identifying at-risk students or improving academic performance. Choosing which tools to use requires an understanding of computational resources and analytic needs, such as accuracy and interpretability.
