The following four boxplots show the data for the four variables: the concepts and equations scores for the week 1 quiz, followed by the concepts and equations scores for the week 3 quiz. Each of the variables has a similar range and spread. There do not seem to be any notable outliers in the data.
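One way to back up the visual impression that there are no outliers is the 1.5 × IQR rule that boxplots conventionally use for their whiskers. The sketch below is only illustrative: the scores are made up, the name `iqr_outliers` is hypothetical, and the choice of the "inclusive" quartile method is an assumption, since plotting tools differ in how they compute quartiles.

```python
from statistics import quantiles

# Hypothetical scores for one variable; NOT the data from this analysis.
week1_equations = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8]


def iqr_outliers(scores):
    """Return (q1, median, q3, outliers) using the 1.5 * IQR boxplot rule."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Any score beyond a fence would be drawn as a separate point on a boxplot.
    outliers = [s for s in scores if s < lower_fence or s > upper_fence]
    return q1, median, q3, outliers


q1, median, q3, outliers = iqr_outliers(week1_equations)
print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```

Running the same function on each of the four variables would give a numeric check to set beside the boxplots.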
Even before any further analysis, the boxplots show that the variable with the lowest median is the week 1 equations score. Additionally, the median equations score changes more between weeks 1 and 3 than the median concepts score does. This could suggest that students start out with more prior knowledge of chemistry concepts than of equations.
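The comparison of medians described above can be reproduced with a few lines of arithmetic. This is a minimal sketch with invented scores; the variable names simply mirror the four variables in the text and are not taken from the original dataset.

```python
from statistics import median

# Hypothetical quiz scores; NOT the data from this analysis.
scores = {
    "week1_concepts": [5, 6, 6, 7, 8],
    "week1_equations": [2, 3, 4, 6, 7],
    "week3_concepts": [6, 6, 7, 8, 8],
    "week3_equations": [5, 6, 7, 7, 8],
}

medians = {name: median(values) for name, values in scores.items()}

# Variable with the lowest median.
lowest = min(medians, key=medians.get)

# Change in the median between week 1 and week 3 for each question type.
concepts_change = medians["week3_concepts"] - medians["week1_concepts"]
equations_change = medians["week3_equations"] - medians["week1_equations"]

print(lowest, concepts_change, equations_change)
```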
The scatterplots of each pair of variables suggest that there are four clusters in the data. These clusters are most clearly defined when comparing the equations scores between the week 1 and week 3 quizzes, and when comparing the week 1 concepts and equations scores.
This aligns with what can be observed from the boxplot distributions. There seems to be a cluster of students who scored low on the week 1 equations questions and then improved on the week 3 quiz (Cluster 1, in blue). This cluster blends more with Cluster 2 on the other two scatterplots, perhaps because its students had similar prior knowledge of chemistry concepts to those in Cluster 2 but much less prior knowledge of equations.
Indeed, the k-means clustering output shows that Cluster 1 received lower scores on the equations section of the week 1 quiz than on the rest of the variables.
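A per-cluster summary of this kind can be computed by averaging each variable within each cluster. The sketch below assumes cluster labels have already been assigned by k-means; the rows, labels, and the helper name `cluster_means` are all hypothetical and only illustrate the shape of the calculation.

```python
from collections import defaultdict
from statistics import mean

VARIABLES = ["week1_concepts", "week1_equations", "week3_concepts", "week3_equations"]

# Hypothetical rows: (cluster label, scores in VARIABLES order). NOT the real data.
rows = [
    (1, [6, 2, 7, 7]),
    (1, [7, 3, 7, 8]),
    (2, [7, 7, 8, 8]),
    (2, [8, 6, 8, 7]),
    (3, [4, 4, 5, 5]),
    (4, [3, 5, 4, 4]),
]


def cluster_means(rows):
    """Return {cluster label: [mean of each variable]} for labelled rows."""
    grouped = defaultdict(list)
    for label, values in rows:
        grouped[label].append(values)
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in sorted(grouped.items())
    }


means = cluster_means(rows)
for label, values in means.items():
    print(label, dict(zip(VARIABLES, values)))

# Variable on which Cluster 1 has its lowest mean score.
cluster1_lowest = VARIABLES[means[1].index(min(means[1]))]
```

In this made-up example, Cluster 1's lowest mean falls on the week 1 equations score, which is the pattern the text describes.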
The elbow curve for the data suggests that 4 clusters is an appropriate number. After 4, the curve levels off considerably. This improves my confidence in choosing 4 clusters.
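For readers unfamiliar with elbow curves: the curve plots the within-cluster sum of squares (often called inertia) against the number of clusters k, and the "elbow" is the point after which adding clusters yields only small improvements. The following is a minimal standard-library sketch, not the tool used for this analysis. The data are synthetic points drawn around four invented centers, two variables are used instead of four for brevity, and `kmeans_inertia` is a hypothetical helper implementing plain Lloyd's algorithm with random restarts.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, rng, iterations=50):
    """Run Lloyd's algorithm once; return the within-cluster sum of squares."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Synthetic (week 1 equations, week 3 equations) scores around four
# invented centers; NOT the data from this analysis.
rng = random.Random(0)
true_centers = [(2, 7), (7, 8), (4, 4), (6, 5)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in true_centers
    for _ in range(25)
]

# Best inertia over 10 random restarts for each candidate k.
inertias = {
    k: min(kmeans_inertia(points, k, rng) for _ in range(10))
    for k in range(1, 9)
}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

When the groups are well separated, as in this synthetic example, the printed values would be expected to fall steeply up to the true number of groups and only slowly afterwards. Reading the elbow remains a judgment call, so it is best treated as supporting evidence alongside the scatterplots rather than as proof.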
Clustering data in this way can have both positive and negative impacts on the learning environment.
Specifically for the equations section of the course, clustering shows that one group (Cluster 1) had to spend more time learning equations during the first two weeks of class, and seems to have done so effectively. Without the cluster analysis, the instructor might see that the median equations score had caught up with the other scores by week 3 and conclude that the whole class was on a level playing field. The clusters show otherwise: by week 3, Clusters 1 and 2 are scoring higher than Clusters 3 and 4, and the students in Cluster 1 are the reason that the equations score rose so much. The instructor can now look more closely at Clusters 3 and 4 to see how they can be brought to the same level as Clusters 1 and 2.
There could be negative implications of the clustering analysis as well. In addition to the risk of stereotyping that comes with reducing students to aggregate groups, there is a risk that the instructor will overcorrect and spend too much time differentiating instruction for lower-performing students. With enough time and resources, an instructor should be able to differentiate for both higher- and lower-performing students, providing different levels of activities and instruction to keep both groups engaged. However, if the instructor shifts focus to targeting only the lower-performing group in class, the higher-performing group could lose engagement and motivation. This balance is something that instructors need to keep in mind.