Assignment I Report

1. Loading and Understanding the Data

i. Pages visited and time per visit

This dataset shows that number of page views and average view time (min) for each page in the reading.

ii. Time of comments

This dataset shows the number of comments submitted in every one hour. Hours that did not receive any comments submitted are not included in the dataset.

iii. Annotations per individual

This dataset shows the details of the reading time and annotations made by each student, including their active engagement time, number of total comments, threads started, responses, non-questions upvotes given/received, question upvotes given/received, total annotation word count, and average words per comment.

iv. Content and details of each annotation

This dataset shows the details of the annotation content each students made. Each comment is given a unique "Comment ID". If a student replies to an annotation, the original annotation's ID will be shown in the column "In response to comment ID". Comments in the same thread will also have the same "range" column. Other details of the annotations, such as highlighted text, submission, word count, type (comment/question), score, creation time, last edit time, number of replies and upvoters, are included.

2. Analysis of the access to the different pages in the document

a. The Selwyn's Reading

The trend line above shows a negative regression, meaning that the further back the pages are, the fewer people read them. Based on my experience, this seems to be correct trend line, because students are usually more focused and concentrated at the beginning of the reading, and gradually lose their focus as they read further.

The line shows a peak at page 1, 7, and 12. The peak at Page 1 should be ignored as this is the first page every student lands when they get in the reading. The peak at Page 7 can be explained by students' fierce discussion on one of the threads debating whether data has any particular meaning itself, with a total of 4 replies. Similarly, the peak at Page 12 may also be contributed by a discussion thread on whether letting students track in virtual learning platforms conditions students to respond in certain ways, having a total of 2 replies.

The above shows a slightly positive trend line, meaning that the further back the pages are, the more time is spent on reading. This seems contradictory to the negative regression trend in the previous chart, but my inference is that this trend line is affected by the sudden high average view time on Page 14 (68 min) and Page 20 (112 min). If we exclude these two peaks and focus on the view time of other pages, it does show a moderate to decreasing trend (people spending less time on the pages further back).

My hypothesis for the reasons behind the two peaks on Page 14 (68 min) and Page 20 (112 min) is that some students may have left the reading idle for an extended period of time. On Page 14 and 20, there is only 1 and 0 annotation made respectively, showing no evidence that students had to read these two pages for a longer period time. Additionally, Page 20 is the final page of the reading which contains no meaningful content for students to read. Therefore, my hypothesis is that some students leave their reading open on these two pages for a long time, thus affecting the data.

b. The Wise's Reading

The trend line above shows a negative regression, meaning that the further back the pages are, the fewer people read them. This is similar to the trend line in the Selwyn's Reading and appears to be logical.

Except for Page 1 (which can be ignored because all students land on the first page), the chart shows a peak at Pages 6, 7, 10, and 18. These peaks can be explained by the relatively more active annotations in Page 6 (5 threads), Page 7 (4 threads), and Page 18 (4 threads). For Page 10, even though there is only 1 thread, the discussion in this thread is quite active, having 2 replies.

The above shows a moderately negative regression, meaning that the further back the pages are, the less time is spent on reading. This aligns with the negative regression observed in the previous chart, suggesting that people tend to spend less time on pages located further back.

There are apparent peaks on Page 7, 9, 16 and 19. My hypothesis for these peaks is similar to the Selwyn's reading that some students may have left the reading idle for an extended period of time. For page 7, there are 4 threads which can be a reason why students spent more time on this page. However, for page 9 and page 19, there are 0 annotations being made, showing no evidence that students had to read these two pages for a longer period time. Also, Page 19 is the reference page showing no meaningful content for students to read.

c. Similarities and Differences

i. Both readings show a trend of decreasing attention with pages further back. Though Selwyn's reading shows a slight positive trend in time spent, this could be due to outliers. Overall speaking, the longer readings seem to result in decreased attention towards the end.

ii. Peaks in attention occur at specific pages, indicating where students were more engaged. In Selwyn's Reading, peaks were at Pages 7 and 12, while in Wise's Reading, peaks were at Pages 6, 7, 10, and 18. This shows that students are usually focused on the middle and middle to later pages of the readings. There are little page views at the final few pages.

iii. Based on the trends, an ideal reading length may be within the range where attention peaks occur, i.e. around pages 6 to 12 for the above two readings. This could vary but seems to be around the middle to earlier sections of the readings.

iv. The source data appears accurate in reflecting the trends of decreasing attention towards the end of the readings. Peaks of page views align with the level of engagement in the actual reading. However, there seems to be some outliers in the average view time data. The data shows a high view time on pages that do not contain active engagement and/or meaningful content (such as the reference list page), showing that some students keeping the reading idle for a long time may affect the data accuracy.

3. Analysis of the access to the different pages in the document

a. The Wise's Reading

Most comments (5) are received at the last 2 hours before the deadline (7 Feb, 22.00-24.00) and Friday's evening (2 Feb, 16.00-18.00).
The recommended pattern of doing a first annotation by Friday and then return to do more posts on Monday is not strictly followed. There are a number of comments (5) received on Friday (2 Feb), but only 1 comment is received on Monday (5 Feb).
An apparent problem is that quite a few students are doing the readings just a few hours before the deadline (the deadline being Feb 7, 2024, 11:59 pm). Some comments are even received after the deadline on Feb 8. This indicates that a number of students are deadline fighters and even complete the readings late.
This pattern does not exactly represent my activity because I usually complete the reading at least a few days before the deadline. I made my comments on this reading on 4 Feb, so deadline fighting pattern observed here does not apply to me.

b. The Selwyn's Reading

Most comments (6) are received on 8 Feb 16.00-18.00, followed by a few received a few hours before the deadline on 7 Feb.
The recommended pattern of doing a first annotation by Friday and then return to do more posts on Monday is not strictly followed. There are a number of comments (5) received on Friday (2 Feb), but only 1 comment is received on Monday (5 Feb).
Similar to the Wise's reading, an apparent problem here is that quite a few students are doing the readings just a few hours before the deadline (the deadline being Feb 7, 2024, 11:59 pm), with quite a number of them even commented on Feb 8, which has already passed the deadline. This indicates that a number of students are deadline fighters and even complete the readings late.
This pattern does not exactly match my activity, because I completed the reading a few days before the deadline. I made my comments on this reading on 4 Feb, so deadline fighting pattern observed here does not apply to me.

c. Similarities and Differences

The two sets of data are quite similar. Both readings show a trend of most comments being received close to the deadline, and the recommended pattern of making early annotations by Friday and more posts on Monday is not consistently followed in both cases. One difference is that the Wise's reading has peaks in comments on Friday evening and closer to the deadline, while the Selwyn's Reading shows a larger peak after the deadline.

ii. The patterns are similar in that students tend to procrastinate and complete readings and annotations closer to or even after the deadline. One reason is that these two readings are due on the same day, so it is more likely for students to do both readings at the same time, explaining why the pattern of both readings are similar. Students may procrastinate due to factors such as busy schedules, other academic commitments, or simply the tendency to work under pressure, which are common reasons why students delay their work until the deadline is approaching.

iii. For the class, the patterns suggest that a huge portion of students engage with the readings and annotations closer to the deadline, which may possibly affect the quality of discussions and interactions if many students are doing last-minute work. For my behaviour, this reminds me of the importance of time management and the benefits of completing readings and annotations earlier. Reading the course materials earlier allows me to have more thorough understanding and thoughtful contributions to the content.

4 . Analysis of the different annotation behavior between readings

*I added one more step here to filter the "Average" row by using "Filter" (removing rows that contain "Average").

2. Compare the activity in the different measurements for the two readings

Students post slightly more in the Selwyn's reading, with 26 threads started (Wise: 25), 22 responses given (Wise: 20), and 48 total comments given (Wise: 45). Although for the total word count, students wrote more in the Wise's reading (3,152 words) than the Selwyn's reading (2,831 words), in general, the total number of postings in the Selwyn's readings is still slightly higher.
Students are not very active rating each other's comments in both readings. Students are slightly more active in the Wise's reading by providing 1 question upvote (0 for the Selwyn's). For non-questions upvotes, the performance of both readings is similar, with the Selwyn's reading receiving 1 more upvote than the Wise's. In general, I'd say the data on upvotes is too little to make a concrete judgement on which reading students were more active rating each other - both readings have pretty low engagement in upvotes.
There is no significant difference in the two readings. The engagement level of the two readings is about the same with similar number of posting and rating activities.
The patterns of student activity and engagement do not seem to have direct relation to the timing of when these activities occurred. The pattern in Task 3 shows that the Wise's reading has peaks in comments on Friday evening and closer to the deadline. On the other hand, despite the fact that the Selwyn's Reading shows a larger peak after the deadline, the student activities show that they are more active and engaged in the Selwyn's reading as shown by the higher active engagement time and higher number of threads/comments posted. This means that even though students tend to finish the readings close to or even after the deadline, they still actively engage with the readings. Therefore, the timing of comments does not seem to affect the student's engagement level.

c. Get your activity for both readings

My annotation behaviours between these two readings are basically similar - making 2 responses and starting 2 threads, contributing to a total of 4 total comments. The only difference is I have 1 question upvote given in the Wise's reading, potentially showing that I have more questions about the Wise's reading. I am also not used to rating other's comments, therefore the upvotes given/received are low. This shows that my interests and engagement of the both readings are about the same.

5. Analysis of contribution quality

The Selwyn's Reading:

The Wise's Reading:

For the Selwyn's reading, the three best contributors are Siqi Du, Suiyuan Zhu, and Janice Lee. For the Wise's reading, the three best contributors are Siqi Du, Emery McKinstry, and Zack Lii.
Good proxies are "length" and "word count" that show the length of contribution of each student. "Replies" and "upvoters" serve as secondary proxies showing other students' interests in the original students' comment, which can also be an evidence of how good the comment quality is. Bad proxies may include all other not selected variables such as "highlighted text", "score", "page number" etc. These variables contain other details of the comment but not necessarily implying the quality of the contributions.
For the Selwyn's reading, there is not much difference between AVG or SUM. However, for the Wise's reading, there is a little bit difference. For the table using AVG, the top three contributors are Siqi Du, Emery McKinstry, and Sarah Bunney. But for the table using SUM, Zack Lii should be included instead of Sarah Bunney. So I also look at the Relies and Upvoters. For Zack Lii, the sum of replies he got is significantly higher than other students, so I made the conclusion that he should be included in the top three.
We can assign a score to every variables, e.g. comments that are longer are assigned higher scores, every reply/upvoters the comment received are assigned a score. Then, we can calculate a weighted average score for each student's contribution (e.g. 40% for length, 40% for word count, 10% for replies, 10% for upvoters). We can then rank students based on their total scores.