The four datasets contain all the data for the two readings.
The Annotations details dataset contains the text of submitted comments, the type (comment or question), the automatic score, the submission and edit times, the number of replies and upvotes received, and other details about the comments. It gives a detailed description of the annotations in the two readings.
The annotation dataset contains viewing time and active reading time columns, which show how long students spent on the readings and how active they were.
The page dataset contains the number of views for each page.
The time dataset records when annotations were posted, grouped by hour.
I think the views and time charts seem to be correct. From my perspective, I view pages more at the beginning of the reading process, while I may browse the pages rapidly at the end. Therefore, the views drop rapidly over the course of a reading. To be more specific, these readings are academic papers, so they include a summary, introduction, process, discussion, conclusion, references, and so on. The peaks on the views chart are therefore probably related to the summary, process, and conclusion, which students view more often because of their importance.
As for the time chart, it is true that I may spend more time on some pages than on others, probably because these pages are more complicated. However, the average reading time per page stays at about the same level, as the linear regression line shows. The peaks show the pages on which students spend more time. There are several possible reasons for these peaks: some pages contain only a few paragraphs and some graphics, so students spend less time reading them, while others contain complicated terms and many paragraphs, which require more time to read. Also, some pages contain many annotations and comments, which students spend a lot of time reading and replying to.
It seems that the length of the reading affected the level of attention. Wise's paper is much longer than Selwyn's, and students spend less time on Wise's paper than on Selwyn's. We can see this through the linear regression: in Wise's paper the trend line declines, while in Selwyn's paper the trend is stable with slight growth. Therefore, the trend lines for time differ between the two readings.
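The direction of a trend line can be checked by computing the least-squares slope of reading time against page number. The following is only a minimal sketch: the function name `trend_slope` and the sample numbers are assumptions made up for illustration and are not taken from the datasets.

```python
def trend_slope(values):
    """Least-squares slope of values against page numbers 1..n."""
    n = len(values)
    pages = range(1, n + 1)
    mean_x = sum(pages) / n
    mean_y = sum(values) / n
    # Covariance of page number and value, divided by variance of page number
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, values))
    var = sum((x - mean_x) ** 2 for x in pages)
    return cov / var


# Hypothetical minutes per page, NOT real data from either reading
declining = [10, 9, 8, 7, 6]
stable = [5, 6, 5, 6, 6]

print(trend_slope(declining))  # negative slope: time falls page by page
print(trend_slope(stable))     # small positive slope: roughly stable
```

A negative slope corresponds to the declining trend described for Wise's paper, and a slope near zero or slightly positive corresponds to the stable trend described for Selwyn's paper.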
According to the data, the attention span measured in pages is about 5 pages. Students show a peak in reading time about every 5 pages (which is more obvious in Selwyn's paper).
As for the ideal length of a reading, I think 15 pages is better, since students view fewer pages after they have already read 15 pages. After 15 pages, students may have another high peak in reading, which makes that a suitable point for starting another reading.
However, I think the data are not entirely accurate. First, the complexity of the two readings is different. Students may need more time to understand the content of some readings, but this cannot be shown by views and time alone. Second, the topics of the two readings are also different, which may or may not attract students' interest.
It can be seen that students are more active after 9 p.m. and between 1 p.m. and 3 p.m. Also, students are more active on the last day before the due date.
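The busiest hours can be found by counting annotations per hour of the day. This is a rough sketch only: it assumes the posting times are available as ISO-style timestamp strings, and the function name `posts_per_hour` and the sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Count how many annotations were posted in each hour of the day (0-23)."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(hours.items()))


# Hypothetical timestamps, NOT real data from the time dataset
sample = [
    "2020-01-10 21:15:00",
    "2020-01-10 21:40:00",
    "2020-01-10 13:05:00",
]

print(posts_per_hour(sample))
```

Hours with the largest counts would correspond to the activity peaks seen in the chart.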
The recommended pattern is not very evident, since most students start annotating on the last day.
It seems that students start reading after Friday and do not finish until the last day. I wonder whether they have enough time to read the paper on the last day.
Actually, this pattern matches my own activity. I always start reading in the last few days, and sometimes even on the last day.
I think students tend to start reading the paper late, so there is more activity on the night of the last day. However, Wise's paper might be more readable for students who are not familiar with this field, since it introduces the terms and background, while Selwyn's work is more theoretical and students may have less to say in comments.
The results imply that students are more interested in Wise's paper, and perhaps more of the material from Wise's paper can be discussed in class. The results also remind me of my own behaviour. Perhaps it is too late for me to start reading on the last day. Therefore, I need to start earlier in the week.
It can be seen that students posted more in Wise's reading. Also, in Wise's reading students were more active in rating other students’ comments. The difference seems significant from looking at the charts, but more statistical calculation is needed to prove it. This seems contradictory, since most comments in Wise's reading were made late, yet there are more comment and question upvotes from students in Wise's reading than in Selwyn's.
As for my own behaviour, it is true that I only make comments and upvote some students' comments. Also, I received no upvotes from others. This is probably because I start reading late, at a point when other students have already finished reading. What's more, I find that I spent more time reading Selwyn's paper and left more comments because I had enough time.
For Selwyn's reading, I think Athia, Bebe, and Stephanie are the three best contributors since they wrote the longest texts. For Wise's, I think Athia, Maria, and Bebe are the best contributors for the same reason.
I think the number of rows (parts) is not a good proxy for quality, since students may have only four or five comments, each with a long text. The average and the sum of text length can be considered as metrics for contribution.
When using AVG, students who have a large amount of text spread over many rows may get a low value, and students who write long texts per comment rank higher. However, when using SUM, students who have more text in total rank higher.
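The difference between the two rankings can be shown with a small sketch. It assumes each row is a (student, comment text) pair; the helper names `length_stats` and `rank`, the students "A" and "B", and their comments are all hypothetical and not taken from the datasets.

```python
from collections import defaultdict


def length_stats(rows):
    """Group comment lengths by student and compute SUM, AVG and row count."""
    lengths = defaultdict(list)
    for student, text in rows:
        lengths[student].append(len(text))
    return {
        s: {"sum": sum(v), "avg": sum(v) / len(v), "rows": len(v)}
        for s, v in lengths.items()
    }


def rank(stats, key):
    """Order students from highest to lowest on the chosen statistic."""
    return sorted(stats, key=lambda s: stats[s][key], reverse=True)


# Hypothetical rows: A writes three shorter comments, B writes one long comment
rows = [
    ("A", "x" * 100),
    ("A", "x" * 20),
    ("A", "x" * 30),
    ("B", "x" * 120),
]

stats = length_stats(rows)
print(rank(stats, "sum"))  # A first: more text in total (150 vs 120)
print(rank(stats, "avg"))  # B first: longer text per comment (120 vs 50)
```

In this example the same two students swap places depending on whether SUM or AVG is used, which is the effect described above.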
I think the AVG and SUM of length can be integrated into a model for estimating contribution. Students who have a higher average and sum of length, and a higher average of upvotes and replies, can be considered high-quality contributors. Nevertheless, researchers might also consider the weight coefficient of each metric in this process.
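One possible way to combine these metrics is a weighted sum of normalized values. The sketch below is only an illustration of that idea, not a validated model: the function name `contribution_scores`, the metric names, the weights, and the numbers are all assumptions chosen for the example.

```python
def contribution_scores(metrics, weights):
    """Weighted sum of metrics, each scaled by the largest value in the class."""
    # Fall back to 1 when a metric is 0 for everyone, to avoid dividing by zero
    maxima = {m: max(row[m] for row in metrics.values()) or 1 for m in weights}
    return {
        student: sum(w * row[m] / maxima[m] for m, w in weights.items())
        for student, row in metrics.items()
    }


# Hypothetical per-student metrics, NOT real data
metrics = {
    "A": {"avg_len": 50, "sum_len": 150, "upvotes": 1},
    "B": {"avg_len": 120, "sum_len": 120, "upvotes": 2},
}
# Hypothetical weight coefficients; choosing them is the open question
weights = {"avg_len": 0.4, "sum_len": 0.4, "upvotes": 0.2}

scores = contribution_scores(metrics, weights)
print(scores)
```

Changing the weights changes the ranking, so the choice of coefficients would need to be justified before using such a score.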