• Assignment 1 Report
    • Part 1: Loading and Understanding the Data
      • 1.1 Pages Visited and Time per Visit
      • 1.2 Time of Comment
      • 1.3 Annotations per Individual
      • 1.4 Content and Details of Each Annotation
    • Part 2: Analysis of the Access to the Different Pages in the Document
      • 2.1 Using the Pages_Selwyn Dataset
      • 2.2 Using the Pages_Wise Dataset
    • Part 3: Analysis of the Annotation Activity to the Readings
      • 3.1 Using the Time_Wise Dataset
      • 3.2 Using the Time_Selwyn Dataset
    • Part 4: Analysis of the Different Annotation Behavior between Readings
      • 4.1 Compare the Activity in the Different Measurements for the Two Readings
      • 4.2 My Own Activity in the Different Measurements for the Two Readings
    • Part 5: Analysis of the Contribution Quality
      • 5.1 Using the AnnotationDetail_Wise Dataset
      • 5.2 Using the AnnotationDetail_Selwyn Dataset

Assignment 1 Report

by Fiona Huang

Part 1: Loading and Understanding the Data

1.1 Pages Visited and Time per Visit

This dataset indicates the number of times each page was visited and the average amount of time students spent on each page.

1.2 Time of Comment

This dataset contains the exact date and time at which students left their annotations. It also shows the number of annotations made within each time period.

1.3 Annotations per Individual

This dataset includes more detailed information than the two datasets above. It includes the amount of time (both viewing time and active reading time) spent on each article, as well as various student activities: the number of annotations, comment upvotes (given and received), and question upvotes.

1.4 Content and Details of Each Annotation

This dataset contains detailed information on the annotations students submitted, including the day and page on which each annotation was left. The number of replies and upvotes is documented. A score is also generated for each annotation based on the assignment criteria.

Part 2: Analysis of the Access to the Different Pages in the Document

2.1 Using the Pages_Selwyn Dataset

Number of Access per Page

[Line chart: page views (0-50) by page number (1-20)]

Trend Shown:

  • Views decline towards the end of the article.

My Experience:

  • This visualization seems reasonable: academic articles typically follow the structure of summary/introduction, content, then conclusion and references, so it makes sense that there are significantly fewer views towards the end.
  • One thing from my own experience that differs from what the data shows is that I usually view the middle part of the article more often, since the main concepts are mostly explained there rather than at the beginning. To understand a paper better, I go back to the middle sections to review the content.

Time Spent per Page

[Line chart: average view time in minutes (0-10) by page number (1-20), with a linear regression line]

Trend Shown:

  • The linear regression line shows that the amount of time spent per page is relatively stable compared to the number of accesses per page. However, the line graph itself shows large fluctuations in the average time spent on individual pages.

My Experience:

  • According to my experience, this reflects the actual reading situation, since the time spent on a page often varies depending on the following conditions:
    • whether the page is filled with graphs illustrating a research result;
    • whether the page is mainly text that explains the concepts;
    • whether the page is filled with other students' discussion in the annotations.
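The regression trend described above can be checked numerically. A minimal sketch, assuming made-up per-page average times standing in for the Pages_Selwyn data (numpy's polyfit plays the role of the chart's regression line):

```python
import numpy as np

# Hypothetical average view times (minutes) for pages 1-20,
# standing in for the Pages_Selwyn dataset.
pages = np.arange(1, 21)
avg_time = np.array([4.2, 6.1, 3.0, 5.5, 7.8, 2.9, 4.4, 6.0, 3.5, 5.1,
                     4.8, 2.2, 6.5, 3.9, 5.0, 4.1, 3.3, 5.7, 2.8, 4.6])

# Fit a degree-1 polynomial: the slope summarizes whether attention
# rises or falls across the document, despite page-to-page fluctuation.
slope, intercept = np.polyfit(pages, avg_time, 1)
print(f"slope = {slope:.3f} min/page")
```

A slope near zero matches the "relatively stable" trend noted above, while large residuals around the fitted line correspond to the fluctuation visible in the graph.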

2.2 Using the Pages_Wise Dataset

Number of Access per Page

[Line chart: page views (0-70) by page number (1-25)]

Time Spent per Page

[Line chart: average view time in minutes (0-15) by page number (1-25)]

Similarities:

  • The number of accesses per page shows the same trend in both datasets: a negative correlation between page number and number of page views.
  • The time spent on each page fluctuates a lot in both datasets.

Differences:

  • The time spent per page shows a negative trend in the Wise dataset, whereas the Selwyn dataset shows a positive trend.

Analysis:

  • Has the difference in length affected the level of attention?
    • From the line graphs of page views and average view time above, we can see a steep decline around pages 10-15. For the Wise dataset, which contains 25 pages, the average view time is almost 0 minutes for pages 20-25, whereas the Selwyn dataset declines but never reaches 0. Therefore, I believe that the difference in length does affect learners' level of attention.
  • What seems to be the attention-span measured in pages?
    • About every 5 pages, as there is usually a peak followed by a decline (a full cycle) roughly every 5 pages.
  • What seems to be the ideal length of a reading?
    • 10-15 pages, as both page-view graphs show a steep decline between pages 10 and 15; even though there is still a peak after page 15, it is lower than the earlier ones.
  • Do you think that the source data is accurate?
    • Not entirely accurate, as the difficulty of the concepts could also affect the data, and this is not captured in the source.
    • Student interest and motivation are also not taken into consideration.
    • It is not clearly indicated whether the time spent is only the active time or the total time.
    • Students may download or print these articles to read, and only add their annotations online later on. This is also not reflected in these datasets.

Part 3: Analysis of the Annotation Activity to the Readings

3.1 Using the Time_Wise Dataset

Wise_Heatmap

[Heatmap: annotation counts by Post_Day (0-20) and Post_Hour (0-23)]

Analysis:

  • When is there more activity in the reading?
    • The heatmap clearly indicates that there is more activity on the night before the deadline.
  • Is there any problem that arises from the pattern that you see?
    • According to what the heatmap reflects, students might be making annotations only to fulfill the assignment requirement rather than for the purpose of learning.
    • It might also show that students tend to procrastinate in their studies.
  • Does this pattern represent your activity?
    • In my opinion, it does not completely represent my activity. Although I usually plan my working time around the deadlines given by instructors, I often download the papers and print them out to read, as I prefer not to read on a screen.
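The heatmaps in this part are essentially counts of annotations per (day, hour) cell. A minimal sketch, assuming hypothetical timestamps in place of the real Time_Wise records:

```python
import pandas as pd

# Hypothetical annotation timestamps standing in for the Time_Wise data.
times = pd.to_datetime([
    "2020-09-07 22:15", "2020-09-07 23:40", "2020-09-07 21:05",
    "2020-09-08 10:30", "2020-09-05 14:20",
])
df = pd.DataFrame({"Post_Day": times.day, "Post_Hour": times.hour})

# Count annotations per (day, hour) cell; this matrix is what the
# heatmap colors, with darker cells for higher counts.
heat = df.groupby(["Post_Day", "Post_Hour"]).size().unstack(fill_value=0)
print(heat)
```

With these made-up timestamps, the late-evening cluster on day 7 would appear as a dark band of cells, the same kind of pre-deadline spike described above.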

3.2 Using the Time_Selwyn Dataset

Selwyn_Heatmap

[Heatmap: annotation counts by Post_Day (0-15) and Post_Hour (0-23)]

Analysis:

  • How are they similar and how do they differ?
    • Both datasets show the most activity close to the deadline.
    • For the Wise dataset, students start 4 days before the deadline, whereas for the Selwyn dataset students start only 3 days before the deadline.
    • For the Wise dataset, there is also another peak in the afternoon of the day of the deadline.
  • Why do you think that the patterns are similar or different?
    • I think they are similar because deadlines are approaching and students need to finish their work by then.
    • They differ because the Wise article is longer than the Selwyn one, which is probably why students started the Wise article earlier.
  • What implications do these two results have for the class and your behavior?
    • These visualizations indicate that students tend to plan their work around the given deadlines. Students also seem to plan their working hours according to the extent of the workload: the longer the article, the earlier the start date.

Part 4: Analysis of the Different Annotation Behavior between Readings

4.1 Compare the Activity in the Different Measurements for the Two Readings

Measurements According to Value

[Small multiples: class totals per measurement (# annotations, # comment upvotes given/received, # question upvotes given/received) for Selwyn vs. Wise, value range 0-100]

Analysis:

  • In which reading did students post more?
    • Students posted more in the Wise reading: there are 94 annotations in total for the Wise reading, while there are only 61 for the Selwyn reading.
  • In which reading were students more active in rating other students' comments?
    • Students gave more upvotes in the Wise reading: almost 60 upvotes were given there, compared to only about 30 in the Selwyn reading.
  • Is there a significant difference?
    • I think there is a significant difference: for annotations, comment upvotes, and question upvotes alike, the amount of activity in the Wise reading is at least double that in the Selwyn reading.
  • How does this compare to the time at which comments were made?
    • As mentioned earlier, students started the Wise reading earlier and worked on it over more days, which might be why there is more activity than for the Selwyn reading.
    • However, it could also be that the topic discussed in the Wise reading matches students' interests better.

4.2 My Own Activity in the Different Measurements for the Two Readings

Measurements According to Value

[Small multiples: my own values per measurement (# annotations, # comment upvotes given/received, # question upvotes given/received) for Selwyn vs. Wise, value range 0-4]

Analysis of my Own Behavior:

  • As shown in the first graph, I left the same number of annotations on both readings.
  • However, there is a difference in the number of comment upvotes I gave and received: I gave and received comment upvotes in the Wise reading but not in the Selwyn reading.
  • Reflecting on my own behavior, I think this indicates that I had some difficulty reading both articles. The number of annotations shows that I only reached the bare minimum of the requirement. It also reflects that when I start a reading late (after other students have already left most of their comments), it becomes difficult for me to add more annotations, because my peers have often already mentioned the concerns or ideas I had in mind, and I would not repeat the same content.

Part 5: Analysis of the Contribution Quality

5.1 Using the AnnotationDetail_Wise Dataset

Pivot Table with Mean

Full Name            Part   Length   Replies   Upvoters   Score
Athia D. Fadhlina    1      302.73   0.36      0.73       1.55
Bebe Nodjomi         1      638      0.5       1          2
Brittany Hamilton    1      185.57   0.43      1.29       1.14
Fiona Huang          1      256.5    0         0.25       1.5
Kyra Williams        1      88.33    0         0          0.62
Maria Seo            1      473.83   1         3.17       2
Michelle Han         1      123.8    1.3       0.6        1
Ruobing Su           1      379      0.25      1.25       2
Shaelyn Cavanaugh    1      197      0         0.2        1.6
Sophia Chalsma       1      279.6    0         1          1.8
Stephanie Sinwell    1      436      1         1.2        2
Viktoriia Zykina     1      285.57   0.71      0.29       1.86
Yongjia Zhu          1      134      0         0          1.2
Total                1      290.76   0.43      0.84       1.56

Pivot Table with SUM

Full Name            Part   Length   Replies   Upvoters   Score
Athia D. Fadhlina    11     3,330    4         8          17
Bebe Nodjomi         4      2,552    2         4          8
Brittany Hamilton    7      1,299    3         9          8
Fiona Huang          4      1,026    0         1          6
Kyra Williams        21     1,855    0         0          13
Maria Seo            6      2,843    6         19         12
Michelle Han         10     1,238    13        6          10
Ruobing Su           4      1,516    1         5          8
Shaelyn Cavanaugh    5      985      0         1          8
Sophia Chalsma       5      1,398    0         5          9
Stephanie Sinwell    5      2,180    5         6          10
Viktoriia Zykina     7      1,999    5         2          13
Yongjia Zhu          5      670      0         0          6
Total                94     22,891   39        66         128

5.2 Using the AnnotationDetail_Selwyn Dataset

Pivot Table with Mean

Full Name            Part   Length   Replies   Upvoters   Score
Athia D. Fadhlina    1      200.33   0         0          1.33
Bebe Nodjomi         1      676.5    1.25      1          2
Brittany Hamilton    1      184.5    1         0.67       1.17
Fiona Huang          1      241.5    0         0          1.5
Kyra Williams        1      303.2    0.4       0.2        2
Maria Seo            1      520.5    0.25      1.25       2
Michelle Han         1      160.5    0.75      1          1.25
Ruobing Su           1      570      0         1          2
Shaelyn Cavanaugh    1      202.67   0.67      0.67       2
Sophia Chalsma       1      230.33   0.67      1.67       1.67
Stephanie Sinwell    1      511.33   0.83      0.5        2
Viktoriia Zykina     1      309      0         0          2
Yongjia Zhu          1      153      0.25      0          1.5
Total                1      327.95   0.47      0.61       1.72

Pivot Table with SUM

Full Name            Part   Length   Replies   Upvoters   Score
Athia D. Fadhlina    12     2,404    0         0          16
Bebe Nodjomi         4      2,706    5         4          8
Brittany Hamilton    6      1,107    6         4          7
Fiona Huang          4      966      0         0          6
Kyra Williams        5      1,516    2         1          10
Maria Seo            4      2,082    1         5          8
Michelle Han         4      642      3         4          5
Ruobing Su           2      1,140    0         2          4
Shaelyn Cavanaugh    3      608      2         2          6
Sophia Chalsma       3      691      2         5          5
Stephanie Sinwell    6      3,068    5         3          12
Viktoriia Zykina     4      1,236    0         0          8
Yongjia Zhu          4      612      1         0          6
Total                61     18,778   27        30         101
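Pivot tables like the ones above are typically produced with pandas, switching the aggregation function between mean and sum. A minimal sketch with made-up rows (names and values are placeholders, not the real data):

```python
import pandas as pd

# One row per annotation, mirroring the columns of the tables above.
df = pd.DataFrame({
    "Full Name": ["A", "A", "B", "B", "B"],
    "Part": [1, 1, 1, 1, 1],
    "Length": [300, 200, 100, 150, 50],
    "Score": [2, 1, 1, 2, 2],
})

# Same data, two aggregations: per-student means vs. per-student sums.
mean_pivot = df.pivot_table(index="Full Name", aggfunc="mean")
sum_pivot = df.pivot_table(index="Full Name", aggfunc="sum")
print(mean_pivot, sum_pivot, sep="\n\n")
```

Because every annotation row has Part = 1, summing the Part column yields each student's annotation count, which is why the Part totals in the SUM tables (94 for Wise, 61 for Selwyn) match the annotation counts discussed in Part 4.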

Analysis:

  • Who do you consider are the three best contributors for each one of the readings?
    • For the Wise reading: I think the best contributors are Athia, Viktoriia, and Maria.
    • For the Selwyn reading: I think the best contributors are Stephanie, Athia, and Bebe.
  • What do you think are good and bad proxies of the quality of the contribution?
    • I do not think any single proxy is particularly good or bad; a combination of the proxies is best for determining the quality of a contribution. If anything, I would say that the number of contributions and the Score are probably the two weaker proxies, as the Score seems to be highly related only to the length of the contributions.
  • Is there a difference if you use the AVG or the SUM in the visualization?
    • There is a huge difference between an individual's average and sum scores. As we can see from the two pivot tables above, if we look only at the average scores, there are many high scores (indicated with a dark blue color). However, when we look at the SUM, the picture is quite different.
  • Propose a way to integrate the different proxies into a quality of contribution estimation
    • Since I think it is important to look at a mixture of all the proxies mentioned, each proxy should be given a weight and included in the calculation of the score.
    • I personally think that the evaluation should be based on the following criteria:
      • Annotation Quality (50%)
        • Using the Length proxy
        • But more importantly, this score needs to be determined by the instructors, to check whether the annotation content is in depth and reflects the knowledge taught
      • Engagement with the Community (30%)
        • Using the Replies and Upvote proxies
      • Annotation Quantity (20%)
        • Using the Number of Contributions proxy
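The weighted scheme above can be sketched as a scoring function. This is a hypothetical implementation of my proposal, not part of the assignment: each proxy is normalized against the class maximum (taken here from the Wise SUM table: maximum total length 3,330, maximum replies + upvotes 25, maximum annotation count 21), then combined with the 50/30/20 weights.

```python
def quality_score(length, replies, upvotes, n_annotations,
                  max_length, max_engagement, max_count):
    """Hypothetical contribution score following the 50/30/20 split.

    Each proxy is normalized to [0, 1] against the class maximum,
    so the final score also lands in [0, 1].
    """
    quality = length / max_length                      # 50%: annotation quality (length as proxy)
    engagement = (replies + upvotes) / max_engagement  # 30%: engagement with the community
    quantity = n_annotations / max_count               # 20%: annotation quantity
    return 0.5 * quality + 0.3 * engagement + 0.2 * quantity

# Example with Maria Seo's Wise SUM figures:
# total length 2,843, 6 replies, 19 upvotes, 6 annotations.
score = quality_score(2843, 6, 19, 6,
                      max_length=3330, max_engagement=25, max_count=21)
print(round(score, 3))  # → 0.784
```

The instructor-judged depth component mentioned above would replace the raw length term in practice; length is only the measurable stand-in available in these datasets.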