University Ranking Data

I would like to check if there are regional patterns.

Look into the data

OK. Before starting, I will clean up the data.
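As a rough illustration of what such a cleaning step might involve, here is a minimal sketch. It assumes the raw file stores percentages as text like "27%", counts with thousands separators like "2,243", and the female-to-male split as text like "33 : 67". These formats and the helper names are assumptions, not taken from the original prep code.

```python
def parse_percent(text):
    """Turn '27%' into 0.27; return None for blanks or placeholders like '-'."""
    text = (text or "").strip().rstrip("%")
    try:
        return float(text) / 100
    except ValueError:
        return None


def parse_count(text):
    """Turn '2,243' into 2243.0; return None when the field is not numeric."""
    text = (text or "").replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None


def parse_female_ratio(text):
    """Turn '33 : 67' (female : male) into the female share, here 0.33."""
    parts = (text or "").split(":")
    if len(parts) != 2:
        return None
    try:
        female, male = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    total = female + male
    return female / total if total else None


# Hypothetical raw values, only to show the helpers in use.
print(parse_percent("27%"), parse_count("2,243"), parse_female_ratio("33 : 67"))
```

Returning None for unparseable fields keeps missing values explicit, so later steps can decide whether to drop or impute them.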

Let’s start by looking at the data distribution. The number of universities in the list seems to have increased in 2012 and 2016. I am not sure yet what this means, but I will make a note of it.
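Counting listed universities per year could be sketched as follows. The inline CSV is a made-up stand-in for the real ranking file, and the column names (year, university_name, country) are assumptions.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real data and its column names may differ.
SAMPLE_CSV = """year,university_name,country
2011,University A,United States of America
2011,University B,United Kingdom
2012,University A,United States of America
2012,University B,United Kingdom
2012,University C,Japan
"""


def universities_per_year(csv_text):
    """Count how many rows (universities) appear for each ranking year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["year"]) for row in reader)


counts = universities_per_year(SAMPLE_CSV)
for year in sorted(counts):
    print(year, counts[year])
```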


Next, I’ll check each country’s share of the listed universities by year.
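One way to compute such a share is to divide each country's count by the total number of universities listed that year. The sketch below uses made-up (year, country) pairs, and the function name is hypothetical.

```python
from collections import Counter


def country_share_by_year(rows):
    """Return {(year, country): fraction of that year's universities}."""
    rows = list(rows)
    per_year = Counter(year for year, _ in rows)
    per_pair = Counter(rows)
    return {
        (year, country): count / per_year[year]
        for (year, country), count in per_pair.items()
    }


# Made-up sample: one tuple per listed university.
sample_rows = [
    (2015, "United States of America"),
    (2015, "United States of America"),
    (2015, "United Kingdom"),
    (2015, "Japan"),
    (2016, "United States of America"),
    (2016, "Japan"),
]

shares = country_share_by_year(sample_rows)
for key in sorted(shares):
    print(key, round(shares[key], 2))
```

Using shares rather than raw counts keeps years comparable even when the length of the list changes from one year to the next.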


Some specific countries, such as the US and the UK, have many universities listed in this ranking. Additionally, several new countries show up in the 2016 ranking. Again, I am not sure what this means, but I will make a note of it too.

Then I grouped the countries to the continent level. I found that Asia’s share increases in 2016. This may imply that something is happening in Asia.
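Grouping countries into continents can be done with a simple lookup table. In this sketch the table is deliberately tiny and the continent codes other than SA are assumptions; a real mapping would need to cover every country in the data.

```python
from collections import Counter

# Partial, illustrative lookup; the codes are assumptions apart from "SA".
CONTINENT_OF = {
    "United States of America": "NA",
    "United Kingdom": "EU",
    "Japan": "AS",
    "China": "AS",
    "Brazil": "SA",
    "South Africa": "AF",
    "Australia": "OC",
}


def continent_counts(rows):
    """Count universities per (year, continent); unmapped countries become 'Unknown'."""
    return Counter(
        (year, CONTINENT_OF.get(country, "Unknown")) for year, country in rows
    )


# Made-up sample rows, one (year, country) pair per listed university.
sample_rows = [
    (2015, "Japan"),
    (2015, "United Kingdom"),
    (2016, "Japan"),
    (2016, "China"),
    (2016, "Brazil"),
    (2016, "Not A Country"),
]

counts = continent_counts(sample_rows)
for key in sorted(counts):
    print(key, counts[key])
```

Keeping an explicit "Unknown" bucket makes it easy to spot countries that are missing from the lookup table.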


Check variable importance to predict continent

Next, I want to understand how the continents differ, so I will check which variables are most important for predicting continent. For this, I excluded the ranking information and the total score because I am not interested in those variables. I also excluded the international score because it is strongly correlated with international_student_ratio, which I created in the prep step. Finally, in case the trend changes over time, I repeated the analysis for each year as well.
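The text does not say which model produced the importance scores, so the sketch below is only a stand-in, not the method used here. It shows a Pearson correlation check for spotting redundant variables, and a one-way ANOVA F statistic as a crude proxy for how well a single variable separates continents. All data values are made up, and every column name other than international_student_ratio is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def f_statistic(values, groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values()
    )
    within = sum(
        (x - sum(v) / len(v)) ** 2 for v in by_group.values() for x in v
    )
    return (between / (k - 1)) / (within / (n - k))


# Made-up toy data: six universities on two continents.
continent = ["AS", "AS", "AS", "EU", "EU", "EU"]
features = {
    "international_student_ratio": [0.05, 0.08, 0.06, 0.20, 0.25, 0.22],
    "student_staff_ratio": [12.0, 20.0, 16.0, 13.0, 19.0, 17.0],
}

# Rank variables by F statistic, highest (most separating) first.
ranking = sorted(
    features, key=lambda name: f_statistic(features[name], continent), reverse=True
)
print(ranking)

# A correlation close to 1 or -1 suggests one of the two variables is redundant.
international_score = [30.0, 38.0, 33.0, 70.0, 82.0, 75.0]  # made-up values
print(pearson(features["international_student_ratio"], international_score))
```

The F statistic looks at one variable at a time, so unlike a model-based importance it cannot capture interactions between variables.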


With this, I confirmed that there is little difference between years, so I will proceed with the single model fitted on all years together and look into the four variables whose importance seems higher than the others.

Visualizing the high-importance variables with box plots

Since the variables I focused on do not seem to be long-tailed, I simply normalized all the values and used box plots for visualization.
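The kind of normalization is not specified, so the following sketch assumes min-max scaling to the range 0 to 1 and then computes the five numbers a box plot draws for each group. The sample values and group labels are made up.

```python
from statistics import quantiles


def min_max(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def box_stats(values):
    """Five-number summary (min, quartiles, max) that a box plot is built from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up citation scores with a continent label for each university.
scores = [40.0, 55.0, 60.0, 72.0, 80.0, 35.0, 45.0, 50.0]
labels = ["EU", "EU", "EU", "EU", "EU", "SA", "SA", "SA"]

# Normalize across all universities first, then summarize per continent.
scaled = min_max(scores)
grouped = {}
for value, label in zip(scaled, labels):
    grouped.setdefault(label, []).append(value)

for label, values in grouped.items():
    print(label, box_stats(values))
```

Scaling across all universities before grouping puts every variable on the same 0 to 1 axis, which lets several variables share one plot.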



The box plots above confirm that SA (South America) has especially low scores for international students and citations. In addition, universities in Africa and Asia have lower international student ratios, and Asia’s female ratio is lower than that of the other continents. Something odd to me is that the female ratio for Africa is higher than I expected.