University Ranking Analysis

Hypothesis

My hypothesis is that top-ranked universities have high income, since high income means:

  • The schools can hire high-profile people for their faculty
  • The schools can purchase cutting-edge experimental equipment, which is crucial for science, technology, engineering, and math research and education.

Definition of Top Schools

In this analysis, I created a column, isTop20, that flags the top 20 schools, and defined these 20 schools as the “Top Schools”. The data also contains historical records, but this analysis focuses on the latest scores (year 2016).
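As a rough illustration of this step, here is a minimal Python sketch that keeps only the 2016 rows and adds the isTop20 flag. The field names (university, year, world_rank) and the generated rows are assumptions made up for the example, and ranking by world_rank is only one plausible way to define the top 20; it is not necessarily how the column was built in the original analysis.

```python
# Hypothetical rows: field names and values are invented for illustration,
# not taken from the actual dataset.
rows = [
    {"university": f"University {i}", "year": year, "world_rank": i}
    for year in (2015, 2016)
    for i in range(1, 51)
]


def add_is_top20(rows, year=2016, cutoff=20):
    """Keep only the given year and flag schools ranked within the cutoff."""
    latest = [dict(r) for r in rows if r["year"] == year]  # copy each row
    for r in latest:
        r["isTop20"] = r["world_rank"] <= cutoff
    return latest


schools_2016 = add_is_top20(rows)
top_schools = [r["university"] for r in schools_2016 if r["isTop20"]]
print(len(schools_2016), "schools in 2016,", len(top_schools), "flagged as top")
```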

PCA to find characteristics among schools

First, I’d like to see the schools’ characteristics, so let’s check them with PCA.

When you run PCA and assign the isTop20 column to color, you can see that the top 20 schools are all placed at the far left-hand side, which represents strong Research and Teaching scores. So it seems Research and Teaching are the two main variables that characterize the top schools, and Income unfortunately does not work as I expected.
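For readers who want to reproduce this kind of check outside the tool, the sketch below runs PCA with NumPy on synthetic scores. Everything in it is an assumption for illustration: the column names, the random data, and the rule that marks the top 20 by Teaching plus Research. It shows the mechanics only and does not reproduce the result above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic score columns (assumed names, invented values).
teaching = rng.uniform(10, 100, n)
research = teaching + rng.normal(0, 10, n)  # deliberately correlated
income = rng.uniform(10, 100, n)
international = rng.uniform(10, 100, n)
names = ["teaching", "research", "income", "international"]
X = np.column_stack([teaching, research, income, international])

# Standardize each column, then run PCA through the SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S                      # school coordinates on each component
explained = S**2 / np.sum(S**2)     # share of variance per component
loadings = Vt.T                     # rows: variables, columns: components

# Hypothetical top-20 flag, used only to compare groups along PC1.
is_top20 = np.zeros(n, dtype=bool)
is_top20[np.argsort(-(teaching + research))[:20]] = True

for name, weight in zip(names, loadings[:, 0]):
    print(f"PC1 loading for {name}: {weight:+.2f}")
print("mean PC1 score, top 20:", scores[is_top20, 0].mean())
print("mean PC1 score, others:", scores[~is_top20, 0].mean())
```

Note that the sign of a principal component is arbitrary, so the top schools can land at either end of the axis; what matters is that they cluster at one end and which variables load heavily on that component.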

Variable Importance

So let’s confirm this finding with a Random Forest, using the isTop20 column as the target variable.

This confirms that “Teaching” and “Research” are indeed the two most important variables for the top schools, while “Income” is not statistically significant.
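A variable importance check of this kind can be sketched with scikit-learn's RandomForestClassifier. The data below is synthetic, and the top-20 label is constructed from Teaching and Research by assumption, so the output only demonstrates the procedure. Also note that the impurity-based importances scikit-learn reports are a ranking rather than a significance test, and the tool used for the analysis above may compute importance differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
names = ["teaching", "research", "income"]  # assumed column names

# Synthetic, independent scores between 0 and 100.
X = rng.uniform(0, 100, size=(n, len(names)))

# Hypothetical label: the 20 schools with the highest teaching + research.
y = np.zeros(n, dtype=int)
y[np.argsort(-(X[:, 0] + X[:, 1]))[:20]] = 1

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importance = dict(zip(names, model.feature_importances_))
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```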

Regional Difference

Let’s check whether this applies to all regions (Americas, Asia, Europe, Oceania), since I’m hoping that breaking the data down by region could reveal Income as a meaningful variable. What’s interesting here is that there are not many top schools in Asia and Oceania, so the result only shows the Americas and Europe. And unfortunately, in either case, my hypothesis was wrong.
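The reason some regions drop out can be seen by counting top schools per region before fitting anything. The sketch below uses invented counts and a hypothetical min_top threshold; neither comes from the actual data.

```python
from collections import Counter


def regions_with_enough_top(schools, min_top=3):
    """Return regions that have at least min_top top schools, with counts."""
    counts = Counter(region for region, is_top in schools if is_top)
    return {region: k for region, k in counts.items() if k >= min_top}


# Invented (region, isTop20) pairs, for illustration only.
schools = (
    [("Americas", True)] * 4 + [("Americas", False)] * 6
    + [("Europe", True)] * 3 + [("Europe", False)] * 7
    + [("Asia", True)] * 1 + [("Asia", False)] * 9
    + [("Oceania", False)] * 10
)

usable = regions_with_enough_top(schools)
print(usable)
```

A region with only one or zero positive examples gives a classifier nothing to learn from, which is why only some regions can be compared.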

At this point, I noticed that the importance of International is high for the Americas region. So for universities in the Americas region, International plays an important role.

So what is related to International?

To answer this question, I quickly ran a Random Forest with “International” as the target.

From this result, it seems “International_Students” is related to the “International” variable. I’m not sure of the implication, but it could be because faculty from a given country tend to attract students from that country.
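Because “International” is a numeric score, the equivalent check in scikit-learn would use a regressor rather than a classifier. The sketch below is built on synthetic data in which the target depends on international_students by construction, so it shows how the ranking is read, not evidence for the relationship itself. The column names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 300
names = ["international_students", "teaching", "research", "income"]

# Synthetic predictors between 0 and 100.
X = rng.uniform(0, 100, size=(n, len(names)))

# Hypothetical target: driven by international_students plus noise.
y = 0.8 * X[:, 0] + rng.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Rank predictors by impurity-based importance, highest first.
ranked = sorted(zip(names, model.feature_importances_), key=lambda kv: -kv[1])
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

A high importance only says the variable helps predict the target; it does not say which way any causal relationship runs.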

Summary

The top 20 schools for 2016 are characterized mainly by Research and Teaching. But if you look at the Americas, International plays an important role, whereas it does not for Europe. There also seems to be a correlation between “International” and “International Students”, but it is hard to tell the implication from this data alone.