Google Playstore Analysis

I will focus on install numbers for this analysis.

Check distribution

Free apps and paid apps may have different install number distributions, so I will check that first.

This comparison suggests that the install number distribution differs between the two types of apps, so I will analyze them separately.
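
As a rough sketch of this kind of check, the quartiles of the install numbers can be compared per app type. The rows below are made-up values for illustration, not the dataset used here, and taking log10 of the install numbers is my assumption to tame the heavy right tail.

```python
import math
from statistics import median, quantiles

# Hypothetical sample rows: (app type, install number). Made-up values for illustration only.
apps = [
    ("Free", 1_000_000), ("Free", 50_000), ("Free", 10_000_000),
    ("Free", 100_000), ("Free", 5_000),
    ("Paid", 1_000), ("Paid", 10_000), ("Paid", 100),
    ("Paid", 50_000), ("Paid", 5_000),
]


def summarize_by_type(rows):
    """Return {type: (Q1, median, Q3)} of log10(installs + 1) for each app type."""
    groups = {}
    for app_type, installs in rows:
        # +1 keeps apps with zero installs from breaking the logarithm.
        groups.setdefault(app_type, []).append(math.log10(installs + 1))
    summary = {}
    for app_type, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
        summary[app_type] = (q1, median(values), q3)
    return summary


summary = summarize_by_type(apps)
for app_type, (q1, med, q3) in summary.items():
    print(f"{app_type}: Q1={q1:.2f} median={med:.2f} Q3={q3:.2f}")
```

If the quartiles of the two groups barely overlap, as in this toy sample, that supports analyzing free and paid apps separately.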

In addition, the price distribution for paid apps suggests that they can be split into groups by price.

So I will first bin the paid apps by price using the k-means method.
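
A minimal sketch of k-means binning on one-dimensional price data is shown here. The prices are made up, `kmeans_1d` is a hypothetical helper written for this illustration, and using three bins is my assumption; none of these are the actual values or code behind this analysis.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers evenly over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break  # assignments no longer change
        centers = updated
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels


# Hypothetical app prices in USD, made up for illustration.
prices = [0.99, 0.99, 1.49, 1.99, 2.99, 4.99, 5.99, 6.99, 9.99, 24.99, 29.99, 39.99]
centers, labels = kmeans_1d(prices, k=3)
for price, label in zip(prices, labels):
    print(f"{price:>6.2f} -> price group {label}")
```

Each label can then be used as a price group, so that later steps run once per group.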

Check variables impacting install number

Next I will run Boruta to check the importance of each variable for each group. The result shows that “Reviews” has the highest importance in every group.

However, “Reviews” may simply be a proxy for install numbers, since apps with more installs naturally collect more reviews, so I will check the correlation next.

Check “Reviews” correlation

There seems to be a strong correlation between Installs_number and Reviews, but it is a bit hard to tell whether this is perfect collinearity or not. So for now I will exclude Reviews and run Boruta again.
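
To make the collinearity question concrete, here is a small sketch that computes the Pearson correlation by hand. The two lists are made-up numbers, not the real Installs_number and Reviews columns, and comparing on a log10 scale is my assumption. For two variables, perfect collinearity would mean a correlation of exactly 1 or -1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, made up for illustration only.
installs_number = [1_000, 10_000, 50_000, 100_000, 1_000_000, 10_000_000]
reviews = [40, 300, 2_500, 3_000, 60_000, 400_000]

r_raw = pearson(installs_number, reviews)
r_log = pearson(
    [math.log10(v) for v in installs_number],
    [math.log10(v) for v in reviews],
)
print(f"raw scale: r = {r_raw:.3f}")
print(f"log scale: r = {r_log:.3f}")
```

A value close to but below 1 means the two variables are strongly related without being exact linear copies of each other, which is the situation described above.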

Run random forest excluding Reviews

With “Reviews” excluded, there are no confirmed variables, only some tentative ones. So I will try linear regression models to see whether we can find any relationship that hints at causality.

Run linear regression

With linear regression, it was hard to find significant variables for paid apps. (I think the neutral and expensive groups did not show up because they contain too little data.)

On the other hand, for free apps we can see that Sentiment_Subjectivity_mean might have a positive impact on install numbers and Sentiment_Subjectivity_mean a negative impact on install numbers. However, this is hard to interpret, especially when we talk about causality. So I would like to move on to Genres. Several genres have a negative impact on install numbers relative to Genres_Tool.
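
To illustrate how a genre effect measured against a reference genre can be read, here is a simplified sketch. It assumes the genre coefficients are expressed relative to a reference genre, which is my reading of the result above. In a regression with genre dummies as the only predictors, each genre coefficient equals the difference between that genre's mean and the reference genre's mean. The rows, the genre labels, and the log10 scale are all made up for illustration, and the actual model here included other variables as well.

```python
import math
from statistics import mean

# Hypothetical (genre, install number) rows, made up for illustration only.
rows = [
    ("Tool", 1_000_000), ("Tool", 100_000),
    ("Education", 10_000), ("Education", 100_000),
    ("Medical", 1_000), ("Medical", 10_000),
    ("Game", 10_000_000), ("Game", 1_000_000),
]


def baseline_contrasts(rows, baseline):
    """Mean log10(installs) of each genre minus the mean of the baseline genre."""
    by_genre = {}
    for genre, installs in rows:
        by_genre.setdefault(genre, []).append(math.log10(installs))
    base_mean = mean(by_genre[baseline])
    return {g: mean(v) - base_mean for g, v in by_genre.items() if g != baseline}


contrasts = baseline_contrasts(rows, baseline="Tool")
for genre, diff in sorted(contrasts.items(), key=lambda item: item[1]):
    direction = "below" if diff < 0 else "above"
    print(f"{genre}: {abs(diff):.2f} log10 units {direction} the Tool baseline")
```

A negative value therefore does not mean that a genre is bad in absolute terms, only that it sits below the reference genre.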

From this insight, it could be said that when someone plans to develop an app and wants to maximize install numbers, some genres attract installs more easily than others.