Google Play Store Paid Apps Analysis

In this analysis, I want to check which variables are correlated with app sales on the Google Play Store (so free apps are excluded).

Which variables have an impact on app sales?

Before starting the analysis, let’s create the columns it requires.

Sales

Let’s create a Sales column as Price * Installs_numbers (the number of installs) and use it as the “Target” variable.
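As a rough illustration of this step outside of the tool, here is a minimal Python sketch. It assumes the raw Price and Installs columns are strings such as "$4.99" and "10,000+", which the text above does not state. The helper names and the two sample apps are made up for the example. Under that assumption, Installs_numbers is only a lower bound, so Sales is an estimate rather than actual revenue.

```python
from decimal import Decimal

def parse_price(raw: str) -> Decimal:
    """Turn a price string such as "$4.99" into a Decimal (assumed format)."""
    return Decimal(raw.strip().lstrip("$"))

def parse_installs(raw: str) -> int:
    """Turn an install bucket such as "10,000+" into its lower bound (assumed format)."""
    return int(raw.strip().rstrip("+").replace(",", ""))

def add_sales(rows):
    """Keep paid apps only and add Installs_numbers and Sales to each row."""
    paid = []
    for row in rows:
        price = parse_price(row["Price"])
        if price == 0:
            continue  # free apps are excluded from the analysis
        installs = parse_installs(row["Installs"])
        paid.append({**row, "Installs_numbers": installs, "Sales": price * installs})
    return paid

# Hypothetical rows, not taken from the real dataset
apps = [
    {"App": "Hypothetical Paid App", "Price": "$4.99", "Installs": "10,000+"},
    {"App": "Hypothetical Free App", "Price": "0", "Installs": "1,000,000+"},
]
paid_apps = add_sales(apps)
for app in paid_apps:
    print(app["App"], app["Sales"])
```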

Days_After_Update

Let’s create a Days_After_Update column as as.numeric(today() - mdy(`Last Updated`), units="days"). I created this column to see if recency (how recently the app was updated) has something to do with Sales.
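The expression above is R. For readers who want to try the same calculation elsewhere, a rough Python equivalent could look like the sketch below. It assumes `Last Updated` is written like "January 7, 2018", which is a guess about the data, and it takes the reference date as a parameter instead of calling today() so the result is reproducible.

```python
from datetime import date, datetime

def days_after_update(last_updated: str, today: date) -> int:
    """Days between a "Month D, YYYY" date string (assumed format) and `today`."""
    updated = datetime.strptime(last_updated, "%B %d, %Y").date()
    return (today - updated).days

# Hypothetical values: an app last updated on January 7, 2018,
# measured from a fixed reference date of August 9, 2018
print(days_after_update("January 7, 2018", date(2018, 8, 9)))
```

Passing `date.today()` as the second argument gives the same behaviour as today() in the original expression.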

Random Forest (Variable Importance)

Create a Random Forest analytics with Boruta, excluding the Installs_numbers column, since it is almost the same as the “answer” (i.e. Sales) we want to predict.
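To give a feel for what Boruta does, here is a heavily simplified Python sketch of its shadow-feature idea: each variable is compared against shuffled copies of the variables, which by construction carry no real signal. This is not the Boruta implementation. It uses absolute Pearson correlation as a stand-in for random forest importance and a plain hit count instead of Boruta's statistical test. The data and all names are made up.

```python
import random
from math import sqrt

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov / sqrt(vx * vy))

def shadow_hits(features, target, n_iter=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Build one shadow per feature by shuffling its values,
        # and remember the best score any shadow reaches.
        best_shadow = 0.0
        for values in features.values():
            shadow = values[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, target))
        # A real feature scores a hit when it beats every shadow.
        for name, values in features.items():
            if abs_corr(values, target) > best_shadow:
                hits[name] += 1
    return hits

# Made-up data: "signal" drives the target, "no_linear_link" does not
x = list(range(30))
features = {
    "signal": [float(v) for v in x],
    "no_linear_link": [(v - 14.5) ** 2 for v in x],
}
target = [2.0 * v for v in x]
hits = shadow_hits(features, target)
print(hits)
```

A variable that beats the shadows in nearly every round corresponds roughly to what Boruta reports as “Confirmed”, and one that rarely does to “Rejected”.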

So it seems Reviews, Price, Current_Ver, and Days_After_Update are correlated with app sales.

Correlation Among Columns

If you check the correlation among the variables, Reviews and Installs_numbers are highly correlated (0.8). This makes sense, since someone who writes a review for an app has most likely installed and used it.
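For reference, a correlation value like this can be computed as in the minimal Python sketch below. It assumes Pearson correlation, which the text above does not specify, and the numbers are made up rather than taken from the Play Store data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up review and install counts for five hypothetical apps
reviews = [10, 50, 200, 900, 4000]
installs = [1000, 5000, 10000, 50000, 100000]
r = pearson(reviews, installs)
print(round(r, 2))
```

A value close to 1 means the two columns rise together, which is why keeping both Reviews and an install-based target in the same model can be misleading.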

Random Forest (without Reviews column)

If I exclude Reviews from the variables and run the Random Forest analytics again, Current_Ver, Sentiment, and Price now show a statistically significant correlation with app sales. Sentiment is the review sentiment, either positive or negative, so it makes sense that it correlates with sales.

Random Forest by Category

So my next question is, does this apply to all the App Categories?

To check this for each category, remove “Category” from the variables and assign it to “Repeat By”. Running the analysis this way gives one result per category.
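Conceptually, “Repeat By” splits the rows by category and runs the same analysis on each group. Here is a minimal Python sketch of that idea, with a simple mean of Sales standing in for the Random Forest run. The `repeat_by` and `mean_sales` functions and the sample rows are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict

def repeat_by(rows, key, analyze):
    """Group rows by `key`, drop that column, and run `analyze` on each group."""
    groups = defaultdict(list)
    for row in rows:
        rest = {k: v for k, v in row.items() if k != key}
        groups[row[key]].append(rest)
    return {name: analyze(group) for name, group in groups.items()}

def mean_sales(group):
    """Placeholder analysis: average Sales within one group."""
    return sum(r["Sales"] for r in group) / len(group)

# Made-up rows, not taken from the real dataset
rows = [
    {"Category": "SPORTS", "Price": 1.99, "Sales": 100.0},
    {"Category": "SPORTS", "Price": 2.99, "Sales": 200.0},
    {"Category": "MEDICAL", "Price": 9.99, "Sales": 40.0},
]
result = repeat_by(rows, "Category", mean_sales)
print(result)
```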

What is interesting here is that each category shows a different pattern. For example, in the “SPORTS” and “PERSONALIZATION” categories, “Price” has a “Confirmed” correlation with sales, but not in “MEDICAL”. Conversely, “Content Rating” is correlated with sales in the “MEDICAL” category.

Summary

At first, I thought recency (how recently an app was updated) would have something to do with sales. Looking at the results, though, Days_After_Update does not have an impact in every category, only in some of them, such as Sports and Medical.