Cohort Analysis - Analyzing What Makes Customers Churn with Survival/Prediction Models

Objective: What makes customers churn?

  • Compare survival curves instead of average churn rates to see what makes customers churn more. The steeper the survival curve, the higher the chance of churning.
  • Use a survival model to see which variables have more influence on the slope of the curve.
  • Create a prediction model to predict whether a customer will churn or not.

Get Data

Data Requirement

  • start_time
  • end_time
  • canceled
  • AddBirthday
  • StartChat
  • ReceiveFriendRequest
  • TimeLinePostLiked
  • LimitAudience
  • ClearActivityLog

Visualize Data by Survival Curve

See how the customer churn rate changes over time and which product features help retain customers longer.
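
As a rough illustration of how such a survival curve can be computed from the columns listed under Data Requirement, here is a minimal Kaplan-Meier sketch in plain Python. It assumes each customer has already been reduced to a duration in months (end_time minus start_time) and a canceled flag. The function name `kaplan_meier` and the sample values are made up for illustration and are not taken from the actual data.

```python
from collections import Counter

def kaplan_meier(durations, canceled):
    """Return [(time, survival_probability)] at each time where a churn occurred.

    durations: observed months per customer (assumed: end_time - start_time)
    canceled:  True if the customer churned, False if still active (censored)
    """
    events = Counter(t for t, c in zip(durations, canceled) if c)
    exits = Counter(durations)  # every customer leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did not churn at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Made-up sample: six customers, two of them still active (censored).
durations = [2, 3, 3, 5, 8, 8]
canceled = [True, True, False, True, False, True]
for t, s in kaplan_meier(durations, canceled):
    print(f"month {t}: {s:.3f}")
```

Running the same function separately for customers who did and did not use a feature gives the two curves that each chart below compares.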

Survival Curve-ClearActivityLog (single variable)

  • Customers who used this feature tend to churn more in the first 10 months.

Survival Curve-AddBirthday (single variable)

  • Customers who used the 'AddBirthday' feature tend to churn more.
  • Once the confidence intervals are shown, the difference looks less significant because the intervals overlap substantially.
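
One common way to put a confidence interval around a Kaplan-Meier survival estimate is Greenwood's variance formula. The sketch below assumes the data has been summarized as (at_risk, events) pairs per churn time. The helper names and the counts are hypothetical, and the overlap check is only a rough visual heuristic, not a formal test such as the log-rank test.

```python
import math

def km_with_ci(steps, z=1.96):
    """steps: list of (at_risk, events) at each churn time, in time order.

    Returns [(survival, lower, upper)] using Greenwood's variance formula
    with a plain (untransformed) normal interval clipped to [0, 1].
    """
    survival, var_sum, out = 1.0, 0.0, []
    for n, d in steps:
        survival *= 1 - d / n
        if n > d:
            var_sum += d / (n * (n - d))
        se = survival * math.sqrt(var_sum)
        out.append((survival, max(0.0, survival - z * se), min(1.0, survival + z * se)))
    return out

def intervals_overlap(a, b):
    """True if two (estimate, lower, upper) tuples have overlapping intervals."""
    return a[1] <= b[2] and b[1] <= a[2]

# Hypothetical counts for two groups of 20 customers each.
used = km_with_ci([(20, 4), (16, 3)])
not_used = km_with_ci([(20, 2), (18, 2)])
print(used[-1], not_used[-1])
print("overlap:", intervals_overlap(used[-1], not_used[-1]))
```

With small groups the intervals are wide, so two curves can look far apart and still overlap, as in the AddBirthday chart.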

Create Prediction Model - Cox Regression (Statistical Learning)

Compare the influence of multiple variables.

Prediction

See what difference each variable makes to the churn rate (cancel rate).

  • The churn rate is about 64% for customers who used the ‘Clear Activity Log’ feature, which is higher than for those who didn’t use it.

  • Note that the prediction period is set to 3 months.
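
For intuition about how a Cox model turns into a churn rate at a fixed horizon: under proportional hazards, the survival probability for a customer is the baseline survival raised to the power of exp(coefficient × feature value), and the churn rate is one minus that. The sketch below assumes a baseline survival value at the 3-month horizon is already known. The function name and all numbers are illustrative, not the fitted values behind the chart.

```python
import math

def predicted_churn_rate(baseline_survival, coefs, features):
    """Churn probability at a fixed horizon under a Cox model.

    baseline_survival: S0(t) at the horizon when all features are 0
    coefs:    {feature_name: log hazard ratio}
    features: {feature_name: 0 or 1}
    """
    linear = sum(coefs[name] * features.get(name, 0) for name in coefs)
    survival = baseline_survival ** math.exp(linear)
    return 1 - survival

# Illustrative numbers only (assumed baseline and hazard ratio).
coefs = {"ClearActivityLog": math.log(2.0)}
s0 = 0.7
print(predicted_churn_rate(s0, coefs, {"ClearActivityLog": 1}))  # used the feature
print(predicted_churn_rate(s0, coefs, {"ClearActivityLog": 0}))  # did not use it
```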


Importance

  • The most influential product features are ClearActivityLog, ReceiveFriendRequest, and TimeLinePostLiked.

  • Variables shown in gray are considered ‘not significant’.


Survival Curves

See how each variable changes the survival curves. Note that these are values predicted by the Cox Regression model, not the actual data.

  • Customers who used the ‘Clear Activity Log’ feature tend to churn more than those who didn’t use it.

  • Customers who received a friend request (‘ReceiveFriendRequest’) tend to churn less than those who didn’t receive one.


Coefficients

See how each variable affects churn, expressed as a Hazard Ratio.

  • Hazard Ratio > 1.5: Customers who used the ‘Clear Activity Log’ feature are more likely to churn.

  • Hazard Ratio < 1: Customers with ‘TimeLinePostLiked’ or ‘ReceiveFriendRequest’ activity are less likely to churn.

  • Hazard Ratio crossing 1: Variables with gray lines, whose intervals cross 1, are not considered to make a significant difference in either direction (more churn or less churn).
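
The reading rules above can be sketched in a few lines. A hazard ratio is the exponential of a Cox coefficient, and a 95% interval is usually obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The function name and the coefficient values below are hypothetical, chosen only to show the three cases.

```python
import math

def hazard_ratio_summary(coef, std_err, z=1.96):
    """Return (hazard_ratio, lower, upper, reading) for one Cox coefficient."""
    hr = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    if lower > 1:
        reading = "more churn"
    elif upper < 1:
        reading = "less churn"
    else:
        reading = "not significant"  # the interval crosses 1
    return hr, lower, upper, reading

# Hypothetical coefficients and standard errors, not values from the chart.
examples = {"feature_a": (0.5, 0.1), "feature_b": (-0.4, 0.1), "feature_c": (0.05, 0.1)}
for name, (coef, se) in examples.items():
    hr, lo, hi, reading = hazard_ratio_summary(coef, se)
    print(f"{name}: HR={hr:.2f} [{lo:.2f}, {hi:.2f}] -> {reading}")
```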


Coef. (Significant)

This chart shows the most significant variables that affect the survival/churn rate.


How about customers who both ‘have posts on their timeline liked’ and ‘receive friend requests’?

Survival Curve-TimeLinePostLiked (single variable)

  • Customers who had a timeline post liked are less likely to churn.
  • Judging by the confidence intervals, the difference becomes more significant after around the 7th month.

Survival Curve-ReceiveFriendRequest (single variable)

  • Customers who received friend requests are less likely to churn. Judging by the confidence intervals, the difference becomes significant after around the 5th month.

Unite two columns

Survival Curve-ReceiveFriendRequest_TimeLinePostLiked (united variable)

  • Customers who performed both activities are much less likely to churn.

  • Judging by the confidence intervals, ReceiveFriendRequest may have more impact than TimeLinePostLiked.
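
Uniting the two columns means building one categorical variable whose value encodes both flags, so a survival curve can be drawn per combination. A minimal sketch follows, assuming each customer is a dictionary of boolean flags. The `unite` helper, the `TRUE_FALSE` label style, and the sample rows are assumptions for illustration; the exact labels produced by the original tool are not shown here.

```python
from collections import Counter

def unite(row, columns, sep="_"):
    """Join the values of several columns into one label, e.g. 'TRUE_FALSE'."""
    return sep.join(str(row[c]).upper() for c in columns)

columns = ["ReceiveFriendRequest", "TimeLinePostLiked"]

# Made-up rows, one per customer.
customers = [
    {"ReceiveFriendRequest": True, "TimeLinePostLiked": True},
    {"ReceiveFriendRequest": True, "TimeLinePostLiked": False},
    {"ReceiveFriendRequest": False, "TimeLinePostLiked": False},
    {"ReceiveFriendRequest": True, "TimeLinePostLiked": True},
]

# Each distinct label becomes one group, and therefore one survival curve.
groups = Counter(unite(c, columns) for c in customers)
print(groups)
```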


Create Prediction Model - Survival Forest (Machine Learning)

  • There is a way to predict the survival curve (a series of survival rates over time) with decision trees.
  • It uses the Random Forest algorithm to build a set of decision trees, and each tree predicts a survival curve.
  • Random Forest takes the mean of the prediction results (the survival curves).
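
The averaging step described above can be sketched as a pointwise mean over the curves predicted by the individual trees. This follows the description literally and assumes every tree reports survival rates on the same time grid. Actual implementations may differ in detail (some average cumulative hazards instead), and the per-tree numbers below are made up.

```python
def average_survival_curves(tree_curves):
    """Pointwise mean of survival curves that share the same time grid."""
    n_trees = len(tree_curves)
    return [sum(points) / n_trees for points in zip(*tree_curves)]

# Made-up predictions from three trees at months 0, 1, 2, 3 for one customer.
tree_curves = [
    [1.0, 0.90, 0.7, 0.6],
    [1.0, 0.80, 0.8, 0.5],
    [1.0, 0.85, 0.6, 0.4],
]
forest_curve = average_survival_curves(tree_curves)
print([round(s, 3) for s in forest_curve])
```

Because each tree is free to shape its own curve, the averaged curve is not tied to a single hazard ratio the way a Cox curve is.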

Random Survival Forest

  • The result is similar to what we saw with the Cox Regression model.
  • The survival curves are slightly different from the ones produced by Cox Regression.
  • For Cox Regression: the order of the curves and the way they decline stay consistent over time, because the hazard ratio is assumed to be constant. As a result, it is not good at capturing ‘non-linear’ patterns in the actual data.

  • For Survival Forest: the order of the curves can change over time and the shape of each curve is much more flexible. Machine Learning models like Survival Forest tend to capture the patterns in actual data better. Note that there is no confidence interval in Survival Forest.
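
The reason Cox curves keep their order can be written out from the standard proportional hazards form. The symbols here are the usual textbook notation, not names taken from the charts: S_0(t) is the baseline survival curve, beta the coefficient vector, and x_a, x_b the feature values of two customer groups.

```latex
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)},
\qquad
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\!\big(\beta^\top (x_a - x_b)\big)
\quad \text{(constant in } t\text{)}
```

Since 0 ≤ S_0(t) ≤ 1, a group with the larger exponent has the lower (or equal) survival at every t, so its curve never crosses the other group's curve. A Survival Forest has no such constraint, which is why its curves can change order over time.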