Suppose I have a group of users of a paid app, and each month I want to predict which users are not going to renew their subscription. The proportion of users who do not renew is called the churn rate. To do that, I build a binary classification model that, based on individual features of each user (usage time, device used, etc.), calculates the probability of renewing or not renewing.

Would it be a valid approach to apply this model to each user of my app and calculate the churn rate from those individual predictions? Or would it be better to build a model that predicts the monthly churn rate (%) directly, similar to a time series prediction problem, instead of aggregating the individual predictions as in the first approach?

My understanding is that aggregating the individual predictions will not give the same result. First, I have to choose a probability threshold to classify each instance as positive or negative. But the criterion for choosing that threshold need not be aligned with predicting the right proportion of positive classes.

Brandon

1 Answer

You do not have to choose a threshold, and I am unsure why you think you have to. (Obligatory link to my fundamental doubts about thresholding.)

Suppose you have $n$ users, each with a predicted probability of $p_i$ of churning. Then the total number of users you expect to churn is simply $p_1+\dots+p_n$, and the total expected churn proportion is $\frac{p_1+\dots+p_n}{n}$.
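As a minimal sketch of this aggregation, the snippet below sums and averages the predicted probabilities and, for contrast, computes the proportion you would get by thresholding first. The function names, the probabilities, and the 0.5 cutoff are all made up for illustration, and the sum is only a good estimate if the classifier's probabilities are reasonably well calibrated.

```python
import numpy as np

def expected_churn(probabilities):
    """Return (expected number of churners, expected churn proportion)."""
    p = np.asarray(probabilities, dtype=float)
    expected_count = p.sum()   # p_1 + ... + p_n
    expected_rate = p.mean()   # (p_1 + ... + p_n) / n
    return expected_count, expected_rate

def thresholded_churn_rate(probabilities, threshold=0.5):
    """Proportion of users labelled as churners after hard thresholding."""
    p = np.asarray(probabilities, dtype=float)
    return (p >= threshold).mean()

# Hypothetical predicted churn probabilities for five users.
p = np.array([0.1, 0.2, 0.3, 0.4, 0.6])

count, rate = expected_churn(p)
hard_rate = thresholded_churn_rate(p)  # assumed cutoff of 0.5

print(f"expected churners: {count:.2f}")
print(f"expected churn rate: {rate:.2f}")
print(f"thresholded churn rate: {hard_rate:.2f}")
```

With these made-up numbers the expected churn rate is 0.32, while thresholding at 0.5 labels only one user in five as a churner (0.20), which illustrates why the threshold is unnecessary for this purpose.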

Yes, of course you could also model and predict either one of these numbers separately (possibly accounting for the fact that the count will not be below $0$ or above $n$, and the proportion will not be below $0$ or above $1$). Either this "top-down" approach or the "bottom-up" approach of aggregating individual predictions may yield better predictions. Or you might use the "optimal reconciliation" approach to hierarchical forecasting to combine both predictions, which I usually find yields the best predictions across the board.
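To give a flavor of reconciliation, here is a rough sketch of its simplest variant, ordinary least squares (OLS) reconciliation, applied to a two-level hierarchy. It assumes the bottom level consists of three hypothetical user segments with bottom-up expected churn counts, plus one separately produced top-down forecast of the total; all numbers and names are invented for illustration. More refined variants weight the forecasts by their error covariances, which this sketch does not do.

```python
import numpy as np

def ols_reconcile(top_forecast, bottom_forecasts):
    """OLS reconciliation for a hierarchy where total = sum of bottom series.

    Returns the reconciled vector [total, bottom_1, ..., bottom_m].
    """
    bottom = np.asarray(bottom_forecasts, dtype=float)
    m = bottom.size
    # Summing matrix S: first row adds up all bottom series,
    # the remaining rows pass each bottom series through unchanged.
    S = np.vstack([np.ones((1, m)), np.eye(m)])
    y_hat = np.concatenate([[top_forecast], bottom])
    # Project the incoherent forecasts onto the coherent subspace:
    # y_tilde = S (S'S)^{-1} S' y_hat
    beta = np.linalg.solve(S.T @ S, S.T @ y_hat)
    return S @ beta

# Hypothetical inputs: bottom-up expected churn counts per segment
# (each a sum of user-level probabilities) and a top-down total forecast.
bottom_up_counts = [10.0, 20.0, 30.0]   # sums to 60
top_down_total = 66.0                   # disagrees with the bottom-up sum

reconciled = ols_reconcile(top_down_total, bottom_up_counts)
print("reconciled total:", reconciled[0])
print("reconciled segments:", reconciled[1:])
```

The reconciled total lands between the two original totals, and the reconciled segment counts add up to it exactly. Note that nothing here constrains the results to stay within the valid range, so that would need to be checked separately.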

Stephan Kolassa
  • That makes sense. I've actually read your post about thresholding some time ago and it was really informative. I tested averaging the predictions and it's more in line with the true churn rate. So not choosing the threshold helped. However, isn't predicting the churn rate month after month a time series problem? Does it make sense to solve it with this "individual predictions" approach? – Brandon Nov 03 '21 at 21:24
  • Yes, this is a kind of time series problem, so it would make sense to use some time series algorithms for the "top" predictions, or potentially some kind of survival analysis approach. For the "bottom" single-user predictions, this does not make quite so much sense, because churning is mostly a single event for each user. – Stephan Kolassa Nov 03 '21 at 21:28