
I have an Excel model that predicts the number of customers for a given month. The prediction depends on a churn rate. I have the absolute error (actual vs. predicted) for each period, along with the squared error and the sum of squared errors.

My question is:

Would it be better to find a churn rate that minimizes the absolute error for each period (year, month), or to find a churn rate that minimizes the sum of squared errors? Does the former even make sense?
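Since the Excel formulas aren't shown, here is a minimal sketch in Python of what fitting a churn rate under each loss might look like. The model form ($\text{predicted}_t = \text{start} \cdot (1 - \text{churn})^t$), the starting count, and the monthly figures are all illustrative assumptions, not the actual model:

```python
# Hypothetical sketch: the actual Excel model isn't shown, so assume a
# simple compounding model  predicted_t = start * (1 - churn)**t.
def predict(start, churn, periods):
    return [start * (1 - churn) ** t for t in range(1, periods + 1)]

def sse(actual, predicted):
    # Sum of squared errors.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def sae(actual, predicted):
    # Sum of absolute errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted))

def best_churn(actual, start, loss):
    # Brute-force grid search over candidate churn rates in [0, 1).
    rates = [i / 10000 for i in range(10000)]
    return min(rates, key=lambda r: loss(actual, predict(start, r, len(actual))))

actual = [950, 910, 860, 825, 790]          # made-up monthly customer counts
churn_sse = best_churn(actual, 1000, sse)   # rate minimising squared error
churn_sae = best_churn(actual, 1000, sae)   # rate minimising absolute error
```

With data like this the two losses give similar rates; they diverge when the series contains outlying months, which squared error weights much more heavily.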


alexsmith2

1 Answer


If you minimise absolute error, you are implicitly assuming that your errors follow a Laplace distribution; if you minimise squared error, you're implicitly assuming they're normally distributed.
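To make the implicit assumption explicit (a sketch of the standard maximum-likelihood argument): for i.i.d. Gaussian errors with fixed $\sigma$,

$$-\log L = \sum_{i} \frac{(y_{i} - \hat{y}_{i})^{2}}{2\sigma^{2}} + n \log\left(\sigma \sqrt{2\pi}\right),$$

so maximising the likelihood over the predictions $\hat{y}_{i}$ is the same as minimising the sum of squared errors. For i.i.d. Laplace errors with fixed scale $b$,

$$-\log L = \sum_{i} \frac{|y_{i} - \hat{y}_{i}|}{b} + n \log(2b),$$

so maximising the likelihood minimises the sum of absolute errors.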

Do you have any reason a priori to believe one over the other? Are your tails long? Pedantically, customer counts are discrete, so is either continuous distribution appropriate? If the numbers are quite large, a continuous approximation isn't bad, but if your numbers are small, you might consider a discrete distribution, which leads to an entirely different model.

A-priori arguments aside, train a model using least squares and plot a histogram of your prediction errors $y_{i} - \hat{y}_{i}$. Does the distribution look normal? Then train the model using absolute error and plot $y_{i} - \hat{y}_{i}$: does that distribution look Laplacian?
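As a sketch of this residual check (the data and function names are illustrative, and a real workflow would plot with something like matplotlib's `hist`), you can bin the prediction errors $y_{i} - \hat{y}_{i}$ and eyeball the shape:

```python
# Illustrative sketch: bin the prediction errors y_i - yhat_i so the shape
# of the error distribution can be eyeballed. A rounded, light-tailed pile
# suggests normal; a sharp peak with heavy tails suggests Laplace.
def residuals(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]

def text_histogram(errors, bins=5):
    # Count errors falling into equal-width bins between min and max.
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for e in errors:
        counts[min(int((e - lo) / width), bins - 1)] += 1
    return counts

errs = residuals([950, 910, 860, 825, 790],
                 [954.0, 910.1, 868.3, 828.3, 790.2])
print(text_histogram(errs))
```

With only a handful of periods the histogram is too coarse to distinguish the two shapes; the check is more informative with a few years of monthly errors.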

gazza89
  • The conclusions you draw about "implicitly" themselves make the implicit assumption that you are using maximum likelihood. As @Richard Hardy suggests in a comment to the question, it's often more appropriate to consider the loss function rather than exploring distributional assumptions. – whuber Nov 28 '18 at 20:23
  • Yes, that's fair. Would your default approach not be to use max-likelihood or max posterior unless you had a good reason to use something else? – gazza89 Nov 28 '18 at 20:33
  • My default is to explore the loss with the client in an effort to gather information relevant to deciding on an appropriate procedure. – whuber Nov 28 '18 at 20:35
  • @whuber The choice of a loss function depends on the goal of modelling, the structure of residuals, and enough different considerations that I would have a hard time giving anyone explicit instructions as to how to do that. Even a slight change in modelling goals can imply different loss functions and results. Not a simple topic, then. – Carl Dec 18 '21 at 18:47
  • @Carl I believe you might be conceiving of "loss" a little differently than I am. In decision theory the loss is a primary consideration before any modeling is even done. It measures the cost of suboptimal decisions. That's not necessarily the same thing as the "loss" used for fitting models, although the concepts are very close. The statistician, though, is not free to impose a different loss function on their client solely because "modeling goals" might have changed. – whuber Dec 18 '21 at 18:58
  • @whuber "Impose" is a tad presumptuous here. How would you know, for example, that a problem is ill-posed, serendipity perhaps? And which would be the appropriate variance reduction goal for regularization, guesswork perhaps? Next, consider what is more costly, fitting the tail of a function incorrectly or the bulk of it? Or perhaps neither? I wish it were as simple as you are implying, I really do wish so. – Carl Dec 18 '21 at 19:13
  • @Carl It sounds like you and I are discussing completely different things. A statistical decision problem isn't even posed at all until a loss or class of loss functions has been specified. Depending on the loss, "variance reduction," tail fitting, etc., could be irrelevant. I'm sorry this might be coming across as simple, because it's anything but simple. – whuber Dec 18 '21 at 22:07
  • @whuber Essentially, all models are wrong and our job is finding models that are more useful. Doing that means simultaneously comparing models paired with loss functions that prove to be appropriate to each model and not just comparing loss functions in a vacuum. – Carl Dec 18 '21 at 23:03