Understanding role of intercept in prediction

Question

I was trying to understand the meaning of intercepts from here

I would like to discuss a specific example from the above link.

1st example: a gym membership with an activation fee (constant), where the monthly gym fee is the input variable.

In the example, they suggest that the activation fee is the intercept and the input variable is the monthly gym membership fee.

So my question is: in a prediction setting, how can the activation fee alone (when X=0) help us predict whether a member will churn or not? I understand it is a constant. But can the activation fee alone help in prediction? Isn't it useless?

I believe you and I discussed something like this before! The above link seems to just discuss general intercept interpretation for linear functions. In a prediction setting, you don't define the intercept that way. There, the intercept is defined by the line of best fit (let's take linear regression), and it falls wherever is optimal for the line to get the best R-squared. So yes, the intercept doesn't have a straightforward interpretation in linear regression (besides being the predicted value when all X values are 0). – Vladimir Belik Feb 15 '22 at 04:21

score 2 · Accepted Answer · answered Feb 15 '22 at 07:22

So my question is: in a prediction setting, how can the activation fee alone (when X=0) help us predict whether a member will churn or not? I understand it is a constant. But can the activation fee alone help in prediction? Isn't it useless?

I agree that a constant activation fee seems like a fairly useless feature for predicting churn. In the example, however, the fee was used to predict the total membership fee; if you ignored it, you wouldn't be able to predict that total correctly. You can try it yourself: fit a linear regression without an intercept to the data in the example, and the results will be off.
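
The "try it yourself" suggestion can be sketched roughly as follows. The fee amounts and the list of months are made up for illustration (the thread does not give the actual numbers), and the two fitting helpers are hypothetical hand-written least-squares routines rather than anything from the linked example.

```python
# Hypothetical fees: the thread does not give the actual numbers.
ACTIVATION_FEE = 50.0  # assumed one-off fee (the true intercept)
MONTHLY_FEE = 30.0     # assumed fee per month (the true slope)

months = [1, 2, 3, 6, 12]
total_fee = [ACTIVATION_FEE + MONTHLY_FEE * m for m in months]


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_through_origin(x, y):
    """Least squares for y = b*x (no intercept); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


a, b = fit_with_intercept(months, total_fee)
b_origin = fit_through_origin(months, total_fee)

# Compare predictions from both fits against the actual totals.
for m, y in zip(months, total_fee):
    print(f"months={m:2d} actual={y:6.1f} "
          f"with_intercept={a + b * m:6.1f} no_intercept={b_origin * m:6.1f}")
```

With these made-up numbers, the fit with an intercept recovers the assumed fees, while the fit through the origin has to inflate the slope to compensate for the missing activation fee, so its predictions miss at every point.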

You may want to read the thread "When is it ok to remove the intercept in a linear regression model?", which discusses the problems that arise in linear regression models when the intercept is removed.

So, to answer your general question: the intercept helps to correct the model for the "base rate" and makes the predictions more accurate.

When predicting the total membership fee, the base rate would be the activation fee.
When predicting churn, it would be the base churn rate that does not depend on other variables (churn rate when nothing happens).
When predicting lung cancer using a "number of cigarettes smoked per day" feature, the intercept would be the baseline rate of lung cancer (the predicted rate at zero cigarettes per day), while the slope would tell you how that rate changes as the feature changes.

In all those cases, failing to correct for the base rate would give you predictions that are off.
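
As a rough illustration of the "base rate" idea for churn, consider a model with an intercept and nothing else. The churn labels below are invented for the sketch. For squared error, the best constant prediction is the mean of the labels, which is the observed churn rate; for an intercept-only logistic regression, the fitted intercept is the log-odds of that same rate.

```python
import math

# Made-up churn labels: 1 = churned, 0 = stayed.
churned = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]

base_rate = sum(churned) / len(churned)  # observed churn rate


def sse(constant, y):
    """Sum of squared errors when predicting the same constant for everyone."""
    return sum((yi - constant) ** 2 for yi in y)


# Brute-force check: among constants 0.00, 0.01, ..., 1.00,
# the one with the lowest squared error is the base rate.
candidates = [i / 100 for i in range(101)]
best_constant = min(candidates, key=lambda c: sse(c, churned))

# Intercept of an intercept-only logistic regression: log-odds of the base rate.
logit_intercept = math.log(base_rate / (1 - base_rate))
# Mapping the intercept back through the sigmoid returns the base rate.
recovered_rate = 1 / (1 + math.exp(-logit_intercept))

print(base_rate, best_constant, logit_intercept, recovered_rate)
```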

However, please keep in mind that the intercept is tightly coupled with the other variables, so it is not "just" the global average but the base rate corrected for the other features included in your model.
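
A small sketch of that coupling, again with made-up gym fees and a hypothetical hand-written least-squares helper: the intercept is the prediction at X=0, so it differs from the global average of the outcome, and it moves when the same feature is merely re-expressed (here, centered on its mean) even though the slope and the predictions stay the same.

```python
# Made-up data: assumed activation fee of 50 and monthly fee of 30.
months = [1, 2, 3, 6, 12]
total_fee = [50.0 + 30.0 * m for m in months]


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


mean_fee = sum(total_fee) / len(total_fee)      # global average of the outcome
mean_months = sum(months) / len(months)
centered_months = [m - mean_months for m in months]

a_raw, b_raw = fit_with_intercept(months, total_fee)
a_centered, b_centered = fit_with_intercept(centered_months, total_fee)

print("global average fee:      ", mean_fee)
print("intercept, raw months:   ", a_raw)       # prediction at 0 months
print("intercept, centered:     ", a_centered)  # prediction at the average month count
print("slopes (raw, centered):  ", b_raw, b_centered)
```

In this sketch the intercept equals the global average only after centering, which is one way to see that its value depends on what else is in the model and how it is coded.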

Tim


Thanks, Tim. Upvoted. Can I also check whether you use LIME, the explainable AI solution that uses linear models to explain predictions made by ML models? – The Great Feb 15 '22 at 07:49
Let's say my outcome variable is `79`, my intercept is `58`, and my input variables account for the remaining `21`. If my input variables are really good (have better predictive power), am I right to expect the base rate to go down (e.g., from 58 to 20), so that my input variables explain the outcome variable better (more than the intercept does)? – The Great Feb 15 '22 at 08:01
Additionally, just trying to understand: how do you think that, in real time, the activation fee alone can help predict the gym membership fee? I am not sure whether the activation fee provides enough information or insight to predict the total membership fee. Maybe I am misunderstanding. Can you help me understand in layman's terms? – The Great Feb 15 '22 at 08:02
@TheGreat As said, you cannot consider the intercept independently of the rest of the model. With the intercept alone you are predicting the total fee for a person who spent 0 months at the gym. In that case it is exact; otherwise it is not, because you don't use it alone. – Tim Feb 15 '22 at 08:18
Thanks, yes, understood. One last question. The class probability indicates class membership, meaning > 0.5 is class 1 and otherwise class 0. So should the coefficients all sum up to indicate class membership? Meaning, if I add all my positive coefficients, let's say it comes to 0.3; if I add all my negative coefficients, it comes to 0.4; but the intercept is 0.25. So the final number will be 0.25 + 0.3 - 0.4 = 0.15. – The Great Feb 16 '22 at 09:37
Does this mean that 0.15, as the remaining coefficient value, indicates negative class membership (class 0)? – The Great Feb 16 '22 at 09:38
@TheGreat I don't know which statistical model you are referring to, so I cannot answer that. – Tim Feb 16 '22 at 09:39
I mean the logistic regression model (a linear model). – The Great Feb 16 '22 at 09:40
@TheGreat in logistic regression the parameters do not sum to 1. – Tim Feb 16 '22 at 09:41
Is there any tutorial or resource you can share that would help me understand how the final class membership is determined? – The Great Feb 16 '22 at 09:42
If you have time and are interested, you can share your views on this post (I put a bounty on it a few days back) - https://datascience.stackexchange.com/questions/107928/usefulness-of-intercept-in-layman-terms-eli5 – The Great Feb 16 '22 at 09:43
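
On the logistic regression question raised in the comments, a minimal sketch of how class membership is usually derived may help. The intercept and coefficients reuse the numbers from the comment above, and the feature values are invented. The point is that the coefficients are not summed on their own: each is multiplied by its feature value, the intercept is added, and the resulting log-odds are passed through the sigmoid before the 0.5 threshold is applied.

```python
import math


def sigmoid(z):
    """Map log-odds z to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def predict(intercept, coefs, x, threshold=0.5):
    """Return (probability of class 1, predicted class) for one observation."""
    z = intercept + sum(c * xi for c, xi in zip(coefs, x))  # log-odds
    p = sigmoid(z)
    return p, int(p > threshold)


# Numbers taken from the comment; the feature values are hypothetical.
intercept = 0.25
coefs = [0.3, -0.4]

p_a, class_a = predict(intercept, coefs, [1.0, 1.0])  # log-odds about 0.15
p_b, class_b = predict(intercept, coefs, [1.0, 2.0])  # log-odds about -0.25

print(p_a, class_a)
print(p_b, class_b)
```

Note that log-odds of about 0.15 are positive, so the probability is above 0.5 and the first observation is assigned to class 1; the threshold of 0.5 applies to the probability, not to the raw sum.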
