So my question is: in a prediction setting, how can the activation fee alone (when X=0) help us predict whether a member will churn or not? I understand it is a constant. But can the activation fee alone help in prediction? Isn't it useless?
I agree that a constant activation fee looks like a fairly useless feature for predicting churn. In the example, though, the fee was used to predict the total membership fee, and if you ignored the activation fee you wouldn't be able to predict that total correctly. You can try it yourself: fit a linear regression without an intercept to the data in the example, and the results will be off.
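Here is a minimal sketch of that experiment. The numbers are hypothetical (a 50-unit activation fee and 10 units per month), not the ones from the original example, and plain NumPy least squares stands in for whatever regression tool you prefer:

```python
import numpy as np

# Hypothetical numbers, not taken from the original example:
# a 50-unit activation fee plus 10 units per month of membership.
activation_fee = 50.0
monthly_fee = 10.0

months = np.arange(1, 13, dtype=float)             # X: months of membership
total_fee = activation_fee + monthly_fee * months  # y: total fee paid (noise-free)

# Fit with an intercept: the design matrix has a column of ones plus X.
X_with = np.column_stack([np.ones_like(months), months])
coef_with, *_ = np.linalg.lstsq(X_with, total_fee, rcond=None)

# Fit without an intercept: the line is forced through the origin.
X_without = months.reshape(-1, 1)
coef_without, *_ = np.linalg.lstsq(X_without, total_fee, rcond=None)

pred_with = X_with @ coef_with
pred_without = X_without @ coef_without

max_err_with = np.max(np.abs(pred_with - total_fee))
max_err_without = np.max(np.abs(pred_without - total_fee))

print("with intercept:   ", coef_with, "max error", max_err_with)
print("without intercept:", coef_without, "max error", max_err_without)
```

With the intercept, the fit recovers the activation fee and the monthly fee. Without it, the slope has to absorb the activation fee, so it gets inflated and the predictions miss for both short and long memberships.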
You may want to read the "When is it ok to remove the intercept in a linear regression model?" thread, which discusses the problems that arise in linear regression models when the intercept is removed.
So, to answer your general question: the intercept corrects the model for the "base rate", which makes the predictions more accurate.
- When predicting the total membership fee, the base rate would be the activation fee.
- When predicting churn, it would be the base churn rate that does not depend on other variables (churn rate when nothing happens).
- When predicting lung cancer using a "number of cigarettes smoked per day" feature, the intercept would correspond to the baseline rate of lung cancer among people who smoke zero cigarettes per day, while the slope would tell you how that rate changes as the feature changes.
In all those cases, failing to correct for the base rate would give you predictions that are off.
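For the churn case, a rough way to see the "base rate" role is to fit a logistic regression that contains nothing but an intercept. The sketch below does this with a few Newton steps in NumPy. The churn labels are made up for illustration, and the hand-rolled fit is only a stand-in for a proper library routine:

```python
import numpy as np

# Hypothetical churn labels: 1 = churned, 0 = stayed (20% churn rate).
y = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Intercept-only logistic regression fitted by Newton's method.
b = 0.0
for _ in range(25):
    p = sigmoid(b)
    gradient = np.sum(y - p)          # derivative of the log-likelihood in b
    hessian = len(y) * p * (1.0 - p)  # negative second derivative
    b += gradient / hessian

base_rate = sigmoid(b)
print("intercept (log-odds):     ", b)
print("implied churn probability:", base_rate)
print("observed churn rate:      ", y.mean())
```

The fitted intercept lives on the log-odds scale, and passing it through the sigmoid gives back the observed churn rate. So with no other features, the intercept alone already predicts the base churn rate for everyone.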
However, please keep in mind that the intercept is tightly coupled with the other variables, so it is not "just" the global average, but the base rate corrected for the other features included in your model.
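To illustrate that last point, here is a small sketch on simulated data (all numbers are assumptions chosen for the illustration). It fits the same line twice, once on the raw feature and once on the mean-centered feature, and compares both intercepts with the global mean of the target:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on one feature x that is never near zero.
x = rng.uniform(5.0, 15.0, size=200)
y = 50.0 + 10.0 * x + rng.normal(0.0, 1.0, size=200)

def fit_line(feature, target):
    """Ordinary least squares with an intercept; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1]

intercept_raw, slope_raw = fit_line(x, y)
intercept_centered, slope_centered = fit_line(x - x.mean(), y)

print("mean of y:            ", y.mean())
print("intercept, raw x:     ", intercept_raw)       # prediction at x = 0
print("intercept, centered x:", intercept_centered)  # prediction at x = mean(x)
print("slopes:               ", slope_raw, slope_centered)
```

The slope is the same in both fits, but the intercept is not: on the raw feature it is the prediction at x = 0, which is far from the average of y, while on the centered feature it coincides with the average of y. What the intercept means depends on how the other variables in the model are coded.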