
It has been extensively discussed on this site why one cannot properly calculate $R^2$ - or the adjusted $R^2$ - for regression models fitted without an intercept. What is a good alternative metric for comparing the goodness of fit between regression models with and without an intercept? Is the squared Pearson correlation coefficient $r^2$ between the dependent variable and the fitted values a good solution?
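To make the question concrete, here is a small sketch (hypothetical simulated data, numpy only) showing that for a no-intercept fit the squared Pearson correlation between $y$ and the fitted values can disagree sharply with the usual $R^2$ computed against the mean baseline:

```python
import numpy as np

# Hypothetical data: the true model has a nonzero intercept.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 + 2.0 * x + rng.normal(size=100)

# Least-squares fit through the origin (intercept forced to zero).
b = np.linalg.lstsq(x[:, None], y, rcond=None)[0]
fitted = x[:, None] @ b

# Squared Pearson correlation between y and the fitted values.
r2_corr = np.corrcoef(y, fitted)[0, 1] ** 2

# The usual R^2, computed against the intercept-only (mean) baseline.
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_usual = 1.0 - ss_res / ss_tot

# r2_corr stays in [0, 1] by construction, while r2_usual can even be
# negative for a no-intercept fit -- the two no longer agree.
```

With data like these, `r2_corr` is close to the $r^2$ of the underlying correlation while `r2_usual` is strongly penalized by the missing intercept, which is exactly the discrepancy the question is about.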

kjetil b halvorsen
ouranos
  • There are an infinity of potential metrics. What do you want it to help you decide? Every measure of goodness has the job of articulating good, and separating it from the bad. What would a great measure do? Usually there is a business decision behind the numbers, so make sure to think about that. If I were to give a potentially ideal metric, how would you know it was doing its job perfectly? – EngrStudent Jul 02 '20 at 11:59
  • Reminder that $R^2$ always increases with the addition of new variables, even if they're unrelated to the dependent var! I always use AIC/BIC, which both penalize the addition of relatively useless variables, BIC more so. – Alex Jul 02 '20 at 13:03
  • Do you mean *without* an intercept, or *with* an intercept *constrained* to some value? – Alexis Feb 22 '21 at 15:56
  • I meant with the intercept fixed to zero. – ouranos Feb 23 '21 at 07:11

2 Answers


The reason that $R^2$ is dubious for a model without an intercept is that it compares the model fit (sum of squared errors) to the fit of a model with just an intercept.

However, by expressing interest in $R^2$, you are saying that the sum of squared errors interests you. So compare the sum of squared errors!

What you lose when you do this is the ability to gauge whether a model is doing a decent job of predicting in absolute terms. It sounds good to get $R^2=0.95$: the model explains most of the variability. A sum of squared errors of $17$ is fairly meaningless on its own. However, that model still fits better than a model with a sum of squared errors of $37$.

You can compare the sum of squared errors for any two models on the same data set. It sounds like you’re doing linear models, but it would be perfectly valid to compare to the sum of squared errors of a random forest model (for instance).
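As a minimal sketch of this comparison (hypothetical simulated data, numpy only), fit the same response with and without an intercept and compare the raw sums of squared errors directly:

```python
import numpy as np

# Hypothetical data set; any two models fit to the *same* (x, y)
# can be compared by their sum of squared errors.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=200)

def sse(X, y):
    """Sum of squared errors of the least-squares fit of y on design matrix X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

X_with = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
X_without = x[:, None]                          # intercept forced to zero

sse_with = sse(X_with, y)
sse_without = sse(X_without, y)
# Same units, same data: the model with the smaller SSE fits better in sample.
```

Because the no-intercept model is nested inside the model with an intercept, `sse_with` can never exceed `sse_without` in sample; the interesting question is how large the gap is.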

Out-of-sample testing might be a topic that interests you for this work.

And as EngrStudent wrote, there could be many other viable metrics, depending on what you value. Sum of squared errors seems to be the default in the absence of an argument for another metric, however.

Dave
  • That's also a good idea. Combining AIC (which penalizes the number of covariates) with the RMSE seems like a good combination. In addition, I can always compute the correlation between the predicted and dependent variables and calculate the proportion of explained variance by hand. – ouranos Jul 02 '20 at 15:44
  • Proportion of what variance? If you compare the variance explained by the no-intercept model to the variance of the pooled response variable, that's $R^2$. – Dave Jul 02 '20 at 16:05
  • The idea is that the $R^2$ gives you the proportion of the variance of the dependent variable explained by the model. [But this does not hold if your model has no intercept](https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo). So I would manually calculate $var(fitted)/var(y)$ instead (combined with AIC and RMSE as suggested). – ouranos Jul 02 '20 at 16:15
  • $var(y)$ is the variance from the intercept-only model, meaning that $var(fitted)/var(y)$ is $R^2$. – Dave Jul 02 '20 at 16:19
  • Yes, but only if the model is fitted with an intercept. I mean, if you let the intercept be different from zero. – ouranos Jul 02 '20 at 16:21
  • Only what if the model is fitted with an intercept? $var(y)$ is $var(y)$ whether you fit a regression or not. – Dave Jul 02 '20 at 16:25
  • Yes, but the ratio of variances from model and $y$ being equivalent to $R^2$ only holds when you let the model freely fit the intercept. – ouranos Jul 02 '20 at 16:28

If your goal is to compare two models (one with an intercept, and one without), and you're concerned that the in-sample $R^2$ will be misleading because one model has more parameters than the other, two options come to mind:

  1. See which model produces better out-of-sample predictions by running cross-validation. The simplest version is to split your dataset in two, estimate both models on one half of the data, make predictions on the other half with the fitted models, and compare the mean squared errors. A common, more efficient variant of this is K-fold cross-validation.

  2. Compute an in-sample measure of goodness of fit that penalizes the model with more parameters, such as the AIC.
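Both options above can be sketched in a few lines (hypothetical simulated data, numpy only; `cv_mse` and `aic_gaussian` are illustrative helpers, and the AIC here uses the standard Gaussian-errors form $n\log(\mathrm{SSE}/n) + 2k$):

```python
import numpy as np

# Hypothetical data with a genuinely nonzero intercept.
rng = np.random.default_rng(2)
x = rng.normal(size=120)
y = 2.0 + 1.2 * x + rng.normal(size=120)

def cv_mse(X, y, k=5):
    """K-fold cross-validated mean squared error for an OLS fit on X."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        coef = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        errors.append(np.mean((y[fold] - X[fold] @ coef) ** 2))
    return float(np.mean(errors))

def aic_gaussian(X, y):
    """AIC for an OLS fit with Gaussian errors: n*log(SSE/n) + 2k."""
    n = len(y)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = np.sum((y - X @ coef) ** 2)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return float(n * np.log(sse / n) + 2 * k)

X_int = np.column_stack([np.ones_like(x), x])  # with a free intercept
X_noint = x[:, None]                           # intercept fixed to zero

mse_int, mse_noint = cv_mse(X_int, y), cv_mse(X_noint, y)
aic_int, aic_noint = aic_gaussian(X_int, y), aic_gaussian(X_noint, y)
# On data like these, both criteria favor the intercept model:
# the no-intercept fit pays for its bias in out-of-sample MSE and in AIC.
```

Note that both criteria are comparable across models with different numbers of parameters: cross-validation measures out-of-sample error directly, and AIC adds an explicit penalty of $2$ per parameter.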

Louis Cialdella
  • These are two valuable and pertinent comments; I am actually already doing cross-validation (leave-one-out, since my dataset is rather small). AIC is a very good candidate. However, I am pointing more in [this direction](https://stackoverflow.com/questions/20333600/why-does-summary-overestimate-the-r-squared-with-a-no-intercept-model-formula?noredirect=1&lq=1), and am not so much worried about the number of covariates I have. – ouranos Jul 02 '20 at 15:42