
When you have different models that try to establish the relationship between the same variables, but with different structural forms, when can you compare the $R^2$ of these models?

For example, of the four models here:

  1. Regressing $Y$ on $X$
  2. Regressing $Y$ on $\ln(X)$
  3. Regressing $\ln(Y)$ on $\ln(X)$
  4. Regressing $\ln(Y)$ on $X$

which models would be comparable?

Should I write out $$ R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} $$ and then check whether the numerator and denominator have the same general structure? (For example, if the denominator of one model is in logs and the other is in raw form, then the model with the raw-form variables will have a higher "Total Sum of Squares", and thus a lower $R^2$.)
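To make this concrete, here is a minimal sketch of what I have in mind (the simulated data-generating process and the use of Python/`statsmodels` are just my own placeholders, not part of the tutorial set):

```python
# Sketch only: invented data, chosen so that ln(Y) on ln(X) is the "true" model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = np.exp(0.5 + 0.3 * np.log(x) + rng.normal(scale=0.2, size=200))

def r_squared(response, predictor):
    """Fit OLS of `response` on a constant plus `predictor` and return R^2."""
    X = sm.add_constant(predictor)
    return sm.OLS(response, X).fit().rsquared

fits = {
    "Y on X":         r_squared(y, x),
    "Y on ln(X)":     r_squared(y, np.log(x)),
    "ln(Y) on ln(X)": r_squared(np.log(y), np.log(x)),
    "ln(Y) on X":     r_squared(np.log(y), x),
}

for name, r2 in fits.items():
    print(f"{name:>15}: R^2 = {r2:.3f}")

# My guess: models 1 and 2 share the response Y (same total sum of squares),
# and models 3 and 4 share ln(Y), so R^2 seems directly comparable only
# within each of those two pairs.
```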

(I do not necessarily need direct answers on these, but a general direction would greatly help.)

WorldGov
  • You added the [tag:self-study] tag. Are you doing this as a textbook exercise? If so, can you explain what you have been thinking so far, and where you got stuck? Looking at the [tag:r-squared] tag I added may be helpful. – Stephan Kolassa Jan 30 '20 at 08:46
  • @Stephan - This wasn't a textbook exercise, but this was in a tutorial set given by my professor. I've added my guess at the answer. – WorldGov Jan 30 '20 at 08:58
  • Here is a hint: interpret $R^2$ as the proportion of variance explained by a model. What does this mean for whether we can compare your options? (Double-check the formula you give, it's somewhat off. See https://en.wikipedia.org/wiki/Coefficient_of_determination) – Stephan Kolassa Jan 30 '20 at 09:52
  • Can we say that to compare different models, the TSS should remain unaffected, or if it is affected, the numerator (ESS) should be similarly affected such that the two cancel out? (I've edited the formula). – WorldGov Jan 30 '20 at 11:06
  • In the simplest cases, and in more complicated cases if you arrange this, $R^2$ is the square of the correlation between observed and predicted and will fall within $[0,1]$. So, what you have washed out is all information on level, spread, what the variables are, how they are defined, and what makes sense or is surprising, and what is better statistically. There are fields in which $R^2 \sim 0.1$ can be interesting and $R^2 \sim 0.9$ is a sign of faking or asking a silly question, and fields in which $R^2 \sim 0.9$ is a sign of moderate competence and $R^2 \sim 0.1$ is the opposite. – Nick Cox Jan 30 '20 at 11:13
  • On the whole, comparisons are sometimes helpful if almost everything is the same except some small details, e.g. that the response is the same and you are just comparing different predictors and/or different versions of the predictors (transformed, untransformed, whatever). – Nick Cox Jan 30 '20 at 11:24
  • Don't think in abbreviations ("TSS"). Think in interpretations. Does it make sense to compare proportions of variance explained for $Y$ (by $x$) and for $Y$ (by $\ln x$)? Does it make sense to compare proportions of variance explained for $Y$ and for $\ln Y$? – Stephan Kolassa Jan 30 '20 at 16:15
