When transforming the dependent variable, I know R^2 and related criteria are not suitable for model selection. Which criterion should I use instead?
-
The question is not clear... model selection for what? – ABK Mar 06 '20 at 10:49
-
In a simple linear regression, I want to select the best lambda when doing a Box-Cox transformation of the dependent variable. Thank you. – Eva Mar 06 '20 at 10:58
-
See https://stats.stackexchange.com/questions/242526/multiple-linear-regression-residual-normality-and-transformations or the [list](https://stats.stackexchange.com/search?q=transform+depen*+variab*+answers%3A1+box-cox). Probably a dup in there – kjetil b halvorsen Mar 06 '20 at 13:14
-
This question is so general that it deserves a general answer, which is provided by the duplicate. – whuber Mar 09 '20 at 14:31
1 Answer
Unless you are exclusively interested in prediction (and not at all in explanation), I think you should transform variables only for substantive reasons, not statistical ones.
EDIT 2 For instance, it often makes sense to take logs of money variables such as income or price.
EDIT 1 Note that not all statisticians share my views.
If the assumptions of OLS regression are not met, then, rather than transform, you can use a different model, such as quantile regression or robust regression.

Peter Flom
-
Thank you for your suggestion. But I have to apply some transformations to the response and compare them, which is the requirement of my homework. – Eva Mar 09 '20 at 13:58
-
It is worth noting that your opinion very well may be a minority one among statisticians, especially those exposed to Tukey's work on EDA. Tukey advocated using exploratory analysis to uncover effective ways of re-expressing data. This was one of the cornerstones of his approach to all data analysis. – whuber Mar 09 '20 at 14:33
-
OK, I edited my answer. But Tukey didn't have the tools that we have, at least not in any practical way. I know quantile regression is very old in theory, but it only became practical with computers. – Peter Flom Mar 09 '20 at 18:59
-
There are many situations, e.g. fitting power functions or exponentials, where taking logs first is natural and convenient and often the best thing to do. I don't know whether you call that circumstance substantive or statistical. The same applies, although on the whole with less force, to several other transformations. – Nick Cox Mar 09 '20 at 19:09
-
@NickCox If it's "natural" then I think that implies substantive. E.g. taking logs of income or price variables. If your theory suggests some other relationship, that's substantive too. – Peter Flom Mar 09 '20 at 19:16
-
@Eva If this is HW then you need to add the self-study tag and show us what you have tried so far. – Peter Flom Mar 09 '20 at 19:18
-
With many kinds of data, being willing to use a logarithmic scale arises from past experience that it is a good idea. Some people call that theory. – Nick Cox Mar 09 '20 at 19:20
-
Thank you for all of the suggestions above. I am sorry for not asking a good question here. What I actually need is a statistical criterion (a numeric measure like R^2, MSE and so on) for selecting a lambda between 0 and 1 when applying a power transformation to the response (y^\lambda, where the response y is salary). I learned before that R^2 does not work in such a situation, and MSE is also not suitable because the scale of the response is different for different \lambda. So I am confused about which one I can use in such a situation. Thank you for your help! – Eva Mar 10 '20 at 01:28
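One standard way to compare different \lambda on a common scale is the Box-Cox profile log-likelihood, which adds a Jacobian term, (\lambda - 1) \sum \log y_i, to correct for the change of scale of the response. The following is only a minimal sketch under stated assumptions: the response is strictly positive, the model is a simple linear regression with an intercept, and the Box-Cox form (y^\lambda - 1)/\lambda is used (for \lambda \neq 0 it is just a shifted and rescaled y^\lambda, so the fitted regression is equivalent). The function names and the data are hypothetical and for illustration only; they are not from the thread.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of strictly positive values; log(y) when lam is 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def rss_simple_ols(x, z):
    """Residual sum of squares from regressing z on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, z))

def profile_loglik(x, y, lam):
    """Box-Cox profile log-likelihood, up to an additive constant.

    The second term is the log-Jacobian of the transformation; it is what
    makes values for different lam comparable, unlike raw MSE or R^2.
    """
    n = len(y)
    rss = rss_simple_ols(x, boxcox(y, lam))
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(x, y, grid):
    """Return the grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))

# Hypothetical data: sqrt(y) is roughly linear in x, plus small fixed noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.04, -0.05, -0.01]
y = [(2.0 + 0.5 * a + e) ** 2 for a, e in zip(x, noise)]

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for lam in grid:
    print(lam, round(profile_loglik(x, y, lam), 3))
print("selected lambda:", best_lambda(x, y, grid))
```

An equivalent approach is to divide the transformed response by the geometric mean of y raised to the power (\lambda - 1) before fitting; the residual sums of squares are then on the same scale and can be compared directly across \lambda.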