Deciding which covariates should be log-transformed before the model is done?

Question

I understand that a good way to see whether a covariate should be transformed is to fit a linear model and plot the residuals. If there is a pattern, a log transformation may be needed. But right now I am trying to find the best model for a relationship with several possible predictor variables, so I don't know how to test this. I was able to do it in SLR, but in MLR I am not sure how. Should I just use `add1`, test the variable, test it on the log scale, and go on? Actually, I was going for the backward method; is there a way to do it with that?

I do know it is not that easy; you can't just plot the residuals and decide. This is just to understand how it would be done, so I can figure out the rest on my own (I guess). It has to be interpreted anyway.

I'm not sure I understand... You run the model you are interested in (SLR, MLR or whatever). Then you can plot residuals. You can also plot partial residuals and all sorts of things. The default plot for `lm` in `R` is a good set of graphs, as is `(plots = residuals diagnostics)` in `SAS`. Backward selection, though, is terrible (as are forward and stepwise selection). — Peter Flom, Oct 16 '12 at 17:14
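
A minimal sketch of the "fit, then look at the residuals" idea from the comment above, for the single-predictor case only. Everything here is an assumption made for illustration: the data are simulated so that `y` is linear in `log(x)`, and the helper names `fit_line`, `residuals`, and `corr` are hypothetical. It uses plain Python rather than `lm` in `R`, and it summarizes the residual pattern with numbers instead of a plot.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Residuals of the least-squares line of y on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


# Simulated data (an assumption, not from the thread): y is linear in log(x).
rng = random.Random(0)
x = [float(i) for i in range(1, 51)]
y = [2.0 + 3.0 * math.log(xi) + rng.gauss(0.0, 0.2) for xi in x]
log_x = [math.log(xi) for xi in x]

res_raw = residuals(x, y)      # model on the raw scale
res_log = residuals(log_x, y)  # model on the log scale

rss_raw = sum(r * r for r in res_raw)
rss_log = sum(r * r for r in res_log)

# Crude numeric stand-in for "is there a pattern in the residual plot?":
# correlate the residuals with the squared, centered predictor.
mean_x = sum(x) / len(x)
mean_lx = sum(log_x) / len(log_x)
curv_raw = corr(res_raw, [(xi - mean_x) ** 2 for xi in x])
curv_log = corr(res_log, [(li - mean_lx) ** 2 for li in log_x])

print(f"raw scale: RSS={rss_raw:.2f}, curvature corr={curv_raw:.2f}")
print(f"log scale: RSS={rss_log:.2f}, curvature corr={curv_log:.2f}")
```

A strongly nonzero curvature correlation on the raw scale, together with a much larger residual sum of squares, is the numeric counterpart of a bowed residual plot. With several predictors the same check would be applied to partial residuals, which this sketch does not cover.
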
How am I supposed to do it otherwise, if not with stepwise, forward, or backward selection? — lisa, Oct 16 '12 at 17:16
The best way is to use substantive knowledge and ignore statistical significance during model fitting. If you must use some automatic method, LASSO or LAR are good. See my paper [stopping stepwise](http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf) for more; or see @FrankHarrell's book "Regression Modeling Strategies". — Peter Flom, Oct 16 '12 at 17:18
Thanks, I will look into that! Still, is there a way to decide which ones to use on the log scale BEFORE I've got the model? — lisa, Oct 16 '12 at 17:26
Often, the log scale is better because it makes more sense (regardless of assumptions). E.g., with money, the log scale often makes sense because doubling one's income means about the same thing at any income level, whereas adding a fixed amount does not. That is, if you make \$20,000 a year, a \$5,000 raise is huge. If you make \$200,000 a year, a \$5,000 raise is small; and, if you make \$2,000,000, well... hire someone to do your data analysis for you! :-). — Peter Flom, Oct 16 '12 at 17:31
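
The income example above can be checked with a few lines of arithmetic. This is only an illustrative sketch using the three incomes and the \$5,000 raise from the comment; the variable names are made up here.

```python
import math

raise_amount = 5_000
incomes = [20_000, 200_000, 2_000_000]

rows = []
for income in incomes:
    relative = raise_amount / income  # raise as a fraction of income
    # Change on the (natural) log scale caused by the same fixed raise.
    log_change = math.log(income + raise_amount) - math.log(income)
    rows.append((income, relative, log_change))
    print(f"income={income:>9,}  relative raise={relative:.4f}  "
          f"log change={log_change:.4f}")

# Doubling any income shifts its log by the same constant, log(2).
doubling = [math.log(2 * income) - math.log(income) for income in incomes]
print("log change from doubling:", [round(d, 4) for d in doubling])
```

The same fixed raise shrinks from a large step to a negligible one on the log scale as income grows, while doubling is always the same step. That is the sense in which the log scale treats proportional changes as equal.
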
Yes, you can make intelligent decisions about re-expressions before determining a model. For a worked example that shows the philosophy and one of the possible techniques, you might look at http://stats.stackexchange.com/a/35717. The driving motivations for nonlinear re-expressions of the variables are to linearize relationships, symmetrize the residual distributions, and make the residual variances more homogeneous. All three can be diagnosed using graphical and EDA techniques. The EDA techniques provide automatic ways to estimate the re-expressions. — whuber, Oct 16 '12 at 17:32
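
As one possible illustration of checking a re-expression before any model is fit, the sketch below compares the symmetry of a single variable on the raw and log scales using a quartile-based skewness. This is an assumed stand-in, not necessarily the technique used in the linked answer, and it looks only at the variable's own distribution rather than at residuals. The data are synthetic (exactly lognormal by construction) and `quartile_skewness` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, quantiles


def quartile_skewness(values):
    """Quartile (Bowley) skewness: 0 when the quartiles are symmetric
    about the median, positive for a longer right tail."""
    q1, q2, q3 = quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Synthetic covariate (an assumption): evenly spaced normal quantiles,
# exponentiated, so that log(x) is symmetric by construction.
n = 200
std_normal = NormalDist()
z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
x = [math.exp(zi) for zi in z]

skew_raw = quartile_skewness(x)
skew_log = quartile_skewness([math.log(xi) for xi in x])

print(f"quartile skewness, raw scale: {skew_raw:.3f}")
print(f"quartile skewness, log scale: {skew_log:.3f}")
```

A clearly positive value on the raw scale that drops to roughly zero after taking logs suggests the log is worth considering for that variable. Whether it also linearizes the relationship with the response or evens out the residual variance still has to be checked once a model is fit.
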
