Does it make sense to partially scale the data matrix X in regression?

Question

For some reason my supervisor wants me to centre only the independent variables that are used in interaction terms. I have never heard of such a practice. Does it make sense to partially centre the data matrix $X$, that is, to centre only some of its columns rather than all of them?

Thank you.

score 2 · Answer 1 · edited Apr 13 '17 at 12:44

Centering in regression is pretty common, and doesn't have to be performed on all independent variables. Centering primarily affects the interpretation of unstandardized coefficients. Here's a quote from Wikipedia:

By centering or standardizing the independent variables, the coefficient of X or Z can be interpreted as the effect of that variable on Y at the mean level of the other independent variable.$^{[7]}$
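To see why, here is the algebra for a two-variable interaction model with one extra additive covariate. The symbols $\beta_j$, $x$, $z$ and $w$ are my own notation for illustration, not taken from the quoted source. Start from

$$y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz + \beta_4 w$$

and substitute $x = x_c + \bar{x}$ and $z = z_c + \bar{z}$, where $x_c$ and $z_c$ are the centered variables:

$$y = (\beta_0 + \beta_1\bar{x} + \beta_2\bar{z} + \beta_3\bar{x}\bar{z}) + (\beta_1 + \beta_3\bar{z})\,x_c + (\beta_2 + \beta_3\bar{x})\,z_c + \beta_3\,x_c z_c + \beta_4 w$$

The interaction coefficient $\beta_3$ is unchanged, while the coefficient on $x_c$ becomes the slope of $x$ at $z = \bar{z}$ instead of at $z = 0$. Centering $w$ as well would only move $\beta_4\bar{w}$ into the intercept, so leaving it uncentered changes nothing about the other coefficients.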

@FrankHarrell brought up some other issues in centering for interactions that you might want to consider though.

$^{[7]}=$ Dawson, J. F. (2013). Moderation in management research: What, why, when and how. Journal of Business and Psychology. DOI: 10.1007/s10869-013-9308-7.

score 1 · Answer 2 · edited Jan 06 '14 at 07:55

I wanted to share another reference. It's a "comment" paper, and actually part of an argument between two economists, Smith and Campbell, and a statistician, Marquardt. But it's relevant.

Link to the Marquardt JASA article. Sorry to refer you to a non open-access article, but I highly recommend the read if you get a chance.

Marquardt argues strongly in favor of centering and scaling, for several reasons. One is that, once the variables are centered and scaled, you have prior knowledge about their absolute size (values shouldn't be much bigger than 3 in absolute value, by reasoning based on Chebyshev's inequality), and this is a justification for "shrinkage" in techniques like ridge regression.
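For reference, this is the textbook form of the inequality for a variable $z$ standardized to mean 0 and variance 1. I am assuming this is the form the argument rests on; I haven't reproduced Marquardt's exact derivation.

$$P(|z| \ge k) \le \frac{1}{k^2}, \qquad \text{so} \qquad P(|z| \ge 3) \le \frac{1}{9} \approx 0.11$$

So whatever the distribution, at most about 11% of standardized values can be 3 or more in absolute value.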

And of course there's the interpretation issue as well. He draws a cool picture of a parabola fit to data. Almost all the data is on the right side of the parabola, so that it's almost linear in a positive direction. But the coefficient on the linear term is negative! (Remember there's a squared term in there.)
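The effect is easy to reproduce. Below is a minimal sketch with made-up data, not Marquardt's: the points lie exactly on $y = (x - 8)^2$ with $x$ from 10 to 20, so everything is to the right of the vertex and $y$ increases throughout. The helpers `solve` and `fit_quadratic` are hypothetical names of my own; they use only the standard library and exact fractions, though in practice you would reach for a numerical library.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    rows = [[Fraction(1), Fraction(x), Fraction(x) ** 2] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * Fraction(y) for r, y in zip(rows, ys)) for i in range(3)]
    c, b, a = solve(XtX, Xty)
    return a, b, c

# Synthetic data (an assumption for illustration): all x right of the vertex at 8.
xs = list(range(10, 21))
ys = [(x - 8) ** 2 for x in xs]

mean_x = Fraction(sum(xs), len(xs))
xs_centered = [x - mean_x for x in xs]

raw = fit_quadratic(xs, ys)
centered = fit_quadratic(xs_centered, ys)

print("raw      a b c:", *map(str, raw))       # raw      a b c: 1 -16 64
print("centered a b c:", *map(str, centered))  # centered a b c: 1 14 49
```

With raw $x$ the linear coefficient is $-16$, the slope at $x = 0$, where there is no data. After centering it is $+14$, the slope at the mean of $x$, which matches what you see in the plot.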

Now a parabola $ax^2 + bx + c$ is really just like having a term interact with itself. The same phenomenon happens with interactions, where the main effect can take a nonsense value because you never observe the interacting variable at zero. Of course, some people say you shouldn't try to estimate main effects when there are interactions.

To sum it up, terms with interactions and polynomial terms are where you'll get the most out of your effort. If you're going to apply shrinkage or a Bayesian prior that says parameter estimates should be small, maybe you should standardize all your numeric variables.
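To illustrate the shrinkage point, here is a minimal sketch of ridge regression with a single predictor and an unpenalized intercept, where the slope has the closed form $S_{xy} / (S_{xx} + \lambda)$. The data, the penalty `lam`, and the function names are all made up for illustration. The same distances are expressed in metres and in kilometres, and the fitted change in $y$ across the observed range is compared.

```python
from statistics import mean, pstdev

def ridge_slope(xs, ys, lam):
    """Ridge slope for one predictor with an unpenalized intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sxx + lam)

def standardize(xs):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def effect_over_range(xs, ys, lam):
    """Fitted change in y between the smallest and largest x."""
    return ridge_slope(xs, ys, lam) * (max(xs) - min(xs))

# Hypothetical data: the same predictor in two different units.
metres = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
km = [d / 1000 for d in metres]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
lam = 1.0  # arbitrary penalty, chosen only for the demonstration

effect_m = effect_over_range(metres, ys, lam)
effect_km = effect_over_range(km, ys, lam)
effect_std_m = effect_over_range(standardize(metres), ys, lam)
effect_std_km = effect_over_range(standardize(km), ys, lam)

print("raw units:    ", effect_m, effect_km)
print("standardized: ", effect_std_m, effect_std_km)
```

In raw units the kilometre version is shrunk noticeably more than the metre version, even though the data are identical, because the penalty is compared against $S_{xx}$ and that depends on the units. After standardizing, both give the same fit.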
