centering skewed predictors

Question

I would appreciate assistance with the following. I am fitting a gamma GLM that includes a two-way interaction between a categorical predictor (2 levels) and an interval/continuous predictor. The continuous predictor is positively skewed and non-normal. In my field (psychology) we mean-center predictors before computing the interaction. Here is the question: because of the skewness, the mean is not a good measure of central tendency for this predictor. 1) If I still mean-center it, ignoring the non-normality, would that affect my results (standard errors, CIs)? 2) What other way of centering would you suggest? I read about median-centering as a way to get less biased estimates. Would it make my model better, given that the median is also not a good measure of central tendency here (the distribution is positively skewed)? Or could I pick a value (say, the mode) and do mode-centering?

Thank you!

Welcome to CV. Unfortunately, the literature on centering predictors used in interactions has a high degree of controversy and disagreement. The topic also has a long history on this site. By searching for keywords such as "centering interaction predictors," etc., many threads related to your question will appear. Here is a link to one of them ... http://stats.stackexchange.com/questions/65898/why-could-centering-independent-variables-change-the-main-effects-with-moderatio You will see the diversity of opinions reflected there. How you choose to deal with it is your choice. — Mike Hunter, Jun 03 '16 at 19:36
Hopefully the arguments people offer for centering predictors don't relate to whether or not they're normal. — Glen_b, Jun 04 '16 at 06:52

Jarle Tufto · Answer 1 · 2016-06-03T21:07:17.993

Centering a numerical predictor around some value just amounts to a reparameterization of the model. The model \begin{equation} y = a + b x + e \end{equation} is equivalent to the model \begin{equation} y = a' + b' (x-x_0) + e. \end{equation} The relationship between the two parameterizations can be seen by rewriting the reparameterized model in the same form as the original model: \begin{equation} y = \underbrace{a'-b'x_0}_a + \underbrace{b'}_b x + e. \end{equation} From this we see that $a = a'-b'x_0$, so the intercepts in the two models differ, but since $b=b'$, the slopes are equal. So centering around any value $x_0$ changes neither the interpretation of the slope nor its estimate, standard error, confidence interval or associated hypothesis tests.
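As a quick numerical check of the algebra above, here is a minimal sketch using simulated data and ordinary least squares in NumPy. The data, the true coefficients and the helper names `ols` and `fit_centered` are all made up for illustration and are not from the question. It fits the same straight-line model with the predictor left raw, mean-centered and median-centered, and prints the intercept, slope and slope standard error for each.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.gamma(shape=2.0, scale=1.5, size=n)          # positively skewed predictor (simulated)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)    # made-up linear relationship

def ols(X, y):
    """Return OLS coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def fit_centered(x0):
    """Fit y = a' + b'(x - x0) + e for a given centering value x0."""
    X = np.column_stack([np.ones(n), x - x0])
    return ols(X, y)

centers = {"raw": 0.0, "mean": x.mean(), "median": np.median(x)}
fits = {name: fit_centered(x0) for name, x0 in centers.items()}

for name, (beta, se) in fits.items():
    print(f"{name:>6}: intercept={beta[0]:.4f}  slope={beta[1]:.4f}  se(slope)={se[1]:.4f}")
```

Only the intercept column should change between the three rows, and each centered intercept should satisfy $a = a' - b'x_0$ against the raw fit.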

The same holds if your model includes an interaction between $x$ and some factor $i$. Then \begin{equation} y = a + b x + c_i + d_i x + e \end{equation} is equivalent to the $x_0$-centered model \begin{equation} y = a' + b' (x-x_0) + c'_i + d'_i (x - x_0) + e, \end{equation} which can be rewritten as \begin{equation} y = \underbrace{a' - b' x_0}_{a} + \underbrace{b'}_b x + \underbrace{c'_i - d'_i x_0}_{c_i} + \underbrace{d'_i}_{d_i} x + e. \end{equation} From this we see that $d_i = d'_i$, so the null hypothesis $d_1=d_2=\dots=d_k$ (no interaction in the non-centered model) is equivalent to the null hypothesis $d'_1=d'_2=\dots=d'_k$ (no interaction in the $x_0$-centered model). So as far as the significance of the interaction term is concerned, the choice between mean- and median-centering is of no consequence.

Estimates of main effects of the factor $i$, $c'_i$ vs $c_i=c'_i-d'_i x_0$, will be different though and hence their significance will be different too, but you usually test for significance of the main effects before significance of interactions. And without an interaction present, the significance of a factor $i$ is not influenced by any kind of centering of $x$.
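The same reparameterization applies to the linear predictor of a GLM, so the argument carries over to the gamma model in the question. The sketch below is only an illustration under stated assumptions: the data are simulated, a log link is assumed (the question does not say which link is used), the factor is coded 0/1, and `fit_gamma_log` is a hypothetical helper that fits the model by iteratively reweighted least squares in plain NumPy rather than with any particular GLM library.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.gamma(shape=2.0, scale=1.5, size=n)        # positively skewed predictor (simulated)
g = rng.integers(0, 2, size=n).astype(float)       # two-level factor, coded 0/1
eta_true = 0.2 + 0.3 * x + 0.5 * g - 0.2 * g * x   # made-up linear predictor
y = rng.gamma(shape=5.0, scale=np.exp(eta_true) / 5.0)  # gamma response, mean exp(eta_true)

def fit_gamma_log(X, y, n_iter=50):
    """Gamma GLM with log link via IRLS; returns coefficients and standard errors."""
    eta = np.log(y)                                # starting values for the linear predictor
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                    # working response; IRLS weights are 1 here
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        eta = X @ beta
    mu = np.exp(eta)
    phi = np.sum(((y - mu) / mu) ** 2) / (len(y) - X.shape[1])  # Pearson dispersion estimate
    cov = phi * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def fit_centered(x0):
    """Columns: intercept, centered x, factor, factor-by-centered-x interaction."""
    xc = x - x0
    X = np.column_stack([np.ones(n), xc, g, g * xc])
    return fit_gamma_log(X, y)

centers = {"raw": 0.0, "mean": x.mean(), "median": np.median(x)}
fits = {name: fit_centered(x0) for name, x0 in centers.items()}

for name, (beta, se) in fits.items():
    print(f"{name:>6}: factor={beta[2]: .4f} (se {se[2]:.4f})  "
          f"interaction={beta[3]: .4f} (se {se[3]:.4f})")
```

The interaction column should be identical across the three rows, while the factor coefficient and its standard error move with the centering value, following $c_i = c'_i - d'_i x_0$. In other words, centering changes which value of $x$ the factor's coefficient refers to, not the fitted model.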

Only if you believe that a reasonable null hypothesis is that $i$ has no effect on average (for an average value of $x$) given that the interaction is present, do you need to think more carefully about this.

Thank you so much, Jarle. Please let me know if I understand you correctly. It doesn't matter which value I use for centering: it won't affect the interaction coefficient or its SE. Therefore, if the predictor is not normally distributed (residuals are not normal), I can choose mean-centering or centering on another value (median or mode), and all 3 models (mean-, median- and mode-centered) will arrive at the same estimate, SE and significance for the interaction. — Natalie, Jun 06 '16 at 14:39
