I am planning to predict a binomial response (1/0: a point used by an animal vs. a point available to it within its home range) from several continuous, distance-based predictors (distance to each habitat type) and their interactions (whether that habitat was available to the animal in its range). The data for each predictor variable are non-normal. I will fit a binomial GLMM with a logit link in SAS.
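For concreteness, here is a minimal sketch of the kind of PROC GLIMMIX call I have in mind; the dataset and variable names (pts, used, dist_forest, avail_forest, animal_id) are just placeholders for my actual data:

    /* Sketch only: binomial GLMM, logit link, random intercept per animal. */
    /* Dataset and variable names are placeholders.                         */
    proc glimmix data=pts method=laplace;
      class animal_id;
      model used(event='1') = dist_forest avail_forest dist_forest*avail_forest
            / dist=binary link=logit solution;
      random intercept / subject=animal_id;  /* animal-level random effect */
    run;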
Several sources (e.g., Gelman and Hill 2007) suggest that standardizing predictors by subtracting the mean and dividing by two standard deviations before entering them in the model can make parameter estimates easier to interpret, especially when interactions are present (though in Variables are often adjusted (e.g. standardised) before making a model - when is this a good idea, and when is it a bad one?, @bluepole disagrees).
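If it matters, I was planning to do that mean/two-SD scaling with something like the following (variable names are again placeholders; as I understand PROC STDIZE, METHOD=STD gives (x - mean)/SD and MULT=0.5 then halves the result, giving Gelman and Hill's two-SD scaling):

    /* (x - mean) / (2 * SD): METHOD=STD centers on the mean and scales */
    /* by one SD; MULT=0.5 divides by a further factor of two.          */
    proc stdize data=pts out=pts_std method=std mult=0.5;
      var dist_forest dist_water dist_edge;
    run;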
I read these two posts, Standardizing non-normal data for use in distance-based classifier and Standardizing data, but I didn't find a satisfactory answer to my question: is it appropriate or acceptable to "standardize" non-normal data using the mean, when we all know that non-normal data aren't well represented by that mean? If it isn't appropriate, what are my options? Could I use the median and quantiles in a similar way? Or should I (this is getting messy) transform the data to a normal distribution first and then standardize (yuck)?
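To clarify what I mean by using the median and quantiles "in a similar way": PROC STDIZE can center on the median and scale by the interquartile range (METHOD=IQR), which is the kind of robust alternative I was picturing (placeholder variable names again):

    /* Robust alternative: (x - median) / IQR instead of (x - mean) / SD */
    proc stdize data=pts out=pts_robust method=iqr;
      var dist_forest dist_water dist_edge;
    run;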
Mirkin's book (Clustering for Data Mining: A Data Recovery Approach, 2005) suggests that means and standard deviations may not be appropriate for scaling...