
It seems odd to scale a categorical variable, but I need to get the correct coefficients for each of my variables in linear regression. Is it correct to scale the same way you would with continuous variables, or what is the right thing to do here?

For example, if x is categorical and y is continuous:

model <- lm(DV ~ scale(x) + scale(y), data = myData)

Is the above the right thing to do?

kjetil b halvorsen
Brandy Riedel
  • What do you mean by "the correct coefficients"? Correct with respect to what standard? – Glen_b Sep 24 '14 at 10:12
  • It's not correct to scale a factor variable for regression like that. – Roland Sep 24 '14 at 11:10
  • From my understanding, if you don't scale, then the betas/coefficient values are not meaningful. It seems weird to only scale some variables (e.g. continuous variables) if you have both in an equation though. – Brandy Riedel Sep 24 '14 at 19:47
  • I don't understand your understanding. If the scales of your variables are very different it is recommended to normalize them. Otherwise it's not necessary. However, a factor is dummy-encoded automatically by the `lm` function, i.e., each level (minus the reference level) is encoded as one dummy of 0/1 values. There is no need for scaling since all dummies are on the same scale. Using `scale` for a `factor` variable should throw an error. – Roland Sep 25 '14 at 09:28
  • I was saying in my example that some predictors are categorical, but also some are continuous, so they would not all be on the same scale without some transformation. – Brandy Riedel Sep 25 '14 at 18:12
  • You really need to put some context. Maybe, is this for lasso or other regularization, or bayes? For linear models estimated without regularization no form of normalization is needed. See https://stats.stackexchange.com/questions/69568/whether-to-rescale-indicator-binary-dummy-predictors-for-lasso which might be a duplicate – kjetil b halvorsen May 21 '17 at 10:39
  • Possible duplicate of [whether to rescale indicator / binary / dummy predictors for LASSO](https://stats.stackexchange.com/questions/69568/whether-to-rescale-indicator-binary-dummy-predictors-for-lasso) – kjetil b halvorsen May 21 '17 at 10:42

1 Answer


In a comment you write:

From my understanding, if you don't scale, then the betas/coefficient values are not meaningful.

This is not correct. They have meaning; it's just a different meaning. If you use the original units, then the coefficients are in terms of the original units. Often, this is what you want. If you scale, then the coefficients are in terms of the scaled units, often standard deviations; sometimes this is what you want. Opinions differ on how often to scale, how useful it is, and so on.
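To make the point about units concrete, here is a minimal sketch on simulated data. The variable names `dv` and `cont` are hypothetical and not taken from the question. It suggests that scaling a continuous predictor only re-expresses the slope (per standard deviation instead of per original unit) and leaves the fitted values unchanged.

```r
# Simulated data; all names and numbers here are illustrative assumptions.
set.seed(1)
n    <- 200
cont <- rnorm(n, mean = 50, sd = 10)    # continuous predictor
dv   <- 3 + 0.2 * cont + rnorm(n)       # continuous response

fit_raw    <- lm(dv ~ cont)             # slope: change in dv per 1 unit of cont
fit_scaled <- lm(dv ~ scale(cont))      # slope: change in dv per 1 SD of cont

b_raw    <- unname(coef(fit_raw)["cont"])
b_scaled <- unname(coef(fit_scaled)["scale(cont)"])

# The scaled slope is the raw slope multiplied by sd(cont) ...
stopifnot(isTRUE(all.equal(b_scaled, b_raw * sd(cont))))

# ... and both models give the same fitted values.
stopifnot(isTRUE(all.equal(unname(fitted(fit_raw)),
                           unname(fitted(fit_scaled)))))
```

Both parameterizations describe the same fitted line; which one is easier to interpret depends on whether the original units or standard deviations are more natural for the reader.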

Scaling a categorical variable doesn't really make much sense. It's not even clear what it would be. Categorical variables have to be parameterized, often by dummy coding (although other schemes are possible).
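As an illustration of dummy coding, the sketch below uses simulated data with hypothetical names (`grp`, `cont`, `dv`) and assumes R's default treatment contrasts for unordered factors. It shows the 0/1 columns that `lm` builds for a factor, with `scale()` applied only to the continuous predictor.

```r
# Simulated data; all names and numbers here are illustrative assumptions.
set.seed(2)
grp  <- factor(rep(c("a", "b", "c"), each = 20))   # categorical predictor
cont <- rnorm(60, mean = 100, sd = 15)             # continuous predictor
shift <- unname(c(a = 0, b = 2, c = -1)[as.character(grp)])
dv   <- 1 + 0.05 * cont + shift + rnorm(60)

# Design matrix used by lm: one 0/1 dummy per non-reference level of grp,
# assuming the default contr.treatment coding is in effect.
X <- model.matrix(~ grp + scale(cont))
colnames(X)
head(X)

# Fit with the factor left as-is and only the continuous predictor scaled.
fit <- lm(dv ~ grp + scale(cont))
coef(fit)
```

Under this coding, the `grpb` and `grpc` coefficients are differences from the reference level `a`, so they already have a direct interpretation without any scaling.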

Peter Flom