I have 101 independent variables in a logistic regression predictive model. 55 variables are continuous and 46 are categorical one-hot encoded. 5 of the 55 continuous variables are expressed as percents (values ranging from 0.05 to 0.90).

For the other 50 continuous variables, I am experimenting with normalization (MinMaxScaler) or standardization (StandardScaler), but I am unsure what to do with the 5 percent variables.

Do I need to apply normalization, standardization, or leave the percent variables as they are without scaling them?
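To make the setup concrete, here is a minimal sketch of one option being asked about: scale only the non-percent continuous columns and pass the percent and one-hot columns through unchanged. The data is synthetic and the column counts, ranges, and index lists are assumptions for illustration, not the real 101-variable dataset.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the real data (shapes and ranges are assumptions):
# 3 wide-range continuous columns, 2 percent columns in [0.05, 0.90],
# and 2 one-hot (0/1) columns.
X_cont = rng.normal(loc=100.0, scale=25.0, size=(n, 3))
X_pct = rng.uniform(0.05, 0.90, size=(n, 2))
X_cat = rng.integers(0, 2, size=(n, 2)).astype(float)
X = np.hstack([X_cont, X_pct, X_cat])
y = (X_cont[:, 0] / 25.0 + 4.0 * X_pct[:, 0] + rng.normal(size=n) > 6.0).astype(int)

cont_idx = [0, 1, 2]  # hypothetical positions of the non-percent continuous columns

# Standardize only the listed columns; everything else (percent and one-hot
# columns) is passed through untouched and appended after the scaled columns.
preprocess = ColumnTransformer(
    transformers=[("scale", StandardScaler(), cont_idx)],
    remainder="passthrough",
)
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
model.fit(X, y)

# Inspect what the fitted preprocessing step actually feeds to the model.
Xt = model.named_steps["columntransformer"].transform(X)
print(Xt[:, :3].mean(axis=0))  # scaled columns: means close to 0
print(Xt[:, 3:5].min(), Xt[:, 3:5].max())  # percent columns: still within [0.05, 0.90]
```

Swapping `StandardScaler()` for `MinMaxScaler()` (also in `sklearn.preprocessing`) gives the normalization variant, and adding the percent column indices to `cont_idx` scales those as well.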

Insu Q
  • Could you make it more specific? The answer depends on many different factors. – Tim May 13 '20 at 15:04
  • @Tim can you give me an example where it wouldn’t be needed and where scaling would be needed (and which scaling method to use)? – Insu Q May 13 '20 at 15:08
  • Please make it more specific. – Tim May 13 '20 at 15:10
  • @Tim Updated. Let me know if more detail is needed. Not sure what else I can add since the answer will be “it depends” anyway. I’m interested in learning about the scenarios when scaling is or isn’t appropriate. – Insu Q May 13 '20 at 15:17
  • See https://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia – kjetil b halvorsen May 14 '20 at 12:59
  • @kjetilbhalvorsen thanks for sharing, but that post doesn’t answer my question. – Insu Q May 14 '20 at 13:22
  • Short answer: nothing you have said indicates a need for scaling, which is seldom needed. Unless you can give us more detail indicating a need, the default should be to leave the variables as they are. – kjetil b halvorsen May 14 '20 at 13:34
  • @kjetilbhalvorsen I’ll accept that as the answer if you post it. I tried scaled and unscaled percentages and performance was roughly the same, so I came to the conclusion that scaling wasn’t needed, at least for my problem. – Insu Q May 14 '20 at 13:45
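
The scaled-versus-unscaled comparison mentioned in the last comment can be sketched as follows. This is not the asker's actual experiment: the data is synthetic, and the helper name `cv_auc`, the choice of ROC AUC, and the 5-fold split are all assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Synthetic data (an assumption): 3 continuous columns and 2 percent columns.
X_cont = rng.normal(size=(n, 3))
X_pct = rng.uniform(0.05, 0.90, size=(n, 2))
X = np.hstack([X_cont, X_pct])
y = (X_cont[:, 0] + 3.0 * (X_pct[:, 0] - 0.475) + rng.normal(size=n) > 0).astype(int)


def cv_auc(scale_pct):
    """Mean 5-fold ROC AUC, with or without standardizing the percent columns."""
    cols = [0, 1, 2, 3, 4] if scale_pct else [0, 1, 2]
    pre = ColumnTransformer(
        transformers=[("scale", StandardScaler(), cols)],
        remainder="passthrough",
    )
    model = make_pipeline(pre, LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()


auc_raw = cv_auc(scale_pct=False)
auc_scaled = cv_auc(scale_pct=True)
print(f"percent columns left as-is: {auc_raw:.4f}")
print(f"percent columns scaled:     {auc_scaled:.4f}")
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so the comparison does not leak test-fold statistics. Note that scikit-learn's `LogisticRegression` applies an L2 penalty by default, so any gap between the two scores comes from how that penalty interacts with feature scale.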
