-1

I understand that we need to normalize data for classification problems because otherwise the variable with the larger scale will dominate the result. But why don't we normalize for linear regression?

Could someone please provide an intuitive explanation or an example to explain the difference between classification algorithms (like logistic regression) and linear regression?

Thanks!

bugsyb
  • Your premise is flawed... Where did you hear that you must normalize variables in classification problems? There are many reasons to normalize, but none of them is "it is a classification problem". – Matthew Drury Aug 07 '17 at 03:44
  • Okay I might be mistaken, but we rarely normalize variables in regression problems, is that correct? – bugsyb Aug 07 '17 at 03:47
  • 7
    It doesn't really have anything to do with whether the problem is regression or classification. You normalize in regression in the same circumstances as in classification: when you want the independent variables to occupy the same scale. To give two common examples: this is important for distance-based models (KNN, clustering), and when using regularization (regression or classification). – Matthew Drury Aug 07 '17 at 03:50
  • @MatthewDrury: do you want to post your comment as an answer? – Stephan Kolassa Aug 07 '17 at 06:38
  • 1
    Ordinary least squares regression is unaffected by scaling of the variables (the fitted values are identical and the coefficients simply rescale), so normalizing is usually not done, as it is unnecessary effort. But if scaling affects the results (whether in classification or in other forms of regression), then you need to consider whether normalizing is desirable. – Henry Aug 07 '17 at 07:13
  • @StephanKolassa Sure! – Matthew Drury Aug 07 '17 at 13:36
  • Look at my answer here: https://stats.stackexchange.com/questions/244507/what-algorithms-need-feature-scaling-beside-from-svm/252625#252625 – kjetil b halvorsen Aug 07 '17 at 13:51
  • @Henry Although in theory OLS is unaffected by scaling, in practice the computation of OLS solutions can be heavily affected by scaling. That's why good OLS software automatically standardizes variables (or does something equivalent) internally, whether the user knows it or not. – whuber Aug 07 '17 at 13:52

1 Answer

2

It doesn't really have anything to do with whether the problem is regression or classification. You normalize in regression in the same circumstances as in classification: when you want the independent variables to occupy the same scale. To give two common examples: this is important for distance-based models (KNN, clustering), and when using regularization (regression or classification).
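
To make the regularization point concrete, here is a minimal sketch (my own illustration, not from any particular library). It assumes synthetic data, a hypothetical helper `fit_predict`, and an arbitrary penalty `lam=50.0`. It fits plain least squares and ridge regression on the same data before and after multiplying one feature by 1000, then compares the fitted values.

```python
import numpy as np

def fit_predict(X, y, lam=0.0):
    """Least squares with an unpenalized intercept and an optional ridge
    penalty lam on the slopes; returns the fitted values.
    (Hypothetical helper written for this illustration.)"""
    Xc = X - X.mean(axis=0)          # centering handles the intercept
    yc = y - y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return y.mean() + Xc @ beta

# Synthetic data (assumed for the example).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Change the units of the first feature only (e.g. metres -> millimetres).
X_scaled = X * np.array([1000.0, 1.0])

# lam=0.0 is ordinary least squares: the coefficient absorbs the rescaling.
ols_diff = np.max(np.abs(fit_predict(X, y) - fit_predict(X_scaled, y)))

# With a penalty, the rescaled feature's coefficient becomes tiny and is
# barely shrunk, so the fitted model changes.
ridge_diff = np.max(np.abs(fit_predict(X, y, lam=50.0)
                           - fit_predict(X_scaled, y, lam=50.0)))

print("max change in OLS fitted values:  ", ols_diff)
print("max change in ridge fitted values:", ridge_diff)
```

The unpenalized fit should give the same predictions up to floating-point error, while the penalized fit should not. The same holds for regularized logistic regression, which is why the regression/classification distinction is not the relevant one.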

Matthew Drury