1

When we study normalization, various reasons are given to explain why it is necessary.

The most important of these is:

A column whose values lie in a much higher range than the others can have more impact on the output and make our results biased.

A simple example: a model that uses features such as a person's age and salary. Age may have little effect on the output because its values are very small, while salary may have a much larger impact.

But my question is: shouldn't the model be smart enough to estimate theta according to each feature's range? Age would get a higher theta whereas salary would get a small one, and thus the model would not be biased towards salary.

Another reason given is that in ML, normalization helps the algorithm converge faster. The same argument about theta adapting to the range should apply here too, so shouldn't the algorithm converge at the same speed either way?
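
To make this concrete, here is a small NumPy sketch of the setup I have in mind, checking both points: whether ordinary least squares really does give salary a tiny coefficient on its own, and whether gradient descent still converges at the same speed without scaling. The feature ranges, learning rates, and iteration count are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)               # small range
salary = rng.uniform(20_000, 200_000, n)   # large range
y = 0.5 * age + 1e-4 * salary + rng.normal(0, 1, n)

X_raw = np.column_stack([age, salary])
X_raw = X_raw - X_raw.mean(axis=0)         # center so no intercept term is needed
X_std = X_raw / X_raw.std(axis=0)          # standardized copy of the same features
y_c = y - y.mean()

# Closed-form OLS on the raw features: the salary coefficient comes out
# around 1e-4, i.e. theta does adapt to the feature's range.
print("OLS coefficients (raw):", np.linalg.lstsq(X_raw, y_c, rcond=None)[0])

def gradient_descent(X, y, lr, n_iter=1000):
    """Plain batch gradient descent for least squares."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta -= lr * X.T @ (X @ theta - y) / len(y)
    return theta

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

# The unscaled features force a tiny learning rate (much larger values
# diverge), so after the same number of steps the fit is still far from
# the optimum, while the standardized run has essentially converged.
theta_raw = gradient_descent(X_raw, y_c, lr=3e-10)
theta_std = gradient_descent(X_std, y_c, lr=0.1)
print("MSE after 1000 steps, raw features:         ", mse(X_raw, y_c, theta_raw))
print("MSE after 1000 steps, standardized features:", mse(X_std, y_c, theta_std))
```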

I need some help understanding these concepts: normalizing X versus letting theta take values according to the range of X.

Onki
  • Possible duplicate of [When conducting multiple regression, when should you center your predictor variables & when should you standardize them?](https://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia) – kjetil b halvorsen Jul 07 '19 at 10:05

1 Answer

1

One example of the claim you quote is the use of standardization in PCA (and thus also PCR): variables measured on a larger scale will, merely because of that scale, have a higher variance, causing PCA to attribute greater importance to them. In this case, whether you standardize or not says something about whether variables with greater variance are indeed more 'important', or simply measured on a different scale. A similar argument can be made for normalization in other methods.
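
As a minimal illustration of this point (the variables, units, and numbers below are made up for the example, not taken from any real data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
height_cm = rng.normal(170, 10, n)
# A quantity correlated with height, but recorded in grams,
# so its variance is enormous compared to height's:
weight_g = 700 * height_cm + rng.normal(0, 8_000, n)

X = np.column_stack([height_cm, weight_g])

def first_pc(X):
    """Loadings of the first principal component of the centered data."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argmax(eigvals)]

# Raw data: the first PC loads almost entirely on the grams variable,
# purely because of its scale.
print("first PC, raw:         ", first_pc(X))

# Standardized data: both variables load comparably, reflecting their
# actual correlation rather than their units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("first PC, standardized:", first_pc(X_std))
```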

Another reason is numerical stability: very small numbers can lose precision as they approach the machine precision, and very large numbers can hinder the speed of computation.
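
A tiny sketch of what these limits look like in practice (the specific numbers are only illustrative):

```python
import numpy as np

# float64 resolves roughly 16 significant digits:
print(np.finfo(np.float64).eps)      # ~2.22e-16, the relative machine precision

# A very large accumulator silently swallows small contributions, which is
# what a sum over a huge-scale feature can look like:
print(1e16 + 1.0 == 1e16)            # True: the added 1.0 disappears

# Very small numbers drift into the subnormal range and lose precision:
tiny = 1e-310
print(tiny * 1e-10)                  # ~1e-320, already with degraded precision
print(tiny * 1e-20)                  # 0.0: complete underflow
```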

As to why algorithms can't take this into consideration by themselves: How else would an algorithm do so other than by normalizing or standardizing?

Frans Rodenburg
  • I am still not clear on this concept. Sorry, could you please explain it more, like you would explain it to dummies? – Onki May 08 '19 at 04:41
  • No need to apologize, but could you be more specific which part you do not understand? – Frans Rodenburg May 08 '19 at 04:44
  • Could you give me more intuition on how greater variance may make the model give more importance to these variables, and how high numbers may hinder the speed of computation? – Onki May 08 '19 at 04:57
  • If we have a corresponding weight theta, then the result would be managed irrespective of the high variance. – Onki May 08 '19 at 04:58
  • Concerning weights, that is true in regression type models, but it might not hold for clustering, multidimensional scaling or other methods. Furthermore, even in regression analysis, those weights must be estimated, or updated through a gradient, either of which will be more precise further away from the machine precision. If your variable has extremely large values, then your weight will be extremely small in the situation you describe, which again puts us closer to the machine precision. (...) – Frans Rodenburg May 08 '19 at 05:05
  • 1
    (...) What I meant by speed of computation is that if there are extremely large numbers, then storing them in the memory/multiplying them by other numbers/etc. will also take longer. – Frans Rodenburg May 08 '19 at 05:06
  • So, in a way, adjusting the weights can be a solution, but the problem is that the weights can get extremely small, which leads towards the machine-precision problem; that is what I understood. I am keeping linear regression in mind. But as you say, the real importance will become clear in other models. I will wait until I study those models and then recheck my understanding. Thanks @Frans – Onki May 09 '19 at 06:10
  • No problem, good luck! You can accept this answer if it answers your question. – Frans Rodenburg May 09 '19 at 07:32
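
A small sketch prompted by the comment thread above, showing how the fitted weight shrinks inversely with the feature's scale, until intermediate computations eventually break down at extreme scales (the toy data and scale factors are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 1000)
y = 2.0 * x + rng.normal(0, 0.1, 1000)

for scale in (1.0, 1e6, 1e12, 1e160):
    x_scaled = x * scale
    # Least-squares slope through the origin for a predictor drawn around zero.
    # At moderate scales the weight is simply ~2/scale, confirming that theta
    # adapts to the range; at the largest scale the sum of squares in the
    # denominator overflows to inf, so the computed weight collapses to 0.
    w = np.dot(x_scaled, y) / np.dot(x_scaled, x_scaled)
    print(f"scale {scale:.0e}: weight = {w:.3e}, weight * scale = {w * scale:.4f}")
```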