When we study normalization, various reasons are given to explain why it is necessary.
The most important of them is:
An un-normalized column whose range is much higher than the others' can have more impact on the output and make our results biased.
A simple example: a model that uses features like a person's age and salary. Age can have a low effect on the output because its values are very small, while salary might have a much larger impact.
But my question is: shouldn't the model be smart enough to calculate theta as per each feature's range? Age would get a higher theta whereas salary would get a small one, and thus the model would not be biased towards salary.
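To check this intuition numerically, here is a minimal sketch (my own illustration, with made-up coefficients and feature ranges): when the regression is solved exactly, the fitted theta does compensate for scale, just as the question suggests.

```python
import numpy as np

# Synthetic data: age is small-range, salary is large-range.
rng = np.random.default_rng(0)
age = rng.uniform(20, 60, 100)
salary = rng.uniform(20_000, 100_000, 100)
# True relationship (coefficients chosen arbitrarily for illustration).
y = 0.5 * age + 0.0003 * salary

X = np.column_stack([age, salary])
# Exact least-squares solution, no iterative training involved.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
# theta recovers [0.5, 0.0003]: the small-range feature (age) gets the
# larger coefficient, so the exact solution is not biased towards salary.
```

So the final, exactly-solved model is indeed scale-invariant in this sense; the question is whether the *training process* is.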
Another reason given is that normalization helps the algorithm converge faster. The same argument about thetas with varied ranges seems applicable here too, so shouldn't the algorithm converge at the same speed either way?
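For the convergence part, here is a minimal gradient-descent sketch (again my own illustration, with arbitrary numbers): on raw features, a single learning rate must be small enough for the huge-scale salary column, which leaves the age coefficient crawling; after normalization, one moderate learning rate serves both coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, 200)               # small-range feature
salary = rng.uniform(20_000, 100_000, 200)   # large-range feature
y = 0.5 * age + 0.0003 * salary + rng.normal(0, 0.1, 200)

def gradient_descent(X, y, lr, steps):
    """Plain batch gradient descent for linear regression (no intercept)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (X @ theta - y) / len(y)
    return theta

X_raw = np.column_stack([age, salary])
# Standardize each column to zero mean and unit variance.
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Raw features: lr is capped by salary's scale (larger values diverge),
# so the age coefficient barely moves in 500 steps.
theta_raw = gradient_descent(X_raw, y, lr=2e-10, steps=500)
# Normalized features: a moderate lr converges quickly for both.
theta_norm = gradient_descent(X_norm, y, lr=0.1, steps=500)
```

In other words, the theta values the model *ends up with* can compensate for scale, but the loss surface gradient descent walks on is badly elongated when feature ranges differ, and that is what normalization fixes.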
I need some help understanding these concepts: normalizing X, versus letting the model come up with theta values matched to the range of X.