
In regularization, we add the squares of the thetas, multiplied by $\lambda$, to the cost function (excluding $\theta_0$), i.e. a penalty term $\lambda \sum_{j=1}^{n} \theta_j^2$. When $\lambda$ is large, the $\theta$ values are driven close to zero, which effectively neglects their associated features. My question is about what happens when we apply gradient descent to find the $\theta$ values that best fit the data: wouldn't the penalty also shrink all of the $\theta$ values, including the ones we don't want to reduce? Since every $\theta$ is added at the end of the cost function, this would seem to produce a nearly flat line and cause underfitting.
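To make the setup concrete, here is a minimal sketch of batch gradient descent with an L2 penalty (ridge-style regularization) that excludes $\theta_0$ from the penalty. The data, the learning rate `alpha`, and the `lam` values are made-up illustrations, not from any particular course:

```python
import numpy as np

# Illustrative data: y is roughly linear in x (data and hyperparameters
# below are assumptions chosen for demonstration only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # bias column + one feature
y = 3.0 + 2.0 * X[:, 1] + rng.normal(0, 1, 50)

def cost(theta, X, y, lam):
    """Squared-error cost with an L2 penalty that skips theta_0."""
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual + lam * np.sum(theta[1:] ** 2)) / (2 * m)

def gradient_descent(X, y, lam, alpha=0.01, iters=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m
        grad[1:] += (lam / m) * theta[1:]  # penalty gradient excludes theta_0
        theta -= alpha * grad
    return theta

for lam in [0.0, 1.0, 1000.0]:
    theta = gradient_descent(X, y, lam)
    print(f"lambda={lam:>7}: theta={np.round(theta, 3)}, cost={cost(theta, X, y, lam):.3f}")
```

Running the sketch with increasing `lam` pulls the slope $\theta_1$ toward zero while $\theta_0$ is untouched by the penalty, which is exactly the shrinkage (and, for very large $\lambda$, the underfitting) the question is asking about.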

  • Does this answer your question? [Why does shrinkage work?](https://stats.stackexchange.com/questions/179864/why-does-shrinkage-work) – Arya McCarthy Apr 03 '21 at 15:12

0 Answers