I am a beginner in machine learning. I have a question: why is adding more samples to a data set equivalent to adding a regularization term to the ordinary least squares loss function? (In other words, why can I add more samples to my data set and solve OLS instead of solving ridge regression?)

Alexis
soroush

1 Answer


Edited:

There is an equality: check out @whuber's answer.

Originally, I understood your statement as: "Adding/collecting more data (i.e. real data) is equivalent to ridge regression," and the answer below tries to address that. I'm leaving it for anyone who misread the question the way I did.

The two play a similar role in the regression mechanism. If you have little data, it is easier to overfit. Adding more data directly improves your generalization performance, and regularization, in its purest form, aims at the same goal.
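The equality mentioned above can be illustrated numerically: appending p pseudo-samples given by the rows of √λ·I, each with target 0, to the design matrix makes plain OLS reproduce the ridge solution exactly, since the augmented normal equations become (XᵀX + λI)β = Xᵀy. A minimal sketch (the synthetic data and the value of λ are arbitrary choices for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

lam = 2.0  # ridge penalty

# Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Augment the data: p pseudo-samples sqrt(lam)*I with zero targets
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

# Ordinary least squares on the augmented data
beta_ols_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_ols_aug))  # True
```

The key point is that the OLS objective on the augmented data, ‖y_aug − X_aug β‖², expands to ‖y − Xβ‖² + λ‖β‖², which is exactly the ridge objective.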

gunes
  • Contrary to your initial assertion, my answer in the duplicate thread explicitly shows there *is* a mathematical equivalence. I don't believe we have a difference of opinion but only a difference in how we understand the current question. – whuber Oct 02 '18 at 13:46
  • I believe there is a mathematically provable equivalence, because it is a question from my ML class. – soroush Oct 02 '18 at 13:53
  • Oh, right! I got the question the wrong way. Ridge regression can be written as if we had added data; but I thought of it as: collecting new/real data for your model is equivalent to ridge regression. Should I delete this or leave it? – gunes Oct 02 '18 at 13:57
  • Your answer is not completely wrong; you can edit it. Actually, I am looking for a mathematical proof and the exact data to be added. – soroush Oct 02 '18 at 14:08
  • A good way to prevent such confusion is to state--usually at the outset of your response--how you understand the question. That habit has spared me a lot of grief over the years :-). – whuber Oct 02 '18 at 14:18