Ian Goodfellow, in his book, writes that when we use the kernel trick to get an infinite-dimensional feature vector, we always have enough capacity to fit the training set, but generalization to the test set often remains poor. Why does generalization remain poor?

Aastha Dua
    Hi: there might be special details because it's a kernel, but simply speaking it's due to over-fitting, just as one can over-fit when building any model in machine learning/statistics. – mlofton Aug 11 '19 at 08:22

1 Answer

Goodfellow's observation seems like a specific instance of the bias-variance tradeoff. Fitting the training data is pretty easy with modern machine learning methods; generalizing to unseen data is much harder. Striking that balance is where most of the work happens.
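Here is a minimal sketch of the effect, assuming scikit-learn is available. The RBF kernel corresponds to an infinite-dimensional feature map, and with a narrow enough kernel (the specific `gamma` and `C` values and the random-label data below are illustrative choices, not from the book) the SVM can memorize even pure-noise labels, while test accuracy stays at chance:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = rng.integers(0, 2, size=200)   # random labels: nothing to generalize
X_test = rng.normal(size=(200, 10))
y_test = rng.integers(0, 2, size=200)

# Large gamma makes each kernel bump very narrow; large C penalizes training
# errors heavily. Together they give enough capacity to memorize the data.
model = SVC(kernel="rbf", gamma=100.0, C=1e6)
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # ~1.0: memorized
print("test accuracy:", model.score(X_test, y_test))     # ~0.5: chance level
```

The point is that perfect training accuracy says nothing about the test set: with this `gamma`, the kernel matrix is nearly the identity, so the model effectively stores the training points rather than learning any structure that transfers.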

We have some threads about overfitting that might help.

Sycorax