Ian Goodfellow writes in his book that when we use the kernel trick to obtain an infinite-dimensional feature vector, we always have enough capacity to fit the training set, but generalization to the test set often remains poor. Why does generalization remain poor?
Hi: there may be special details because it's a kernel, but, simply speaking, it's due to over-fitting, just as one can over-fit when building any model in machine learning or statistics. – mlofton Aug 11 '19 at 08:22
1 Answer
Goodfellow's example seems like a specific instance of the bias-variance tradeoff. With an infinite-dimensional feature map (e.g., the RBF kernel), the model has enough capacity to interpolate essentially any labeling of distinct training points, so fitting the training data is easy with modern machine learning methods. But that same flexibility lets the model fit noise, so generalizing to unseen data is much harder unless the fit is regularized. Striking that balance is where most of the work happens.
We have some threads about overfitting which might help.
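To make the overfitting concrete, here is a minimal sketch (my own illustration with scikit-learn's SVC, not anything from Goodfellow's book): on a noisy classification problem, an RBF-kernel SVM with very flexible settings can score near 100% on the training set while doing much worse on held-out data, whereas a more regularized setting gives up some training accuracy but generalizes better.

```python
# Sketch: high-capacity RBF-kernel SVM memorizes noisy training data but
# generalizes poorly; a regularized fit trades training accuracy for test accuracy.
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Noisy, low-signal data: easy to memorize, hard to generalize from.
X, y = make_classification(n_samples=200, n_features=20, n_informative=2,
                           flip_y=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# Large gamma and large C -> very flexible decision boundary (high capacity).
flexible = SVC(kernel="rbf", gamma=10.0, C=1e6).fit(X_train, y_train)
print("High-capacity train accuracy:", flexible.score(X_train, y_train))  # ~1.0
print("High-capacity test accuracy: ", flexible.score(X_test, y_test))    # much lower

# Default gamma and moderate C regularize the fit and usually generalize better here.
regularized = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)
print("Regularized train accuracy:", regularized.score(X_train, y_train))
print("Regularized test accuracy: ", regularized.score(X_test, y_test))
```

The exact numbers depend on the random seed, but the qualitative gap between training and test accuracy for the high-capacity model is the point: perfect training fit is no evidence of good generalization.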

Sycorax