I'm studying machine learning and I feel there is a strong relationship between the concept of VC dimension and the more classical (statistical) concept of degrees of freedom.
Can anyone explain such a connection?
From Yaser Abu-Mostafa, Learning From Data:
Degrees of freedom are an abstraction of the effective number of parameters. The effective number is based on how many dichotomies the hypothesis set can realize, rather than on how many real-valued parameters are used. In the case of the 2-dimensional perceptron, one can think of two parameters, slope and intercept (plus a binary degree of freedom for which region maps to +1), or one can think of three parameters w_0, w_1, w_2 (even though the weights can be scaled up or down simultaneously without changing the resulting hypothesis). The degrees of freedom, however, are 3, not because of one way or another of counting parameters, but because the perceptron has the flexibility to shatter 3 points.
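To make the "counting dichotomies" idea concrete, here is a minimal numerical sketch (my own, not from the book): it checks by random search over weight vectors that a 2D perceptron can realize all 8 labelings of 3 non-collinear points, but not all 16 labelings of 4 points in an XOR-like configuration. The point coordinates and the random-search approach are my own choices for illustration; a realizable labeling could in principle be missed by the search, though that is extremely unlikely at this trial count.

```python
import itertools
import numpy as np

def realizable(points, labels, trials=20000, seed=0):
    """Return True if some weight vector (w_0, w_1, w_2) produces
    `labels` as sign(w_0 + w_1*x + w_2*y) on `points`.
    Found by random search, so this is a numerical check, not a proof."""
    rng = np.random.default_rng(seed)
    # Prepend the constant 1 so w_0 acts as the bias term.
    X = np.hstack([np.ones((len(points), 1)), np.asarray(points, float)])
    y = np.asarray(labels)
    for _ in range(trials):
        w = rng.normal(size=3)
        if np.all(np.sign(X @ w) == y):
            return True
    return False

def shatters(points):
    """A point set is shattered if every +/-1 labeling is realizable."""
    return all(realizable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

three = [(0, 0), (1, 0), (0, 1)]          # a non-collinear triple
four = [(0, 0), (1, 1), (1, 0), (0, 1)]   # XOR labeling is not separable

print(shatters(three))  # True: all 8 dichotomies achievable, so d_VC >= 3
print(shatters(four))   # False: the XOR dichotomy fails
```

The first check succeeds for any 3 points in general position, while no set of 4 points can be shattered by a line in the plane, which is exactly why the VC dimension of the 2D perceptron is 3, matching its effective degrees of freedom.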