I am using cross_val_predict to generate cross-validated estimates with Ridge regression:
from sklearn import linear_model
from sklearn.model_selection import cross_val_predict
reg = linear_model.Ridge(alpha=0.5)
pred_r = cross_val_predict(reg, X, y, cv=None)
Based on this, the correlation between the predicted y and the real y is (0.114601783602, 0.00312638915351).
However, when I use RidgeCV instead:
reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0], cv=10, fit_intercept=True, scoring=None, normalize=False)
reg.fit(X, y)
pred_r = reg.predict(X)
I get a much higher correlation: (0.330446577353, 2.3472470222e-18)
Why do I get such different results? I thought these two analyses should generate the same output. Any ideas? Is the way I use RidgeCV correct and valid? Also, since I have around 600 samples, I believe it would be reasonable not to split the data into training and test sets, and to rely on CV alone. I am double-checking this because the results might be published in a journal.
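For reference, here is a minimal sketch of how the two setups could be compared like for like. It is not my actual pipeline: the data is a synthetic stand-in from make_regression, the variable names are made up for illustration, and I am assuming scipy.stats.pearsonr for the correlation. It contrasts predicting the same rows RidgeCV was fitted on with passing RidgeCV itself to cross_val_predict, so that every row is predicted by a model that did not see it.

```python
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data (hypothetical; not the data from the question).
X, y = make_regression(n_samples=600, n_features=50, noise=100.0, random_state=0)

reg = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=10)

# In-sample: the model is fitted on all rows, then predicts those same rows.
in_sample_pred = reg.fit(X, y).predict(X)

# Out-of-fold: each row is predicted by a model that never saw it.
# alpha is re-selected inside every training fold (nested CV).
out_of_fold_pred = cross_val_predict(reg, X, y, cv=10)

r_in, p_in = pearsonr(y, in_sample_pred)
r_out, p_out = pearsonr(y, out_of_fold_pred)
print(f"in-sample r = {r_in:.3f}, out-of-fold r = {r_out:.3f}")
```

If this is right, the RidgeCV snippet above would be reporting an in-sample correlation, while the cross_val_predict snippet reports an out-of-fold one, so the two numbers would not be measuring the same thing.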