
Here is some simple code to illustrate my point. Using GridSearchCV with cv=2, cv=20, cv=50, etc. makes no difference in the final score (48 correct predictions out of the 50 test samples). Even if I use KFold with different numbers of splits, the accuracy stays the same. Even if I use an SVM instead of KNN, the accuracy is always 49 no matter how many folds I specify. This is just a toy example, but I face the same problem with big, real-world data sets: the cv parameter in GridSearchCV never makes any difference. Am I doing something wrong, or is GridSearchCV 'broken'?

from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier as knn
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.utils import shuffle
from sklearn.model_selection import KFold

iris = datasets.load_iris()

X, y = shuffle(iris.data, iris.target, random_state=0)
X_train, y_train = X[:100], y[:100]
X_test,  y_test  = X[100:], y[100:]  

knn_param_grid = {'n_neighbors': [5, 2]}
# svm_param_grid = {'shrinking': [True, False]}
# cv = KFold(30)

grid_search = GridSearchCV(estimator=knn(), param_grid=knn_param_grid, cv=20)
grid_search.fit(X_train, y_train)

predictions = grid_search.predict(X_test)
print(sum(predictions == y_test))  # number of correct predictions on the 50 test samples
# prints 48 for every value of cv
Mariusz
    The problem is probably that you're using accuracy instead of a proper scoring rule. See the discussion here: http://stats.stackexchange.com/questions/256551/why-does-the-accuracy-not-change-when-applying-different-alpha-values-in-l2-reg/256554#256554 – Sycorax Feb 25 '17 at 17:09
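Following the commenter's suggestion, one option is to let GridSearchCV rank the candidates with a proper scoring rule such as log loss instead of accuracy. The sketch below reuses the question's iris split and KNN grid and passes scoring='neg_log_loss' (a built-in scikit-learn scorer name); cv=10 is an arbitrary choice, and whether this changes the selected n_neighbors on this toy data is not guaranteed.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle

# Same data split as in the question
iris = datasets.load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

# Rank candidates by (negated) log loss on the validation folds instead of accuracy
grid_search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [5, 2]},
    scoring='neg_log_loss',
    cv=10,
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)  # mean negated log loss across folds (higher is better)
```

Log loss uses the predicted class probabilities, so it can separate candidates whose accuracies on the validation folds are tied.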

1 Answer

You should check whether the selected hyperparameters (grid_search.best_params_) are the same for every cv value. If they are, the fitted GridSearchCV object will always make the same predictions on the test set, and therefore evaluating it will always yield the same score.
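A minimal way to run that check, assuming the question's setup: fit the grid search for several cv values and print the selected n_neighbors next to the number of correct test predictions. cv=50 is left out here because, with about 33 samples per class in the 100 training rows, recent scikit-learn versions reject more stratified folds than there are members in every class.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle

# Same data split as in the question
iris = datasets.load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

results = []
for cv in [2, 5, 10, 20]:
    grid_search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [5, 2]}, cv=cv)
    grid_search.fit(X_train, y_train)
    best_k = grid_search.best_params_['n_neighbors']
    correct = int((grid_search.predict(X_test) == y_test).sum())
    results.append((cv, best_k, correct))
    # Same selected n_neighbors -> same refitted model -> same test score
    print(f"cv={cv:2d}  best n_neighbors={best_k}  correct={correct}/{len(y_test)}")
```

If every row shows the same n_neighbors, the identical test score is expected rather than a sign that GridSearchCV is broken.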

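Note that regardless of the number of folds used in the CV, the final model is always trained on the entire training set. The fit method of the GridSearchCV class first selects the hyperparameters with CV and then (with the default refit=True) refits the model with the selected parameters on the entire training set. The number of folds therefore only affects which hyperparameters get selected, not the data the final model is trained on.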
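The refit behaviour can be sketched directly, again assuming the question's data split: the best estimator reports having been fitted on all 100 training samples, and refitting a plain KNeighborsClassifier with the selected parameters on the full training set reproduces the grid search's test predictions.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import shuffle

# Same data split as in the question
iris = datasets.load_iris()
X, y = shuffle(iris.data, iris.target, random_state=0)
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

# refit=True is the default: after CV, the best candidate is refitted on all of X_train
grid_search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [5, 2]}, cv=20)
grid_search.fit(X_train, y_train)

# The refitted estimator saw the whole training set, not a single CV fold
print(grid_search.best_estimator_.n_samples_fit_)  # 100

# Refitting manually with the selected parameters gives the same predictions
manual = KNeighborsClassifier(**grid_search.best_params_).fit(X_train, y_train)
print((manual.predict(X_test) == grid_search.predict(X_test)).all())  # True
```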

Daniel López