The result I'm talking about is the mean cross-validation score from scikit-learn. I shuffled my training data and then applied the CV function (shuffle, then CV). I did this several times and each result was different, some higher than others. Is the order of the rows in the training data relevant? If so, how?
Addition: Sorry, I didn't define my problem well. I'm doing supervised binary classification. My train variable contains both the features and the class labels. Here is part of my code:
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.linear_model import LogisticRegression

np.random.shuffle(train)          # reorder the rows of the training set
clf = LogisticRegression()
X, y = train[:, 1:], train[:, 0]  # column 0 is the label, the rest are features
# no need to call clf.fit first; cross_val_score fits on each fold
mean = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5)).mean()
print(mean)

# sample for train (label first, then features):
# train = np.array([[1, 0.5, 0.3, 0.6],
#                   [0, 0.3, 0.2, 0.1],
#                   [0, 0.1, 0.9, 0.7]])
Here I use 5 folds. If I remove np.random.shuffle(train), my mean is approximately 66% and it stays the same even after running the program a couple of times. However, if I include the shuffle, my mean changes (sometimes it increases and sometimes it decreases). My question is: why does shuffling my training data change my mean? If the order of the rows in train is irrelevant, why does the mean change after shuffling? This is just part of my code, so assume that the train variable is defined somewhere. My train variable is similar to the sample, but with many more features.
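To show what I suspect is happening, here is a minimal sketch in pure NumPy (not the actual scikit-learn internals; `contiguous_folds` is a hypothetical helper): a K-fold splitter that does not shuffle internally assigns contiguous blocks of rows to folds, so reordering the rows changes which samples each fold trains and tests on, and hence the per-fold scores.

```python
import numpy as np

def contiguous_folds(n_samples, n_folds):
    """Return a fold index for each row, mimicking an unshuffled K-fold split:
    fold 0 gets the first block of rows, fold 1 the next block, and so on."""
    fold_sizes = np.full(n_folds, n_samples // n_folds)
    fold_sizes[: n_samples % n_folds] += 1   # spread the remainder
    return np.repeat(np.arange(n_folds), fold_sizes)

rng = np.random.RandomState(0)
ids = np.arange(20)                  # pretend these are the row ids of train
folds = contiguous_folds(len(ids), 5)

shuffled = ids.copy()
rng.shuffle(shuffled)                # same rows, new order

# Fold k now contains the rows that happen to sit in block k after the
# shuffle, so each fold trains/tests on a different subset than before.
print("fold of each original row, before:", dict(zip(ids, folds)))
print("fold of each original row, after: ", dict(zip(shuffled, folds)))
```

So even though the data itself is unchanged, the composition of each train/test split changes with the row order, which would explain why the mean moves after every shuffle.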