I would like to do the following:
- Train a classifier on a certain dataset
- Test the classifier on a certain test set
- Compute the test error and standard deviation
- Compute a 95% confidence interval for the true error
I have a training set X_train with 1000 training examples, each having 15 features. The true labels are in Y_train. I also have a test set X_test, with the true labels in Y_test.
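For concreteness, the shapes involved look like the following. These arrays are just random placeholders standing in for my actual data (I am assuming the test set also has 1000 examples, which is what my loop below iterates over, and that the labels are binary):

import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 15))    # 1000 training examples, 15 features
Y_train = rng.integers(0, 2, size=1000)  # true training labels (binary labels assumed)
X_test = rng.normal(size=(1000, 15))     # placeholder test set, assumed to also have 1000 examples
Y_test = rng.integers(0, 2, size=1000)   # true test labels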
So far, I have come up with the following code:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

scores = np.zeros(1000)
clf = RandomForestClassifier(criterion='entropy')
clf.fit(X_train, Y_train)
for i in range(1000):
    # score one test example at a time (reshape the single example to a 2D array)
    score = clf.score(X_test[i].reshape(1, -1), [Y_test[i]])
    scores[i] = score
The above code fits the model on the training set and then calls the clf.score method on every test example separately. Consequently, scores is a binary array: each call scores a single example, which is either classified correctly (1) or incorrectly (0). Next, I compute the test error and standard deviation like this:
ctr = 0
for i in scores:
    # count the misclassified test examples (a score of 0 means incorrect)
    if i == 0:
        ctr += 1
test_error = ctr / 1000.0   # fraction of test examples that were misclassified
std = scores.std()          # standard deviation of the 0/1 scores
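For what it's worth, I believe the loop above is equivalent to this vectorized version, since scores only contains 0s and 1s (same variable names as above):

test_error = 1.0 - scores.mean()  # the mean of the 0/1 scores is the accuracy
std = scores.std()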
I assume the test error is approximately normally distributed, since it is an average over 1000 test examples. Then, I compute the 95% confidence interval for the true error like this:
import math

med = test_error
low = test_error - 1.645 * math.sqrt(std)
high = test_error + 1.645 * math.sqrt(std)
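In case it clarifies what I am unsure about: my understanding is that a textbook normal-approximation interval would use the standard error of the mean and the two-sided 95% z-value of 1.96, i.e. something like the sketch below, but I am not sure whether that is what I should be doing instead (this reuses scores, test_error and std from above):

import numpy as np

n = len(scores)          # number of test examples (1000)
se = std / np.sqrt(n)    # standard error of the mean of the 0/1 scores
low = test_error - 1.96 * se
high = test_error + 1.96 * se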
My question is: is this a correct way of computing the test error and the 95% confidence interval?