Consider a binary classification problem with a small dataset: 15 instances in class 0, 15 instances in class 1, and four features, so the data matrix has size 30 × 4.
I used a simple logistic regression with 10-fold stratified cross-validation to learn a classifier, and the resulting accuracy is about 70% (F1 ≈ 0.72).
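For concreteness, here is a minimal sketch of that setup using scikit-learn. The feature matrix is synthetic random data standing in for the real (unavailable) 30 × 4 matrix, so the resulting score is illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data: 30 samples, 4 features, 15 per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.repeat([0, 1], 15)

# 10-fold stratified CV, as described in the question.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")
print(f"mean CV accuracy = {scores.mean():.2f}")
```

Note that with 30 samples and 10 folds, each test fold holds only 3 instances, so individual fold scores are extremely coarse (0, 1/3, 2/3, or 1).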
I was told that my classification results do not make any sense, because the sample size (N = 30) is too small to find any statistically significant difference between the two groups. The argument was based on a simple computation of the binomial standard error, sqrt(p(1-p)/n): with p = 0.5 and n = 30 this gives 1/(2 sqrt(30)) ≈ 9%, which at the 5% significance level yields a confidence interval of roughly ±18%, i.e. about 36–40% wide.
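The numbers in that argument can be checked directly. This uses only the normal approximation to the binomial, with p = 0.5 (chance level) and n = 30, exactly as in the quoted computation:

```python
import math

# Binomial standard error of an accuracy estimate at chance level.
p, n = 0.5, 30
se = math.sqrt(p * (1 - p) / n)   # = 1 / (2 * sqrt(30)) ~ 0.091

# 95% confidence half-width via the normal approximation (z ~ 1.96).
half_width = 1.96 * se            # ~ 0.179, so the full interval is ~36% wide
print(f"SE = {se:.3f}, 95% CI = +/-{half_width:.3f} (width {2 * half_width:.3f})")
```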
QUESTION
I am quite confused, because I do not see how to reconcile the classifier, which is trained on the full feature set, with an assessment of statistical significance via confidence bounds based on the binomial distribution.
UPDATE
I understand that a small sample size may affect the generalisation error, but I can still assess the error bound of the classification results by performing a nested cross-validation and computing the mean error and the standard error arising from different CV splits.
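One way to quantify the split-to-split spread is repeated stratified cross-validation (a simpler sketch than full nested CV, which would add an inner loop for model selection; the data below is again synthetic). Note the caveat: scores from overlapping folds are correlated, so this standard error tends to understate the true variability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the real 30 x 4 matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = np.repeat([0, 1], 15)

# Repeat 10-fold stratified CV 20 times with different shufflings.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=20, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")

mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))  # naive standard error of the mean
print(f"accuracy = {mean:.2f} +/- {sem:.3f} over {len(scores)} fold scores")
```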
UPDATE 2
I found this post, which is closely related and contains quite interesting discussions and answers. It might be of interest to readers.