I have a cancer classification problem (type A vs type B) on radiological images, from which I have generated 756 texture-based predictive features (wavelet transform followed by texture analysis, i.e., features described by Haralick, Amadasun, etc.) and 8 semantic features based on subjective assessment by an expert radiologist. This is entirely for research and publication, to show that these predictive features may be useful in this particular problem. I do not intend to deploy the model for practitioners.
I have 107 cases. 60% of cases are type A and 40% type B (in keeping with their natural proportions in the population). I have done several iterations of model development with varying results. One particular method is giving me 80%/80% (training/test) classification accuracy, but I am suspicious that my method will not stand up to critical analysis. I am going to outline my method and a few alternatives, and I would be grateful if someone could point out whether it is flawed. I have used R for this:
Step 1: Split into 71 training and 36 test cases.
Step 2: remove correlated features from the training dataset (766 -> 240) using the findCorrelation function in R (caret package)
Step 3: rank the training-data features by Gini index (CORElearn package)
Step 4: Train multivariate logistic regression models on the top 10 ranked features, using subsets of sizes 3, 4, 5, and 6 in all possible combinations (10C3 = 120, 10C4 = 210, 10C5 = 252, 10C6 = 210). So in total 792 multivariate logistic regression models were trained using 10-fold cross-validation and tested on the test dataset.
Step 5: Of these, I selected the model that gave the best combination of training- and test-set accuracy, i.e., a 3-feature model with 80%/80% accuracy.
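To make the steps concrete, here is a rough, hypothetical equivalent of the pipeline in Python/scikit-learn on synthetic data (my actual code is in R; here a simple greedy correlation filter stands in for caret's findCorrelation, and mutual information stands in for the CORElearn Gini ranking):

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the 107-case radiomics data (60/40 class balance).
X, y = make_classification(n_samples=107, n_features=100, n_informative=10,
                           weights=[0.6, 0.4], random_state=0)

# Step 1: hold out 36 test cases (stratified to keep the class balance).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=36, stratify=y,
                                          random_state=0)

# Step 2: drop highly correlated features, using the TRAINING data only
# (rough analogue of caret::findCorrelation with a 0.9 cutoff).
corr = np.abs(np.corrcoef(X_tr, rowvar=False))
keep = []
for j in range(corr.shape[0]):
    if all(corr[j, k] < 0.9 for k in keep):
        keep.append(j)
X_tr, X_te = X_tr[:, keep], X_te[:, keep]

# Step 3: rank the surviving features (mutual information here, as a
# stand-in for the Gini-index ranking).
rank = np.argsort(mutual_info_classif(X_tr, y_tr, random_state=0))[::-1]
top10 = rank[:10]

# Step 4: fit a logistic regression on every subset of sizes 3-6 of the
# top-10 features, scoring each by 10-fold CV on the training set.
results = []
for size in (3, 4, 5, 6):
    for subset in combinations(top10, size):
        cols = list(subset)
        cv_acc = cross_val_score(LogisticRegression(max_iter=1000),
                                 X_tr[:, cols], y_tr, cv=10).mean()
        results.append((cv_acc, cols))

# Step 5: pick the subset with the best CV accuracy, then touch the test
# set ONCE at the very end; consulting it repeatedly during selection
# effectively turns it into a second training set.
best_cv, best_cols = max(results)
model = LogisticRegression(max_iter=1000).fit(X_tr[:, best_cols], y_tr)
test_acc = model.score(X_te[:, best_cols], y_te)
print(f"{len(results)} models; best CV acc {best_cv:.2f}, "
      f"test acc {test_acc:.2f}")
```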
Somehow fitting hundreds of models and picking the winner seems quite dodgy to me, and likely to have introduced some false discovery. I just want to confirm whether this is a valid ML technique, or whether I should skip step 4 and only train on the top 5 ranked features without searching over any feature combinations.
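This suspicion can be illustrated with a quick, hypothetical experiment: on pure noise, selecting the best of over a thousand models by test-set accuracy still yields impressive-looking numbers (Python sketch; all data is random, so any "accuracy" above chance is pure selection bias):

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Pure noise: 107 cases, 20 features, labels with a 60/40 class balance
# and NO real relationship to the features.
X = rng.standard_normal((107, 20))
y = rng.permutation(np.repeat([0, 1], [64, 43]))
X_tr, X_te, y_tr, y_te = X[:71], X[71:], y[:71], y[71:]

# Select the best 3-feature model by TEST-set accuracy (the flawed step):
# C(20, 3) = 1140 candidate models all peek at the same 36 test cases.
best = 0.0
for cols in combinations(range(20), 3):
    m = LogisticRegression().fit(X_tr[:, list(cols)], y_tr)
    best = max(best, m.score(X_te[:, list(cols)], y_te))
print(f"best test accuracy on pure noise: {best:.2f}")
```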
Thanks.
PS: I experimented a bit with naive Bayes and random forests but got rubbish test-set accuracy, so I dropped them.
====================
UPDATE
Following discussion with SO members, I have changed the model drastically, and have therefore moved my more recent questions regarding model optimisation into a new post: Is my LASSO regularised classification method correct?
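For reference, the LASSO approach amounts to L1-penalised logistic regression with the penalty strength chosen by cross-validation on the training data only. A hypothetical Python/scikit-learn sketch on synthetic data (LogisticRegressionCV plays roughly the role that cv.glmnet does in R):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 107 cases, 200 features, a handful informative.
X, y = make_classification(n_samples=107, n_features=200, n_informative=8,
                           weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=36, stratify=y,
                                          random_state=0)

# L1-penalised logistic regression; the penalty grid is searched by
# 10-fold CV on the training set, and the zeroed coefficients do the
# feature selection inside the model rather than as a separate step.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=20, cv=10, penalty="l1",
                         solver="liblinear", max_iter=5000),
)
lasso.fit(X_tr, y_tr)
clf = lasso.named_steps["logisticregressioncv"]
n_selected = int(np.sum(clf.coef_ != 0))
print(f"features kept: {n_selected}, "
      f"test accuracy: {lasso.score(X_te, y_te):.2f}")
```

The appeal over the subset search above is that the test set is consulted exactly once, after all selection decisions have been made inside the cross-validated fit.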