I'm struggling a bit with how to compare two classification methods: a classification tree and stepwise logistic regression. Although I know stepwise logistic regression is often a bad idea, I still want to run it and analyse the differences. I have a few approaches in mind. My data set contains about 2500 observations and 40 feature variables.
One option is to split the data randomly into a training set and a test set (for example, 80%/20%), fit a classification tree and a stepwise logistic regression (using different information criteria) on the training set, and then evaluate both on the test set.
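A minimal sketch of the random 80%/20% split in plain Python, assuming rows are addressed by index 0..n-1. `holdout_split` is a hypothetical helper and the seed is arbitrary; both models would be fit on `train_idx` only and each scored once on `test_idx`.

```python
import random

def holdout_split(n, test_frac=0.2, seed=42):
    """Randomly partition row indices 0..n-1 into train and test lists."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)      # e.g. 500 of 2500 rows for a 20% test set
    test_idx = sorted(idx[:n_test])
    train_idx = sorted(idx[n_test:])
    return train_idx, test_idx

train_idx, test_idx = holdout_split(2500, test_frac=0.2)
print(len(train_idx), len(test_idx))  # 2000 500
```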
However, since the size of the tree and the number of feature variables selected by the stepwise regression vary from one fit to another, I thought it would be a good idea to use cross-validation. This is where it gets tricky for me. Say I run 5-fold CV on the 80% training data. Within the cross-validation I can evaluate my models and obtain, for example, the averaged accuracy and other performance measures for each model (classification tree and logistic regression). But how do I use those results, given that I still want to evaluate the final model on the test set?
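One common way to combine the two is to treat the 5-fold CV on the training data purely as a selection step (for example, choosing a tree size or an information criterion), refit the chosen configuration on the whole training set, and only then score it once on the test set. The sketch below assumes that workflow. `select_then_test`, `fit_majority` and `fit_threshold` are hypothetical names, and the two toy models only stand in for the tree and the stepwise logistic regression so that the code runs with the standard library alone.

```python
import random

def kfold_indices(rows, k=5, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]

def accuracy(model, X, y, rows):
    """Fraction of the given rows that the fitted model classifies correctly."""
    return sum(model(X[i]) == y[i] for i in rows) / len(rows)

def select_then_test(candidates, X, y, train_idx, test_idx, k=5):
    """CV on the training rows only to choose a candidate, refit it on all
    training rows, then score it exactly once on the untouched test rows."""
    folds = kfold_indices(train_idx, k)
    cv_acc = {}
    for name, fit in candidates.items():
        scores = []
        for j, val_rows in enumerate(folds):
            fit_rows = [i for f, fold in enumerate(folds) if f != j for i in fold]
            # 'fit' should contain the WHOLE model-building procedure
            # (e.g. the stepwise search), so selection is redone in each fold.
            model = fit(X, y, fit_rows)
            scores.append(accuracy(model, X, y, val_rows))  # left-out fold
        cv_acc[name] = sum(scores) / k
    best = max(cv_acc, key=cv_acc.get)
    final_model = candidates[best](X, y, train_idx)  # refit on full training set
    return best, cv_acc, accuracy(final_model, X, y, test_idx)

# Hypothetical toy stand-ins (NOT a tree or a stepwise logistic regression):
def fit_majority(X, y, rows):
    """Always predict the most frequent class among the fitting rows."""
    label = int(sum(y[i] for i in rows) * 2 >= len(rows))
    return lambda x: label

def fit_threshold(X, y, rows):
    """Pick the cut-point on a 0.05 grid that best separates the fitting rows."""
    grid = [t / 20 for t in range(21)]
    best_t = max(grid, key=lambda t: sum((X[i] > t) == y[i] for i in rows))
    return lambda x: int(x > best_t)

# Toy data: one feature, label is 1 when the feature exceeds 0.5.
rng = random.Random(1)
X = [rng.random() for _ in range(500)]
y = [int(x > 0.5) for x in X]
train_idx, test_idx = list(range(400)), list(range(400, 500))

best, cv_acc, test_acc = select_then_test(
    {"majority": fit_majority, "threshold": fit_threshold},
    X, y, train_idx, test_idx)
print(best, cv_acc, test_acc)
```

With real data, `candidates` might hold one entry per tree size and one per information criterion (e.g. AIC and BIC) for the stepwise model. The CV averages then only decide among them, and the single test-set score is what gets reported as the final performance estimate.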
Another option is to run cross-validation on all of my data and take the averaged performance measures as the final results to interpret.
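For the all-data variant, here is a sketch of repeated k-fold CV that reports the mean and spread of the per-fold accuracies. It assumes a 1-D toy feature with roughly 10% label noise and a hypothetical `fit_threshold` stand-in model. In practice `fit` would wrap the full tree or stepwise procedure so that variable selection is redone inside every fold. Repeating the CV with different fold assignments (`repeats=3` is an arbitrary choice) gives some sense of how much the estimate depends on the particular split.

```python
import random
from statistics import mean, stdev

def repeated_cv_accuracy(fit, X, y, k=5, repeats=3):
    """Repeated k-fold CV over ALL rows; returns mean and SD of fold accuracies."""
    scores = []
    for r in range(repeats):
        rows = list(range(len(X)))
        random.Random(r).shuffle(rows)            # new fold assignment per repeat
        folds = [rows[i::k] for i in range(k)]
        for j, val_rows in enumerate(folds):
            fit_rows = [i for f, fold in enumerate(folds) if f != j for i in fold]
            model = fit(X, y, fit_rows)           # whole model-building step in the fold
            acc = sum(model(X[i]) == y[i] for i in val_rows) / len(val_rows)
            scores.append(acc)
    return mean(scores), stdev(scores)

# Hypothetical stand-in model: a single cut-point on a 1-D feature.
def fit_threshold(X, y, rows):
    grid = [t / 20 for t in range(21)]
    best_t = max(grid, key=lambda t: sum((X[i] > t) == y[i] for i in rows))
    return lambda x: int(x > best_t)

# Toy data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(1)
X = [rng.random() for _ in range(500)]
y = [int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5) for x in X]

avg, sd = repeated_cv_accuracy(fit_threshold, X, y)
print(f"CV accuracy {avg:.3f} (SD {sd:.3f} over {5 * 3} folds)")
```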
Are these legitimate approaches, or are at least some of them? What would you recommend? Thank you in advance for your help!