
I am trying to use repeated cross-validation to test my classifier. I also want to use imputation, because of missing values, and downsampling, because the data are imbalanced (88% of my data are in the positive class and 12% in the negative class). My approach is the following:

r <- 10  (repetitions)
k <- 10  (folds of CV)

for 1:r
    assign folds / split data into train and test sets
    for 1:k
        impute the train set and the test set separately
        build the classifier on the downsampled train set
        predict classes and compare to the test set
    end
    average to get cross-validated performance measures
end
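The loop above can be sketched in plain Python (standard library only). The mean imputation, random downsampling, and nearest-centroid classifier below are placeholder choices assumed for illustration, not the asker's actual methods; the point is the order of the steps inside the folds:

```python
import random
from statistics import mean

def impute_mean(rows):
    """Replace None entries in each column with that column's mean.
    The mean is computed on these rows only, mirroring the
    'impute the train set and the test set separately' step."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        vals = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(vals) if vals else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def downsample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are equal size."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    maj, mino = (pos, neg) if len(pos) >= len(neg) else (neg, pos)
    keep = rng.sample(maj, len(mino)) + mino
    return [X[i] for i in keep], [y[i] for i in keep]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Toy stand-in for the real classifier: predict the nearest class centroid."""
    centroids = {}
    for lab in set(y_train):
        rows = [X_train[i] for i, l in enumerate(y_train) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return [min(centroids,
                key=lambda lab: sum((a - b) ** 2
                                    for a, b in zip(row, centroids[lab])))
            for row in X_test]

def repeated_cv(X, y, r=10, k=10, seed=0):
    rng = random.Random(seed)
    rep_scores = []
    for rep in range(r):                      # r repetitions
        idx = list(range(len(X)))
        rng.shuffle(idx)                      # assign folds for this repetition
        folds = [idx[f::k] for f in range(k)]
        fold_scores = []
        for f in range(k):                    # k CV folds
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            # impute the train set and the test set separately
            X_train = impute_mean([X[i] for i in train_idx])
            X_test = impute_mean([X[i] for i in test_idx])
            y_train = [y[i] for i in train_idx]
            y_test = [y[i] for i in test_idx]
            # build the classifier on the downsampled train set only
            X_ds, y_ds = downsample(X_train, y_train, seed=rep * k + f)
            preds = nearest_centroid_predict(X_ds, y_ds, X_test)
            # compare predictions to the test set (accuracy as the measure)
            fold_scores.append(mean(1.0 if p == t else 0.0
                                    for p, t in zip(preds, y_test)))
        # average over the k folds of this repetition
        rep_scores.append(mean(fold_scores))
    # average over repetitions -> cross-validated performance estimate
    return mean(rep_scores)
```

Note that the downsampling is applied to the training folds only; the test fold keeps its original class proportions so the performance estimate reflects the real class distribution.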

Is this correct?

gung - Reinstate Monica
Patrick Balada
  • @gung No, that wasn't my intention. Sorry, my native language is not English, and in my mother tongue that word is used in colloquial speech. Thank you for the heads up! I changed it to "valid". My question is simply whether this is a good approach or whether something is completely wrong, such as the order of the steps. – Patrick Balada Dec 18 '15 at 16:42
  • Exactly: 88% positive class (= 1) and 12% negative class (= 0). Thanks for the question. Sorry again for the unclear wording. – Patrick Balada Dec 18 '15 at 16:49

0 Answers