
I am trying to use repeated cross-validation to test my classifier. I also want to use imputation, because of missing values, and downsampling, because the data are imbalanced (88% of my data are in the positive class and 12% in the negative class). My approach is the following:

r <- 10  (repetitions)
k <- 10  (folds of CV)

for 1:r
    assign folds / split data into train and test sets
    for 1:k
        impute the train set and the test set separately
        build the classifier on the downsampled train set
        predict classes and compare to the test set
    end
    average to get cross-validated performance measures
end
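The loop above can be sketched in plain Python (standard library only). The mean imputation, random downsampling, and nearest-centroid classifier below are placeholder choices assumed for illustration, not the asker's actual methods; the point is the order of the steps inside the folds:

```python
import random
from statistics import mean

def impute_mean(rows):
    """Replace None entries in each column with that column's mean.
    The mean is computed on these rows only, mirroring the
    'impute the train set and the test set separately' step."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        vals = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(vals) if vals else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def downsample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are equal size."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    maj, mino = (pos, neg) if len(pos) >= len(neg) else (neg, pos)
    keep = rng.sample(maj, len(mino)) + mino
    return [X[i] for i in keep], [y[i] for i in keep]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Toy stand-in for the real classifier: predict the nearest class centroid."""
    centroids = {}
    for lab in set(y_train):
        rows = [X_train[i] for i, l in enumerate(y_train) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return [min(centroids,
                key=lambda lab: sum((a - b) ** 2
                                    for a, b in zip(row, centroids[lab])))
            for row in X_test]

def repeated_cv(X, y, r=10, k=10, seed=0):
    rng = random.Random(seed)
    rep_scores = []
    for rep in range(r):                      # r repetitions
        idx = list(range(len(X)))
        rng.shuffle(idx)                      # assign folds for this repetition
        folds = [idx[f::k] for f in range(k)]
        fold_scores = []
        for f in range(k):                    # k CV folds
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            # impute the train set and the test set separately
            X_train = impute_mean([X[i] for i in train_idx])
            X_test = impute_mean([X[i] for i in test_idx])
            y_train = [y[i] for i in train_idx]
            y_test = [y[i] for i in test_idx]
            # build the classifier on the downsampled train set only
            X_ds, y_ds = downsample(X_train, y_train, seed=rep * k + f)
            preds = nearest_centroid_predict(X_ds, y_ds, X_test)
            # compare predictions to the test set (accuracy as the measure)
            fold_scores.append(mean(1.0 if p == t else 0.0
                                    for p, t in zip(preds, y_test)))
        # average over the k folds of this repetition
        rep_scores.append(mean(fold_scores))
    # average over repetitions -> cross-validated performance estimate
    return mean(rep_scores)
```

Note that the downsampling is applied to the training folds only; the test fold keeps its original class proportions so the performance estimate reflects the real class distribution.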

Is this correct?

gung - Reinstate Monica
Patrick Balada
  • @gung No, that wasn't my intention. Sorry, my native language is not English, and in my mother tongue that word is used in colloquial speech. Thank you for the heads up! I changed it to "valid". My question is simply whether this is a good approach or whether something is completely wrong, such as the order of the steps. – Patrick Balada Dec 18 '15 at 16:42
  • Exactly: 88% positive class (= 1) and 12% negative class (= 0). Thanks for the question. Sorry again for the unclear wording. – Patrick Balada Dec 18 '15 at 16:49

0 Answers