I wish to perform a binary classification task on a dataset I have gathered, with mildly unbalanced classes (~55% in the majority class).
The relationship between the features and the classes is not yet established in the literature, so even a model that is merely better than chance would be of some value.
Since the data is unbalanced, I use the Area Under the ROC curve (AUROC) as a measure of fit. I fit the model with a random forest, so the AUROC I obtain depends on the particular train/test partition. Therefore, I believe some form of repeated resampling (averaging the AUROC over many train/test splits) should be applied.
I would like to test the null hypothesis H0: "the AUROC equals 0.5", and, if it is rejected, conclude that my model is better than random guessing.
So, my question is: if I do 1000 iterations of train/test partitioning, train a model on the training data, and compute its AUROC on the test data, what significance test should I use (on the 1000 resulting AUROC scores) to show that the AUROC is significantly greater than 0.5?
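To make the setup concrete, here is a minimal sketch of the procedure I described, using synthetic data as a stand-in for my dataset (the class balance, split size, and number of iterations are illustrative assumptions, not my actual settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my data: ~55/45 class balance (hypothetical).
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.55, 0.45], random_state=0)

n_iter = 50  # 1000 in my actual experiment; reduced here for speed
aucs = []
for i in range(n_iter):
    # A fresh stratified train/test partition on each iteration.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=i)
    rf = RandomForestClassifier(n_estimators=100, random_state=i)
    rf.fit(X_tr, y_tr)
    # AUROC is computed from predicted probabilities, not hard labels.
    p = rf.predict_proba(X_te)[:, 1]
    aucs.append(roc_auc_score(y_te, p))

aucs = np.array(aucs)
print(aucs.mean())
```

Note that the resulting AUROC scores are not independent: the training and test sets of different iterations overlap heavily, which is exactly why I am unsure which test is valid on them.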
Currently I use bootstrapping to estimate the p-value, although I am fairly sure this is somewhat wrong (since the samples are not i.i.d.). I saw this post and also this one suggesting significance tests for the AUROC, but I am still not sure how these tests work with repeated resampling.
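For reference, this is roughly what my current bootstrap procedure looks like (the AUROC scores are simulated here for illustration; in my case they come from the 1000 splits above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical AUROC scores from repeated train/test splits
# (simulated here; 1000 real values in my experiment).
aucs = rng.normal(loc=0.6, scale=0.05, size=1000)

# Bootstrap the mean AUROC by resampling the scores with replacement.
n_boot = 10_000
means = np.array([rng.choice(aucs, size=aucs.size, replace=True).mean()
                  for _ in range(n_boot)])

# One-sided "p-value": fraction of bootstrap means at or below 0.5.
# This treats the 1000 scores as i.i.d., which they are not.
p_value = np.mean(means <= 0.5)
print(p_value)
```

My concern is that because the scores share training and test observations, this bootstrap understates the true variability, making the p-value anti-conservative.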
Any ideas?