I was having a debate with co-workers today about whether AUC depends on class imbalance, i.e., the proportion of positive to negative instances in the response variable. It was suggested that when classes are highly imbalanced, the average AUROC over repeated random predictions would not necessarily converge to 0.5 in the limit.
For models fit to random data or to randomly permuted class labels, I would expect the predicted class probability of each instance, taken across all models, to be uniformly distributed between zero and one. Consequently, I would expect the AUC to vary between zero (when all positive samples are assigned a class probability of zero and all negative samples a non-zero probability) and one (when all positive samples are assigned a probability of one and all negative samples a probability less than one). Under this model, I would expect the AUC to be roughly uniformly distributed between zero and one. Am I missing some aspect of this?
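One way to probe this empirically is a small simulation. The sketch below is only an illustration under stated assumptions: scores are drawn independently of the labels from a uniform distribution on [0, 1), the positive fraction is fixed at an arbitrary 5%, and the function names `auc_from_scores` and `simulate_random_aucs` are hypothetical helpers, not part of any library. AUC is computed through the Mann-Whitney rank-sum identity, which is valid here because continuous random scores are effectively never tied.

```python
import random
import statistics


def auc_from_scores(labels, scores):
    """AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    # Rank instances by ascending score; the lowest score gets rank 1.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = sum(
        rank for rank, i in enumerate(order, start=1) if labels[i] == 1
    )
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # U counts the (positive, negative) pairs in which the positive scores higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def simulate_random_aucs(n_samples=1000, pos_fraction=0.05, n_repeats=500, seed=0):
    """AUCs of repeated random predictions on a fixed, imbalanced label vector."""
    rng = random.Random(seed)
    n_pos = max(1, round(n_samples * pos_fraction))
    labels = [1] * n_pos + [0] * (n_samples - n_pos)
    aucs = []
    for _ in range(n_repeats):
        # Assumption: scores carry no information about the labels.
        scores = [rng.random() for _ in labels]
        aucs.append(auc_from_scores(labels, scores))
    return aucs


aucs = simulate_random_aucs()
mean_auc = statistics.mean(aucs)
sd_auc = statistics.stdev(aucs)
print(f"mean AUC = {mean_auc:.3f}, sd = {sd_auc:.3f}, "
      f"min = {min(aucs):.3f}, max = {max(aucs):.3f}")
```

Changing `pos_fraction` and `n_samples` shows how the mean and the spread of the simulated AUC values respond to the degree of imbalance, and a histogram of `aucs` shows the shape of the distribution directly.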
If this is not the case, I would be grateful for pointers to any literature on the topic.