In the context of binary classification, how do you interpret a ROC curve? More precisely:
1) Why does the diagonal stand for a random classifier?
[Edit] Let's imagine a random classifier: each time it sees an observation, it labels that observation 1 with probability 0.5.
So, with many observations:
- among the observations with true label 1, half of them will be correctly classified (a 0.5 true positive rate);
- in the same way, there is a 0.5 false positive rate (among the observations with true label 0, half of them will be labelled 1).
So on the ROC curve this classifier sits at the point (0.5, 0.5), which is on the diagonal. But I can only see this one case, unless I don't understand well what "random classifier" means in this context...
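To check my reasoning, here is a quick simulation (pure Python, a hypothetical setup of my own: true labels drawn fairly at random, and a classifier that ignores its input and predicts 1 with probability p). I also try values of p other than 0.5 to see where such classifiers land:

```python
import random

random.seed(0)

def roc_point_random_classifier(p, n=100_000):
    """Empirical (FPR, TPR) of a classifier that predicts 1 with
    probability p, ignoring the input entirely."""
    tp = fp = pos = neg = 0
    for _ in range(n):
        y = random.random() < 0.5      # true label, balanced classes
        yhat = random.random() < p     # prediction, independent of y
        if y:
            pos += 1
            tp += yhat                 # true positive
        else:
            neg += 1
            fp += yhat                 # false positive
    return fp / neg, tp / pos

for p in (0.2, 0.5, 0.8):
    fpr, tpr = roc_point_random_classifier(p)
    # both rates come out close to p, i.e. the point (p, p) on the diagonal
    print(f"p={p}: FPR={fpr:.3f}, TPR={tpr:.3f}")
```

So each such classifier gives a single point near (p, p); my confusion is that this only accounts for isolated points on the diagonal, not the whole line.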
2) Why is the ROC curve insensitive to class skew? That is, why is the ROC curve insensitive to the class distribution and to error costs?
Let's imagine we have a sample of observations and we draw the ROC curve. Now let's add many observations labelled 0: does the ROC curve stay the same? What does that mean, and how can it be explained?
Here I found an explanation: https://www.quora.com/Why-is-AUC-Area-under-ROC-insensitive-to-class-distribution-changes; the answers seem good if we assume that, when we increase the number of negative samples, their scores follow the same distribution as those of the previous negative samples.
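Here is a small sketch of that assumption as I understand it: if I replicate the negative samples (which trivially preserves their score distribution), the empirical TPR and FPR at every threshold, and hence the ROC curve, stay exactly the same, because both are per-class rates. The Gaussian score distributions are an arbitrary choice of mine, just for illustration:

```python
import random

random.seed(1)

def roc_points(scores_pos, scores_neg, thresholds):
    """(FPR, TPR) at each threshold: predict 1 when score >= threshold."""
    pts = []
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        pts.append((fpr, tpr))
    return pts

# hypothetical classifier scores: positives score higher on average
pos = [random.gauss(1.0, 1.0) for _ in range(1000)]
neg = [random.gauss(0.0, 1.0) for _ in range(1000)]

thresholds = [-1.0, 0.0, 0.5, 1.0, 2.0]
before = roc_points(pos, neg, thresholds)
after = roc_points(pos, neg * 10, thresholds)  # 10x more negatives, same distribution
assert before == after  # per-class rates are unchanged, so the ROC curve is too
```

Of course this only covers exact replication; the claim in the Quora answers seems to be that any enlargement of the negative class with the same score distribution leaves the curve unchanged (in expectation), which is what I would like to see proved formally.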
PS: A mathematical paper (with proofs) would help me understand the previous assertion properly, since everything would be formally defined there. I found this one (in my language), but I am not sure it is correct, since there seems to be a mistake from the very beginning in the RECALL definition: http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_metric/roc.html. So if you have any reference, please share it.