I have an imbalanced dataset with 5 classes. The distribution is roughly:
Class 1: 0.5
Class 2: 0.25
Class 3: 0.15
Class 4: 0.05
Class 5: 0.05
I've been testing several different classification algorithms on this dataset. To evaluate them, I've looked at overall accuracy and Cohen's kappa statistic (as well as the good ol' confusion matrix, of course).
In almost every case, higher accuracy corresponds to a higher kappa score.
So if I were to screen classifiers by accuracy instead of kappa, I'd end up with almost exactly the same results (i.e. I'd pick the same classifier).
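For reference, this is roughly how I compute the two numbers for each classifier (a minimal sketch using scikit-learn; the labels below are placeholders that just mimic my class proportions, and `y_pred` stands in for a real model's predictions):

```python
# Sketch of how I score each classifier: accuracy and Cohen's kappa side by side.
# (Placeholder data only -- y_pred would normally come from a fitted model.)
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(0)
classes = [1, 2, 3, 4, 5]
probs = [0.5, 0.25, 0.15, 0.05, 0.05]              # my rough class distribution
y_true = rng.choice(classes, size=1000, p=probs)
y_pred = rng.choice(classes, size=1000, p=probs)   # stand-in for a model's predictions

print("accuracy:", accuracy_score(y_true, y_pred))
print("kappa:   ", cohen_kappa_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```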
Reading through Tom Fawcett's blog post, I see that he says
Don’t use accuracy (or error rate) to evaluate your classifier!
and then goes on to explain that if you must use a single-number metric, then ROC, F1, and kappa are all better replacements.
On my dataset there's barely a difference. Is that just a feature of my dataset? On what type of dataset would you expect to see a more distinct difference between the metrics?
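For concreteness, here is the kind of degenerate case I take Fawcett to be warning about (a toy sketch of my own, not taken from his post): a classifier that always predicts the majority class gets about 0.5 accuracy on my distribution, yet its kappa is 0, because it does no better than chance once the imbalance is accounted for.

```python
# Toy sketch: a "classifier" that always predicts the majority class.
# On my class distribution it scores ~0.5 accuracy, but kappa is 0 because
# it is no better than chance once the class imbalance is accounted for.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
y_true = rng.choice([1, 2, 3, 4, 5], size=1000, p=[0.5, 0.25, 0.15, 0.05, 0.05])
y_majority = np.full_like(y_true, 1)               # always predict class 1

print("accuracy:", accuracy_score(y_true, y_majority))    # roughly 0.5
print("kappa:   ", cohen_kappa_score(y_true, y_majority)) # 0.0
```

None of the classifiers I'm testing are that degenerate, which may be part of why the two metrics track each other so closely for me, but I'd like to understand when they would come apart.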