What are the strengths and weaknesses of using the highest Cohen's kappa score to select a classifier from a pool of candidates?
Back-story:
I had a pool of candidate classifiers (~200 candidate models), and I did the best I could to find the "strongest".
For each classifier I did this:
- compute the true positive rate (correctly estimated true / total actual true)
- compute the true negative rate (correctly estimated false / total actual false)
- multiply the two rates together to create a score
Then I ranked the candidates by score and picked the highest. It worked out okay, but I like to revisit past problems; I'm always in the market for a better analytic tool.
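A minimal sketch of that procedure in Python (the names `tpr_tnr_score`, `candidates`, and the count ordering are illustrative, not from my actual pipeline):

```python
# Score each candidate by TPR * TNR and keep the highest-scoring one.
# Counts are (tp, fn, fp, tn), matching the matrices below
# (rows = estimated, columns = actual).

def tpr_tnr_score(tp, fn, fp, tn):
    tpr = tp / (tp + fn)  # correctly estimated true / total actual true
    tnr = tn / (tn + fp)  # correctly estimated false / total actual false
    return tpr * tnr

# candidates: {model_name: (tp, fn, fp, tn)} -- illustrative counts
candidates = {
    "model_a": (45, 25, 16, 14),
    "model_b": (45, 14, 16, 25),
}

best = max(candidates, key=lambda name: tpr_tnr_score(*candidates[name]))
print(best)  # -> model_b
```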
Proposed approach:
If I have this data: $$ \begin{matrix} & Actual \, True & Actual \, False \\ Est \, True & 45 & 16\\ Est \, False & 25 & 14 \end{matrix} $$
and I compute Cohen's kappa of the estimated versus actual labels, I get this:
$$ \kappa = \frac{p_o - p_e}{1 - p_e} $$ where $$ p_o = \frac{45 + 14}{45 + 16 + 25 + 14} = 0.59 $$ and the chance-agreement terms are the products of the marginal probabilities: $$ p_{True} = p_{est\,true} \cdot p_{actual\,true} = \frac{45+16}{100} \cdot \frac{45+25}{100} = 0.427 $$ $$ p_{False} = p_{est\,false} \cdot p_{actual\,false} = \frac{25+14}{100} \cdot \frac{16+14}{100} = 0.117 $$ therefore $$ p_e = p_{True} + p_{False} = 0.427 + 0.117 = 0.544 $$
so the Cohen's kappa for this is:
$$ \kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.59 - 0.544}{1 - 0.544} \approx 0.101 $$
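Here is the same hand calculation as a small Python check (pure arithmetic, no libraries; the function name and argument order are just for illustration):

```python
def cohen_kappa(tp, fp, fn, tn):
    # Rows = estimated, columns = actual, matching the matrix above.
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n                          # observed agreement
    p_true = ((tp + fp) / n) * ((tp + fn) / n)   # chance agreement on "true"
    p_false = ((fn + tn) / n) * ((fp + tn) / n)  # chance agreement on "false"
    p_e = p_true + p_false
    return (p_o - p_e) / (1 - p_e)

print(cohen_kappa(tp=45, fp=16, fn=25, tn=14))  # -> ~0.101
```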
If I had an alternative, better estimator with the following confusion matrix: $$ \begin{matrix} & Actual \, True & Actual \, False \\ Est \, True & 45 & 16\\ Est \, False & 14 & 25 \end{matrix} $$
then $p_o = 0.70$ and $p_e = 0.61 \cdot 0.59 + 0.39 \cdot 0.41 = 0.5198$, so the kappa is $\approx 0.375$.
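As a cross-check, scikit-learn's `cohen_kappa_score` gives the same value; it takes two label vectors rather than a confusion matrix, so the counts are expanded here (assuming scikit-learn is installed):

```python
from sklearn.metrics import cohen_kappa_score

# Expand the second confusion matrix into label vectors:
# 45 (est 1, act 1), 16 (est 1, act 0), 14 (est 0, act 1), 25 (est 0, act 0).
actual    = [1] * 45 + [0] * 16 + [1] * 14 + [0] * 25
estimated = [1] * 45 + [1] * 16 + [0] * 14 + [0] * 25

print(cohen_kappa_score(actual, estimated))  # -> ~0.375
```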
If these were the two classifiers to choose between, I would be wise to prefer the one with the kappa of 0.375.
Extended questions:
- Does this approach have a name beyond "using Cohen's kappa"? Is it equivalent to another, better-known and more thoroughly studied method? Is it textbook?
- Are there known problems with this approach? What are the weaknesses here?
UPDATE:
- I was using the KNIME "Scorer" node and noticed that Cohen's kappa is given as a measure of learner performance.