Questions tagged [cohens-kappa]

A measure of the degree to which two raters agree. There is also a test of inter-rater agreement based on kappa. Use [inter-rater] if you are interested in other aspects of inter-rater agreement, but not this specific measure.

Cohen's kappa is a measure of the degree to which two rating systems (typically people making ratings) agree. It adjusts for the probability that the raters would agree by chance alone if their ratings were completely independent.

The large-sample distribution of kappa is known, so the measure can also be used as a statistical test of agreement.

The standard calculation of kappa is: $$ \kappa=\frac{p(\text{agreement})-p(\text{chance agreement})}{1-p(\text{chance agreement})} $$ There are also weighted versions of kappa, and other related measures of inter-rater agreement.
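As a concrete illustration of this formula, here is a minimal R sketch (the agreement table is invented for the example) that computes unweighted Cohen's kappa directly from a two-rater confusion matrix:

```r
# Hypothetical 2x2 agreement table: rows = rater A, columns = rater B
tab <- matrix(c(45,  5,
                10, 40), nrow = 2, byrow = TRUE,
              dimnames = list(A = c("yes", "no"), B = c("yes", "no")))

n   <- sum(tab)
p_o <- sum(diag(tab)) / n                        # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2    # chance agreement from the margins
(p_o - p_e) / (1 - p_e)                          # kappa = 0.70 for this table
```

Packages such as irr (kappa2) and psych (cohen.kappa) implement the same calculation, along with weighted variants for ordinal categories.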

180 questions
169 votes, 4 answers

Cohen's kappa in plain English

I am reading a data mining book and it mentioned the Kappa statistic as a means of evaluating the prediction performance of classifiers. However, I just can't understand this. I also checked Wikipedia, but it didn't help either: …
Jack Twain
45 votes, 1 answer

Computing Cohen's Kappa variance (and standard errors)

The Kappa ($\kappa$) statistic was introduced in 1960 by Cohen [1] to measure agreement between two raters. Its variance, however, has been a source of contradictions for quite some time. My question is about which is the best variance…
Cesar
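For orientation, the simplest large-sample approximation to the variance is the one given in Cohen's 1960 paper; more accurate asymptotic formulas exist in the literature (e.g., Fleiss, Cohen & Everitt, 1969), which is part of why the question arises. A sketch of the simple version in R, using made-up ratings:

```r
# Hypothetical ratings from two raters on n items
set.seed(1)
r1 <- factor(sample(c("yes", "no"), 100, replace = TRUE))
r2 <- factor(ifelse(runif(100) < 0.8, as.character(r1),
                    sample(c("yes", "no"), 100, replace = TRUE)))

tab <- table(r1, r2)
n   <- sum(tab)
p_o <- sum(diag(tab)) / n
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_hat <- (p_o - p_e) / (1 - p_e)

# Cohen's (1960) simple large-sample approximation of the variance
var_hat <- p_o * (1 - p_o) / (n * (1 - p_e)^2)
c(kappa = kappa_hat, se = sqrt(var_hat))
```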
35 votes, 3 answers

Classification/evaluation metrics for highly imbalanced data

I am working on a fraud detection (credit-scoring-like) problem, so there is a highly imbalanced ratio of fraudulent to non-fraudulent observations. http://blog.revolutionanalytics.com/2016/03/com_class_eval_metrics_r.html provides a great…
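To see why kappa (rather than raw accuracy) is often suggested for problems like this, here is a small R sketch with an invented confusion matrix, in which a classifier that almost always predicts the majority class gets high accuracy but a low kappa:

```r
# Hypothetical confusion matrix: 990 legitimate vs 10 fraudulent cases,
# with a classifier that predicts "legit" for nearly everything
tab <- matrix(c(985, 9,    # predicted legit
                  5, 1),   # predicted fraud
              nrow = 2, byrow = TRUE,
              dimnames = list(pred = c("legit", "fraud"),
                              obs  = c("legit", "fraud")))

n        <- sum(tab)
accuracy <- sum(diag(tab)) / n                       # 0.986: looks great
p_e      <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement is also high
kappa    <- (accuracy - p_e) / (1 - p_e)             # about 0.12: little skill beyond chance
c(accuracy = accuracy, kappa = kappa)
```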
29 votes, 2 answers

Inter-rater reliability for ordinal or interval data

Which inter-rater reliability methods are most appropriate for ordinal or interval data? I believe that "Joint probability of agreement" or "Kappa" are designed for nominal data. Whilst "Pearson" and "Spearman" can be used, they are mainly used for…
shadi
9 votes, 2 answers

Inter-rater reliability with many non-overlapping raters

I have a data set of 11,000+ distinct items, each of which was classified on a nominal scale by at least 3 different raters on Amazon's Mechanical Turk. 88 different raters provided judgments for the task, and no one rater completed more than about 800…
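A common way to handle many raters with little overlap is Krippendorff's alpha, which accepts missing ratings directly; a minimal sketch with the irr package and an invented raters-by-items matrix (assuming kripp.alpha's convention of raters in rows and items in columns):

```r
library(irr)

# Hypothetical ratings: rows = raters, columns = items, NA = item not rated by that rater
ratings <- rbind(
  rater1 = c(1,  2,  3, NA,  1),
  rater2 = c(1,  2, NA,  2,  1),
  rater3 = c(NA, 2,  3,  2,  1)
)

# Krippendorff's alpha for nominal data; handles the missing ratings directly
kripp.alpha(ratings, method = "nominal")
```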
9 votes, 1 answer

What is the intuition behind the Kappa statistical value in classification

I understand the formula behind the Kappa statistic and how to calculate the O and E values from a confusion matrix. My question is: what is the intuition behind this measure? Why does it work so well for a given data set, and why is it a good…
London guy
8 votes, 1 answer

Quadratic weighted kappa versus linear weighted kappa

When should I use quadratic weighted kappa or linear weighted kappa? I have two observers evaluating the classes of a number of objects. The classes are fail, pass1, pass2, and excellent (ordinal scale). The errors in classification between "fail"…
andreSmol
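For reference, both weighting schemes are available in the irr package's kappa2 function; linear weights penalize disagreements in proportion to the category distance, while quadratic weights penalize by its square, so large disagreements count much more. A sketch with invented ratings on the four ordinal categories from the question:

```r
library(irr)

# Hypothetical ordinal ratings from the two observers
lv   <- c("fail", "pass1", "pass2", "excellent")
obs1 <- factor(c("fail", "pass1", "pass2", "excellent", "pass1", "pass2"),     levels = lv)
obs2 <- factor(c("fail", "pass2", "pass2", "excellent", "pass1", "excellent"), levels = lv)

kappa2(data.frame(obs1, obs2), weight = "equal")    # linearly weighted kappa
kappa2(data.frame(obs1, obs2), weight = "squared")  # quadratically weighted kappa
```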
6 votes, 1 answer

Fleiss kappa vs Cohen kappa

Can somebody explain in detail the differences between Fleiss' kappa and Cohen's kappa, and how each metric works under the hood? When would one use Fleiss' kappa over Cohen's kappa? What are the advantages/disadvantages of using Fleiss' kappa over Cohen…
Pluviophile
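As a quick orientation: Cohen's kappa is defined for exactly two raters, while Fleiss' kappa generalizes chance-corrected agreement to any fixed number of raters. A minimal R sketch with the irr package and invented nominal ratings:

```r
library(irr)

# Hypothetical nominal ratings: rows = subjects, columns = raters
ratings <- data.frame(
  rater1 = c("a", "b", "b", "c", "a", "b"),
  rater2 = c("a", "b", "c", "c", "a", "b"),
  rater3 = c("a", "b", "b", "c", "b", "b")
)

kappa2(ratings[, 1:2])   # Cohen's kappa: exactly two raters
kappam.fleiss(ratings)   # Fleiss' kappa: three or more raters
```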
5 votes, 1 answer

How should I interpret Fleiss' kappa when it equals NaN?

I noticed that when I have tables in which the values are only 0 and 1, I get a kappa of 1 when the table is completely full of ones, and when I have a table of zeros I get NaN as the result, using the irr package and the kappam.fleiss function. I would…
Oeufcoque Penteano
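Whatever the implementation, the NaN falls straight out of the kappa formula: when every rating lands in a single category, observed and chance agreement are both 1 and the ratio is 0/0, as this trivial R illustration shows:

```r
# If all ratings fall in one category, observed and expected agreement both equal 1
p_o <- 1
p_e <- 1
(p_o - p_e) / (1 - p_e)   # 0/0 evaluates to NaN: with no variation, chance-corrected agreement is undefined
```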
5 votes, 1 answer

Is the power for a Kappa test the same as underlying z-test?

The Kappa ($\kappa$) test is a z-test-like test. If I am not much mistaken, to compute the $\kappa$ test we can just estimate the appropriate variance $\widehat{\operatorname{var}}(\hat\kappa)$ of the kappa statistic $\hat\kappa$ and then feed it into a z-test by…
Cesar
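The construction the question describes can be sketched directly; the kappa estimate and variance below are placeholders, and note that tests of $H_0\!: \kappa = 0$ conventionally use a variance computed under that null, which can differ from the variance used for confidence intervals:

```r
# Hypothetical estimates: kappa and its variance from whichever formula you trust
kappa_hat <- 0.62
var_hat   <- 0.004

z     <- kappa_hat / sqrt(var_hat)   # z-statistic for H0: kappa = 0
p_val <- 2 * pnorm(-abs(z))          # two-sided p-value from the normal approximation
c(z = z, p.value = p_val)
```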
5 votes, 1 answer

Using Cohen's kappa statistic for evaluating a binary classifier

I am using the caret package to perform predictive modeling on a binary target variable. The outcome is very unbalanced, so it is suggested to use the Kappa statistic to evaluate the binary classifier. I am trying to evaluate the performance of…
Giorgio Spedicato
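For context, caret reports Kappa alongside accuracy in confusionMatrix, and it can be chosen as the model-selection metric in train; a minimal sketch with invented factor vectors:

```r
library(caret)

# Hypothetical predicted and observed classes for a binary target
obs  <- factor(c("yes", "no", "no", "no", "yes", "no", "no", "yes", "no", "no"))
pred <- factor(c("yes", "no", "no", "yes", "no",  "no", "no", "yes", "no", "no"))

cm <- confusionMatrix(pred, obs)
cm$overall["Kappa"]   # chance-corrected agreement between predictions and observations

# When tuning, Kappa can also be the selection metric, e.g. (data frame 'd' is hypothetical):
# train(y ~ ., data = d, method = "glm", metric = "Kappa")
```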
5 votes, 1 answer

Get a 95% confidence interval for Cohen's Kappa in R

I've got a study with two radiologists reading chest X-rays and want to calculate Cohen's kappa for their agreement. The kappa2 function in the "irr" package and cohen.kappa in "psych" can both give me an answer but don't generate a 95% confidence…
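For what it's worth, psych::cohen.kappa does print lower and upper confidence boundaries in recent versions; alternatively, a Wald-type interval can be assembled by hand from a kappa estimate and a standard error (the numbers below are placeholders):

```r
# Hypothetical kappa estimate and standard error from your preferred variance formula
kappa_hat <- 0.55
se_hat    <- 0.07

ci <- kappa_hat + c(-1, 1) * qnorm(0.975) * se_hat   # Wald-type 95% confidence interval
round(ci, 3)
```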
5 votes, 1 answer

Adjusting kappa inter-rater agreement for prevalence

I am trying to calculate kappa scores for present/absent decisions made by two raters and I have heard that they can be adjusted for prevalence of the object of measurement. Can anyone advise on how to calculate a kappa statistic that is adjusted…
michelle
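One widely used adjustment is the prevalence-adjusted, bias-adjusted kappa (PABAK), which for a two-category table reduces to $2p_o - 1$; a small R sketch with an invented present/absent table:

```r
# Hypothetical present/absent agreement table for two raters
tab <- matrix(c(80, 5,
                 7, 8), nrow = 2, byrow = TRUE,
              dimnames = list(rater1 = c("present", "absent"),
                              rater2 = c("present", "absent")))

n     <- sum(tab)
p_o   <- sum(diag(tab)) / n
pabak <- 2 * p_o - 1      # prevalence- and bias-adjusted kappa for a 2x2 table
pabak
```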
5 votes, 1 answer

Fisher's exact test vs kappa analysis

I was reading a paper where the authors assessed the association between two different diagnostic tests intended to diagnose the same disease, and they performed the analysis with Fisher's exact test. While I find this statistically appropriate, I…
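The underlying distinction is that Fisher's exact test measures association while kappa measures agreement; an extreme invented example makes this concrete, since two tests that always disagree are perfectly associated yet have strongly negative agreement:

```r
# Hypothetical 2x2 table: test B says the opposite of test A on every patient
tab <- matrix(c( 0, 30,
                25,  0), nrow = 2, byrow = TRUE,
              dimnames = list(testA = c("pos", "neg"),
                              testB = c("pos", "neg")))

fisher.test(tab)$p.value   # tiny p-value: the tests are strongly associated

n   <- sum(tab)
p_o <- sum(diag(tab)) / n                      # observed agreement is 0
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
(p_o - p_e) / (1 - p_e)                        # kappa is strongly negative (about -0.98)
```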
4 votes, 2 answers

Sample size needed for Fleiss' Kappa?

A group of raters (about 20) will be watching a series of videos and classifying them into 4 categories. I will be running Fleiss' kappa to measure the agreement. How does one compute the sample size needed to reach 0.8 power at a 0.05 alpha?…
ome
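In the absence of a standard closed-form answer, one pragmatic route is to estimate power by simulation under an assumed rating process; everything in the sketch below is an assumption (20 raters, 4 categories, a 0.6 probability of a rater reporting the item's "true" category), so treat it as a template rather than a recipe:

```r
library(irr)

# Simulate one data set: each item has a "true" category; each rater reports it
# correctly with probability p_correct, otherwise picks a category at random
simulate_ratings <- function(n_items, n_raters = 20, n_cats = 4, p_correct = 0.6) {
  truth <- sample(n_cats, n_items, replace = TRUE)
  sapply(seq_len(n_raters), function(r) {
    hit <- runif(n_items) < p_correct
    ifelse(hit, truth, sample(n_cats, n_items, replace = TRUE))
  })
}

# Estimated power: proportion of simulated data sets where Fleiss' kappa is
# significant at alpha = 0.05 (H0: kappa = 0)
estimate_power <- function(n_items, n_sims = 200) {
  mean(replicate(n_sims, kappam.fleiss(simulate_ratings(n_items))$p.value < 0.05))
}

estimate_power(n_items = 30)   # increase n_items until the estimated power reaches ~0.8
```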