A group of about 20 raters will watch a series of videos and classify each one into one of 4 categories. I will use Fleiss' kappa to measure agreement. How does one compute the sample size needed to achieve 0.8 power at alpha = 0.05? Also, will that sample size be the number of videos to be evaluated?
-
It's a bit unusual to determine if kappa is statistically significant, because that just tells you if it's different from zero. Usually you want kappa to be large (ish), not just larger than zero. – Jeremy Miles May 13 '14 at 00:13
-
If you have to do a significance test, compare the value to a sufficiently large value. For example, if the minimum acceptable kappa is .70, you can test whether the value is significantly higher than .70. – Hotaka Nov 25 '15 at 20:10
2 Answers
The paper by Cantor, available here and entitled "Sample size calculations for Cohen's kappa", may be a useful starting point. It seems to be widely available on the web if that link fails. But note that @Jeremy has wisely pointed out in a comment that the hypothesis that $\kappa = 0$ is rarely of interest.

mdewey
I am not sure that power and the significance level can be related directly to Fleiss' kappa, but:
I have worked out sample sizes for several values of p and q, the probabilities needed to calculate kappa (for the case of several categories), building scenarios according to the number of classification errors made by the appraisers.
Yes, the sample size you obtain will be the number of videos to be evaluated.
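
One way to explore how many videos are needed is simulation. The sketch below is only an illustration under assumptions that are not from this thread: each video has a true category drawn uniformly at random, each rater picks it with probability `accuracy` and otherwise picks one of the other categories uniformly, and the test is a one-sided normal-approximation test of kappa against a minimum acceptable value `kappa0`, with the standard error taken from the spread of the simulated kappas. All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist, stdev

def fleiss_kappa(table):
    # table[i][j] = number of raters who put video i in category j
    n_videos, n_raters = len(table), sum(table[0])
    total = n_videos * n_raters
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_videos
    p_chance = sum((sum(col) / total) ** 2 for col in zip(*table))
    return (p_obs - p_chance) / (1 - p_chance)

def simulate_table(n_videos, n_raters, n_cats, accuracy, rng):
    # Assumed rating model: correct with probability `accuracy`,
    # otherwise a uniformly chosen wrong category.
    table = []
    for _ in range(n_videos):
        truth = rng.randrange(n_cats)
        counts = [0] * n_cats
        for _ in range(n_raters):
            if rng.random() < accuracy:
                counts[truth] += 1
            else:
                wrong = rng.randrange(n_cats - 1)
                counts[wrong if wrong < truth else wrong + 1] += 1
        table.append(counts)
    return table

def estimated_power(n_videos, kappa0, accuracy, n_raters=20, n_cats=4,
                    alpha=0.05, reps=200, seed=1):
    rng = random.Random(seed)
    kappas = [
        fleiss_kappa(simulate_table(n_videos, n_raters, n_cats, accuracy, rng))
        for _ in range(reps)
    ]
    se = stdev(kappas)  # simulated standard error at this number of videos
    z = NormalDist().inv_cdf(1 - alpha)
    # Share of simulated studies whose lower confidence bound exceeds kappa0
    return sum(k - z * se > kappa0 for k in kappas) / reps

# Scan a few candidate numbers of videos (values chosen only for illustration)
for n in (20, 40, 80, 160):
    print(n, estimated_power(n, kappa0=0.60, accuracy=0.85))
```

The smallest number of videos whose estimated power reaches 0.8 would be the answer under this particular rating model; a different model of rater behaviour will give a different number.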

Nunche
-
Welcome to the site. You state that you have demonstrated certain things; can you give more details or a reference? – mpiktas Dec 06 '13 at 07:15
-
Hi, thanks! Yes, I know 2 cases for which you can use the Fleiss' kappa statistic: 1) For 1 appraiser vs. another (appraisers must categorize the samples into 2 categories, for example: good or bad). This case can also be used to compare 1 appraiser vs. the known standard; and 2) For several appraisers categorizing into several categories, for example: 3 appraisers categorizing 50 units into 5 types of defects. Source: Statistical methods for rates and proportions. 3rd Edition. Joseph L. Fleiss. Publisher: Wiley. – Nunche Dec 14 '13 at 03:51
-
You get p and q from the contingency table that applies for either case 1 or 2. And with them, you calculate overall Kappa and a Kappa for each of the categories. – Nunche Dec 14 '13 at 03:52
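
As a rough illustration of that calculation, here is a minimal sketch of the overall and per-category Fleiss' kappa computed from a table of rating counts. The function name and the small `ratings` table are made up for illustration, and the code assumes every subject is rated by the same number of raters and that every category is used at least once.

```python
def fleiss_kappa(table):
    """table[i][j] = number of raters who put subject i in category j."""
    n_subjects = len(table)
    n_raters = sum(table[0])  # assumes the same number of ratings per subject
    n_cats = len(table[0])
    total = n_subjects * n_raters

    # p[j]: overall share of ratings in category j; q[j] = 1 - p[j]
    p = [sum(row[j] for row in table) / total for j in range(n_cats)]
    q = [1 - pj for pj in p]

    # Observed agreement: mean share of agreeing rater pairs per subject
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects
    p_chance = sum(pj * pj for pj in p)
    overall = (p_obs - p_chance) / (1 - p_chance)

    # Per-category kappa: category j versus all other categories combined
    per_cat = []
    for j in range(n_cats):
        disagree = sum(row[j] * (n_raters - row[j]) for row in table)
        per_cat.append(1 - disagree / (total * (n_raters - 1) * p[j] * q[j]))
    return overall, per_cat

# Illustrative data: 10 subjects, 14 raters, 5 categories
ratings = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
overall, per_cat = fleiss_kappa(ratings)
print(round(overall, 3), [round(k, 3) for k in per_cat])  # overall is about 0.21
```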
-
You can calculate n from given values of: the observed probability Pobs, the probability due to randomness P-chance (which is removed by the kappa statistic calculation itself) and the kappa standard deviation. So, I gave values to these variables, I tabulated the values and calculated n and that's how I got the table for n. – Nunche Dec 14 '13 at 03:58
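
A calculation of that kind can be sketched with the usual normal-approximation sample size formula. This is not necessarily the exact procedure behind the table mentioned above; it assumes a one-sided test of a null value `kappa0` against an alternative `kappa1`, and it takes the per-subject standard deviations of the kappa estimate (`sd0` under the null, `sd1` under the alternative, so that the standard error is sd / sqrt(n)) as given inputs. The function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_subjects_for_kappa(kappa0, kappa1, sd0, sd1, alpha=0.05, power=0.80):
    """Approximate number of subjects for a one-sided test of kappa0 vs. kappa1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the target power
    return ceil(((z_alpha * sd0 + z_beta * sd1) / (kappa1 - kappa0)) ** 2)

# Made-up inputs: test kappa = 0.70 against kappa = 0.80 with sd 0.5 under both
n = n_subjects_for_kappa(kappa0=0.70, kappa1=0.80, sd0=0.5, sd1=0.5)
print(n)  # 155
```

The standard deviations themselves depend on the observed and chance agreement probabilities, so in practice they would have to come from a formula or from pilot data for the design at hand.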
-
Can you provide a reference or example calculation to answer the question? – prince_of_pears Oct 05 '16 at 15:48