Equivalent of paired t-test but for categorical data

Question

I'm attempting to analyze survey results that asked respondents to indicate whether a statement was true for them at the start of their career vs. now. They could also select both at the start of their career and now.

For example, 43% of respondents indicated the statement "I feel confident in my career development opportunities" was true at the start of their career and 57% indicated it's true now.

Since respondents could select both options, here is a breakdown of that:

Indicated statement was true ONLY at the start of their career: 311
Indicated statement was true ONLY now: 183
Indicated statement was true at the start of their career AND now: 199

I want to know if this increase (43% vs. 57%) is statistically significant but can't figure out which test to carry out given that the data is categorical. I thought maybe McNemar's Test but it seems that's only for 2X2 contingency tables.

Presumably you also have the count of those who never said it was true so you do have a contingency table. — mdewey, Oct 30 '20 at 16:24

gung - Reinstate Monica · Answer 1 · 2020-10-30T20:06:30.170

You want McNemar's test. I discuss it rather extensively elsewhere. Briefly, you can form a contingency table of the four combinations ('yes' both before and after, 'yes' before but 'no' after, 'no' before but 'yes' after, and 'no' both times). If you sum across the rows / columns, you can compute the marginal proportions. Your scientific question is whether the marginal proportions are the same. However, note that the proportions share the information on the main diagonal (the cases where the responses are the same). Thus, you want to conduct a binomial test of the off-diagonal elements against a probability of $0.5$.

Given your specific numbers, you can readily compute the test, as the 'no' $\rightarrow$ 'no' count is irrelevant. Below, I compute the exact version of the test in R:

binom.test(311, (311+199), p=.5)
#   Exact binomial test
# 
# data:  311 and (311 + 199)
# number of successes = 311, number of trials = 510, p-value = 8.053e-07
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#  0.5659515 0.6523759
# sample estimates:
# probability of success 
#              0.6098039
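For comparison, here is a minimal sketch that runs the same exact test using only the two discordant counts quoted in the question ($311$ and $183$), so the number of trials is $311 + 183$ rather than $311 + 199$. It assumes those two numbers are the off-diagonal cells and that $311$ is the "only now" count; the variable names are illustrative.

```r
# Discordant counts quoted in the question.
# Which of the two is "only now" is an assumption.
only_now   <- 311
only_start <- 183

# Exact McNemar test: a binomial test of one discordant count against p = 0.5,
# with the number of trials equal to the sum of the two discordant counts.
res <- binom.test(only_now, only_now + only_start, p = 0.5)

res$estimate  # share of respondents who changed and moved toward "true now"
res$p.value   # two-sided exact p-value
```

The concordant counts ('yes' both times and 'no' both times) do not enter this calculation at all.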

However, you can also estimate the final count, if you want. There is a mismatch between the stated percentages and the stated numbers, so I will assume the $311$ is for only now. By convention, the four cell counts are denoted $a$ through $d$, and the total count for the table is $N$. There was presumably some rounding error in computing the proportions, but $382$ is $43\%$ of $888$, and $510$ is $57\%$ of $895$. Taking $N \approx 891$, the original count of 'no' $\rightarrow$ 'no's may have been $891-(199+183+311) = 891-693 = 198$.

          now
start    T   F  sum   %
    T  199 183  382  43
    F  311   d  c+d  57
  Sum  510 b+d    N
    %   57  43
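The table above can be filled in and tested directly. The following is a rough sketch, assuming a total of $N = 891$ (the approximate value used above) and that $311$ is the "only now" count; both are assumptions, and the variable names are illustrative.

```r
# Cell counts laid out as in the table (rows = start, columns = now).
both       <- 199  # true at the start and now
start_only <- 183  # true at the start only
now_only   <- 311  # true now only (assumed, as in the answer)
N          <- 891  # assumed total, taken from the rough estimate above
neither    <- N - (both + start_only + now_only)  # implied 'no' -> 'no' count

tab <- matrix(c(both,     start_only,
                now_only, neither),
              nrow = 2, byrow = TRUE,
              dimnames = list(start = c("T", "F"), now = c("T", "F")))

# Marginal totals and percentages for "true at start" and "true now".
rowSums(tab)
colSums(tab)
round(100 * c(start = sum(tab["T", ]), now = sum(tab[, "T"])) / N)

# McNemar's chi-squared test (continuity-corrected by default).
# Only the off-diagonal cells affect the statistic.
res <- mcnemar.test(tab)
res
```

Changing the assumed `N` changes `neither` and the marginal percentages, but not the test statistic.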
