How to perform multiple tests of contingency tables with more power: Fisher's / Barnard's test vs Logistic regression

Question

I have a sample of answers to Yes / No questions:

\begin{array}{c | c c c c} \hline \text{Case} & X_1 & X_2 & X_3 & X_4 \\ \hline 1 & 1 & 1 & 1 & 1 \\ 2 & 0 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 40 & 1 & 1 & 0 & 1 \\ \hline \end{array}

The answers come from $n = 40$ medical institutions, which we assume were chosen at random from a population of $N = 250$ institutions.

Out of ${4\choose 2}=6$ possible dependencies between two variables, I am interested in testing 3 hypotheses:

$H_0:$ $X_2$ and $X_1$ are independent;
$H_0:$ $X_3$ and $X_1$ are independent;
$H_0:$ $X_4$ and $X_2$ are independent.

First question: should these data be treated as paired?

Second question: could you please recommend a good approach for testing these hypotheses?

I considered the following:

Using Fisher's exact test when no margins are fixed can lead to lower power compared to Barnard's test, as described in this book. So I would rather not perform Fisher's exact test if a better approach is available.
Could I use Barnard's test to test independence? If yes, can we assume pooled variance?
Since all the data are binary and some variables appear in more than one hypothesis, could I just use logistic regression, e.g. in R:

 glm(X4~X2+X1, data = data1, family = binomial)
 glm(X3~X1, data = data1, family = binomial)

where this is the only meaningful direction of dependence.

Similar questions have been asked before: https://stats.stackexchange.com/questions/125985/why-do-p-values-for-test-of-likelihood-ratio-vs-fishers-exact-test-not-agree, and see all the links in comments at https://stats.stackexchange.com/questions/438188/how-can-i-calculate-the-power-of-my-analysis-with-binary-response-data?noredirect=1#comment816376_438188 — kjetil b halvorsen, Nov 27 '19 at 15:33
I read these topics, still I am not sure what to use and whether this data is considered paired? If I am only interested in testing hypothesis that two variables are independent, I could use Fisher's / Barnard's test, but only if the data is not paired? — Tjaša Kovačević, Dec 01 '19 at 10:19

score 2 · Accepted Answer · answered Dec 03 '19 at 20:42

Fisher's exact test assumes fixed (conditioned) row and column totals. It looks like the number of 1's for each variable in your dataset is not fixed and could have been anything from 0 to 40. In other words, you do not know in advance the margins of the contingency table to which you will apply the test:

\begin{array}{l | ll | l} & X_{2} = 0 & X_2 = 1 & \text{Total} \\ \hline X_1 = 0 & \text{...} & \text{...} & \text{?} \\ X_1 = 1 & \text{...} & \text{...} & \text{?} \\ \hline \text{Total} & \text{?} & \text{?} & 40 \end{array}

This means that both the row and column totals are unconditioned in your case. Fisher's exact test is therefore conservative here (its actual type I error rate falls below the nominal level), and it is true that applying it costs you power on average. Barnard's exact test and its variant, Boschloo's test, do not condition on any margin, so you have more power using them on average. To help you decide between Barnard's and Boschloo's test see this.
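As a rough sketch of how these tests could be run in R: the data below are simulated placeholders (not the asker's data), and the unconditional tests assume the CRAN package `Exact` is installed. Choosing `model = "Multinomial"` is my reading of "neither margin fixed", and `method = "z-pooled"` corresponds to the pooled-variance form of Barnard's test asked about in the question.

```r
# Simulated placeholder data (NOT the real survey): 40 institutions, two binary answers.
set.seed(1)
data1 <- data.frame(X1 = rbinom(40, 1, 0.5),
                    X2 = rbinom(40, 1, 0.6))

# 2x2 contingency table; explicit factor levels keep empty cells as zeros.
tab <- table(X1 = factor(data1$X1, levels = 0:1),
             X2 = factor(data1$X2, levels = 0:1))
print(addmargins(tab))

# Fisher's exact test (base R) conditions on both margins.
p_fisher <- fisher.test(tab)$p.value
print(p_fisher)

# Unconditional tests, run only if the 'Exact' package is available (assumption).
# The multinomial model can take a while to compute.
if (requireNamespace("Exact", quietly = TRUE)) {
  m <- matrix(as.vector(tab), nrow = 2)  # plain 2x2 matrix of counts
  barnard  <- Exact::exact.test(m, method = "z-pooled",
                                model = "Multinomial", to.plot = FALSE)
  boschloo <- Exact::exact.test(m, method = "boschloo",
                                model = "Multinomial", to.plot = FALSE)
  print(c(barnard = barnard$p.value, boschloo = boschloo$p.value))
}
```

The same pattern would be repeated for the other two pairs of variables.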

Regarding multiple testing, you could simply apply the Bonferroni correction for multiple comparisons. With 3 pairwise comparisons and an overall significance level of 0.05, each P value must be less than 0.05/3 ≈ 0.0167 to be declared significant.
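A minimal sketch of the correction in R, using base R's `p.adjust`. The data are simulated placeholders, and `fisher.test` is used here only because it ships with base R; the raw P values could come from whichever test you settle on.

```r
# Simulated placeholder data (NOT the real survey).
set.seed(2)
data1 <- data.frame(X1 = rbinom(40, 1, 0.5), X2 = rbinom(40, 1, 0.5),
                    X3 = rbinom(40, 1, 0.5), X4 = rbinom(40, 1, 0.5))

# The three pairs named in the hypotheses.
pairs <- list(c("X2", "X1"), c("X3", "X1"), c("X4", "X2"))

# One raw P value per pair (stand-in test: Fisher's exact test).
p_raw <- sapply(pairs, function(v) {
  tab <- table(factor(data1[[v[1]]], levels = 0:1),
               factor(data1[[v[2]]], levels = 0:1))
  fisher.test(tab)$p.value
})
names(p_raw) <- sapply(pairs, paste, collapse = " vs ")

# Bonferroni: multiply each P value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(data.frame(raw = p_raw, bonferroni = p_bonf,
                 significant = p_bonf < 0.05))
```

Comparing the adjusted P values with 0.05 is equivalent to comparing the raw P values with 0.05/3.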

Regarding your first question: I don't see any matched pairs of subjects in this data. This is what I understand by paired data.

For example, with some additional treatment information we could have a contingency table such as:

\begin{array}{l | ll } & X_{\text{after}} = 0 & X_{\text{after}} = 1 \\ \hline X_{\text{before}} = 0 & \text{...} & \text{...} \\ X_{\text{before}} = 1 & \text{...} & \text{...} \\ \end{array}

Or, by using some additional similarity information, something like this:

\begin{array}{l | ll } & X_{\text{sibling}} = 0 & X_{\text{sibling}} = 1 \\ \hline X = 0 & \text{...} & \text{...} \\ X = 1 & \text{...} & \text{...} \\ \end{array}

For this kind of paired data we can use McNemar's test.
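To illustrate what that would look like, here is a small sketch with base R's `mcnemar.test`. The before/after counts are made up purely for illustration and have nothing to do with the asker's data.

```r
# Hypothetical before/after counts for 40 subjects (made up for illustration).
paired_tab <- matrix(c(15,  5,
                       12,  8),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(before = c("0", "1"),
                                     after  = c("0", "1")))

# McNemar's test uses only the discordant cells (here 5 and 12);
# the continuity correction is applied by default.
res <- mcnemar.test(paired_tab)
print(res)
```

Note that the test looks at whether the two discordant counts differ, which is a question about change within pairs and not about independence of two different variables.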

In the logistic regression example you probably meant:

glm(X1 ~ X2 + X3, data = data1, family = binomial)
glm(X4 ~ X2, data = data1, family = binomial)
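A hedged sketch of how P values could be pulled out of these two models, using likelihood-ratio tests via base R's `drop1`. The data are simulated placeholders. One caveat: in the model with two predictors, each term is tested conditionally on the other, which is not quite the same as the marginal independence hypotheses stated in the question.

```r
# Simulated placeholder data (NOT the real survey).
set.seed(3)
data1 <- data.frame(X1 = rbinom(40, 1, 0.5), X2 = rbinom(40, 1, 0.5),
                    X3 = rbinom(40, 1, 0.5), X4 = rbinom(40, 1, 0.5))

fit1 <- glm(X1 ~ X2 + X3, data = data1, family = binomial)
fit2 <- glm(X4 ~ X2, data = data1, family = binomial)

# Likelihood-ratio test for dropping each term from its model.
lrt1 <- drop1(fit1, test = "LRT")
lrt2 <- drop1(fit2, test = "LRT")
print(lrt1)
print(lrt2)

# Collect the three P values (X2 and X3 in fit1, X2 in fit2).
p_lrt <- c(lrt1[c("X2", "X3"), "Pr(>Chi)"], lrt2["X2", "Pr(>Chi)"])
names(p_lrt) <- c("X1 ~ X2 | X3", "X1 ~ X3 | X2", "X4 ~ X2")
print(p_lrt)
```

These are large-sample tests, so with only 40 cases and sparse cells they can disagree with the exact tests.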

Thank you for the answer. I understand now that data is unpaired. Sure, if I perform Fisher's or Barnard's test I should use correction for multiple testing. — Tjaša Kovačević, Dec 08 '19 at 17:55
