
I have a survey that was conducted in 3 different classes (math, physics, bio) at the beginning and at the end of the semester (pre and post). The survey contained 3 groups of questions (A, B, C) on a Likert-type scale, and I converted all the answers into numerical scores.

I have a dataset with the following columns:

- subject: math, physics, bio
- survey: pre and post
- q: A, B, and C
- score: values ranging from 1 to 7

I want to test, for each course and each question type, whether there is a difference in score between the pre- and post-term surveys. I conducted paired t-tests, but my question is: do I need a Bonferroni correction here?

Here is some simulated data and the code for paired t-test:

library(dplyr)    # for group_by() and the pipe
library(rstatix)  # for t_test() and add_significance()

set.seed(123)
# subject and q cycle at different rates so that all 9 subject-by-question
# cells occur; within each cell, pre and post rows alternate in pairs
df <- data.frame(
  survey  = rep(c("pre", "post"), 90),
  subject = rep(c("bio", "math", "phys"), each = 60),
  q       = rep(rep(c("A", "B", "C"), each = 20), times = 3),
  score   = sample(x = 1:7, size = 180, replace = TRUE)
)
df

df %>%
  group_by(subject, q) %>%
  t_test(score ~ survey, paired = TRUE, detailed = TRUE) %>%
  add_significance()
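For reference, if a correction is wanted, rstatix can apply it in the same pipeline via `adjust_pvalue()`. This sketch regenerates a balanced version of the simulated data (the `student` column and the `expand.grid()` layout are additions for illustration) so that all nine subject-by-question tests appear:

```r
library(dplyr)
library(rstatix)

set.seed(42)
# balanced long format: every student answers each question type pre and post
df <- expand.grid(
  survey  = c("pre", "post"),
  student = 1:10,
  subject = c("bio", "math", "phys"),
  q       = c("A", "B", "C")
)
df$score <- sample(1:7, nrow(df), replace = TRUE)

df %>%
  group_by(subject, q) %>%
  t_test(score ~ survey, paired = TRUE, detailed = TRUE) %>%
  adjust_pvalue(method = "bonferroni") %>%  # 9 tests: p.adj = min(1, 9 * p)
  add_significance("p.adj")
```

`adjust_pvalue()` accepts any method that base R's `p.adjust()` knows ("holm", "BH", etc.), so switching away from Bonferroni is a one-word change.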
yuliaUU
    See https://stats.stackexchange.com/questions/120362/whats-wrong-with-bonferroni-adjustments. I suggest only using the Bonferroni correction for political reasons, e.g. to satisfy reviewers who believe it's a good idea. – fblundun Feb 24 '21 at 17:30
  • thanks for the link! – yuliaUU Feb 24 '21 at 19:29

1 Answer


The short answer: "It Depends".

The longer answer:

Whether to adjust for multiple comparisons is a very good question without a good, standard answer. It really does depend on things beyond the data and the basic questions.

I like the approach of thinking through how my interpretation of the results would change if I added another 20 (or 100, or some other large number) categories/tests/intervals based on random data to my process, and either adjusted for the multiple comparisons or did not.

Think about, for your case, what would happen if you added a bunch more categories and filled in the pre and post measurements with purely random data.

Scenario 1: You plan to declare "Success", publish, change policy, etc. if any of the tests are significant. So adding 20 or more tests on random data without adjusting for multiple comparisons greatly increases your chance of seeing at least one significant test (but adjusting for all the tests does not).
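To put numbers on scenario 1: under independent tests, the chance of at least one false positive at α = 0.05 is 1 − (1 − α)^m. The m = 9 case matches the 3 subjects × 3 question types here; the larger m values sketch what happens after adding tests on random data:

```r
alpha <- 0.05
m <- c(1, 9, 29, 109)          # number of independent tests
fwer <- 1 - (1 - alpha)^m      # chance of at least one false positive
round(fwer, 3)                 # 0.050 0.370 0.774 0.996

# Bonferroni tests each hypothesis at alpha / m instead,
# holding the familywise rate at roughly alpha regardless of m
fwer_bonf <- 1 - (1 - alpha / m)^m
round(fwer_bonf, 3)            # 0.050 0.049 0.049 0.049
```

Already at the nine tests in this design, the unadjusted chance of a spurious "significant" result is about 37%.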

Scenario 2: You are really interested in the "Math" category, and whether "Physics" shows a difference or not will not affect your interest in the math class. But adding 20 or more additional tests on random data and adjusting for multiple comparisons will increase the confidence interval width for math, giving less precision.
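To see the cost in scenario 2 concretely, here is a small sketch (with made-up pre/post score differences) of how using the Bonferroni-adjusted level 1 − α/m widens the one interval you actually care about:

```r
set.seed(7)
# hypothetical paired differences (post minus pre) for 20 math students
d <- rnorm(20, mean = 0.5, sd = 1)

t.test(d, conf.level = 0.95)$conf.int          # unadjusted interval
t.test(d, conf.level = 1 - 0.05 / 9)$conf.int  # adjusted for 9 tests: wider
```

Every extra test added to the family pushes the adjusted level closer to 1 and the interval wider, even though the math data have not changed.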

In scenario 1, definitely adjust. In scenario 2, definitely do not adjust. But most real scenarios fall somewhere in between, so you really need to think about what effect adjusting or not will have on your decisions.

Note that there are other options than Bonferroni (Holm's method, for example, is uniformly more powerful). If you are really interested in this, then I would look at Bayesian hierarchical models. Instead of just increasing the standard error because some effect sizes may be too extreme by chance, they actually shrink the extreme effect sizes using information from the other groups, giving better effect size estimates (and intervals), as well as allowing for "partial correction" instead of just a yes/no.
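As a sketch of that last idea (assuming the brms package; the formula and default priors here are illustrative, not a recommendation), a hierarchical model that partially pools the pre/post effect across the nine subject-by-question cells could look like:

```r
library(brms)

# stand-in for the real long-format data: columns student, survey, subject, q, score
set.seed(1)
df <- expand.grid(
  survey  = c("pre", "post"),
  student = 1:10,
  subject = c("bio", "math", "phys"),
  q       = c("A", "B", "C")
)
df$score <- sample(1:7, nrow(df), replace = TRUE)

# each subject-by-question cell gets its own pre/post effect, shrunk
# toward the overall effect in proportion to how noisy that cell is
fit <- brm(
  score ~ survey + (survey | subject:q),
  data   = df,
  family = gaussian()
)
summary(fit)
```

The shrinkage of the cell-level `survey` effects is the "partial correction": extreme cells are pulled toward the grand mean rather than every cell paying a fixed Bonferroni penalty.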

Greg Snow
  • thank you so much for your detailed response! I wonder what you mean by "Think about for your case if you added a bunch more categories and filled in the pre and post measurements with purely random data." Do you mean adding more subjects, for instance, or just more observations to the already existing groups? – yuliaUU Feb 24 '21 at 19:27
  • @yuliaUU, you already have subjects "bio", "math", and "phys"; think of adding "eng", "hist", "chem", "geol", etc. but with fully random data (you used random data in your example above, but presumably you plan to have real data for your analysis). Since this new data for new subjects is completely random it contains no new or useful information, but could change your results in scenario 1 if you don't adjust or change your results in scenario 2 if you do adjust. – Greg Snow Feb 24 '21 at 21:41
  • So using the approach you provided in your answer, am I right that my hypotheses for each subject will be something like this: "In physics, there is no difference between pre- and post-term scores for A." Then, by adding another discipline, I am not testing the same hypothesis: for bio the H0 will be "In bio, there is no difference between pre- and post-term scores for A." Then I won't need corrections for my p-values. Sorry that I am asking so many questions. I tried to read some literature sources, but it is a bit hard to understand as they use too much jargon. – yuliaUU Feb 25 '21 at 17:50
  • @yuliaUU, One simple approach is to present the results both unadjusted and adjusted and let the reader decide which to use. – Greg Snow Feb 26 '21 at 21:06
  • @yuliaUU, I like a couple of articles from the Lancet 2005: https://www.sciencedirect.com/science/article/pii/S0140673605664616?via%3Dihub, https://www.thelancet.com/journals/lancet/article/PIIS0140673605665166/fulltext – Greg Snow Feb 26 '21 at 21:16
  • thanks for the links to the articles! they are very helpful – yuliaUU Feb 26 '21 at 22:09