
Background: In psychology, and probably a number of other disciplines, it's common practice to test the effect of condition on a binary variable, such as accuracy, by aggregating the data within participants and then running a t test on the per-participant proportions.

library(dplyr)

# Proportion correct per participant per condition
aggregate.data = data %>%
                  group_by(subject_nr, condition) %>%
                  summarise(accuracy = mean(accuracy))

# Paired t test on the aggregated proportions
t.test(aggregate.data$accuracy[aggregate.data$condition == 0],
       aggregate.data$accuracy[aggregate.data$condition == 1],
       paired = TRUE)

We all know at this stage that it's wrong to analyse proportion data like this with t tests/ANOVA. Researchers should at least apply the arcsine transform to normalize the data (which I've never seen in a psychology journal), but ideally should use multilevel logistic regression:

library(lme4)  # for glmer
glmer(accuracy ~ condition + (1 | subject_nr), data = data, family = binomial)
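
For reference, the arcsine version mentioned above would look something like this (a minimal sketch, reusing aggregate.data from the first snippet; the new column name is just for illustration):

# asin(sqrt(p)) is the standard variance-stabilizing transform for proportions
aggregate.data$asin.accuracy = asin(sqrt(aggregate.data$accuracy))

t.test(aggregate.data$asin.accuracy[aggregate.data$condition == 0],
       aggregate.data$asin.accuracy[aggregate.data$condition == 1],
       paired = TRUE)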

By way of an example, I was just reading this study, with 61 participants, which reports:

A large decrease in the proportion of base-rate responses was evident for incongruent relative to congruent items, t(60) = 11.66, SE = .04, p < .001, d = 1.49.
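(Assuming d here was computed as t/√n for a paired design, the reported numbers are internally consistent: 11.66/√61 ≈ 1.49.)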

Question: We all know this is bad practice, but how bad of a practice is it?

It's hard to know whether this is a minor statistical squabble, a problem only for t tests that are barely significant (say, .01 < p < .05), or something that casts doubt on the results of thousands of studies.

More practically: when analyzing my own data, I know that the logistic mixed model is the right tool for the job, but I see untransformed t tests used in all of the top journals. Am I actually hurting my chances of publication by using the less well-known analysis?
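
One way to get a feel for how bad it is for a specific design is simulation. Below is a minimal sketch (the function name, design, and effect sizes are all made up for illustration) that generates trial-level binary data with a known condition effect and compares the p values from the two approaches. Repeating it many times, including with effect = 0, would give the Type I/II error rates of each analysis.

library(lme4)

# Simulate one within-subjects experiment: trial-level binary accuracy with
# a true condition effect on the log-odds scale (all numbers are made up)
simulate.experiment = function(n.subjects = 61, n.trials = 20,
                               effect = 0.5, subject.sd = 1) {
  subject.intercept = rnorm(n.subjects, 0, subject.sd)
  data = expand.grid(subject_nr = 1:n.subjects,
                     condition = 0:1,
                     trial = 1:n.trials)
  p = plogis(subject.intercept[data$subject_nr] + effect * data$condition)
  data$accuracy = rbinom(nrow(data), 1, p)
  data
}

sim.data = simulate.experiment()

# p value from the aggregate-then-t-test approach
props = with(sim.data, tapply(accuracy, list(subject_nr, condition), mean))
t.p = t.test(props[, 1], props[, 2], paired = TRUE)$p.value

# p value from the multilevel logistic regression
model = glmer(accuracy ~ condition + (1 | subject_nr),
              data = sim.data, family = binomial)
glmer.p = summary(model)$coefficients["condition", "Pr(>|z|)"]

c(t.test = t.p, glmer = glmer.p)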

  • I think the problem might be that many medical and psychology journals do not invite statisticians to review their papers. Therefore, many papers with inappropriate statistical methods get published. – Deep North Jun 30 '15 at 13:56
  • Part of the answer is available in a thread [comparing the t-test to logistic regression](http://stats.stackexchange.com/questions/159110). It sheds some light on when the "ideal" approach might be preferred and when it should not (because it may be a less powerful way to achieve the study objectives). Even use of the arcsine transform is questionable when the groups have different sizes. This leads me to ask, *to what are you referring by "bad practice"?* The t-test? The arcsine? The logistic regression? All three? Or maybe a policy of using one procedure regardless of the nature of the data? – whuber Jun 30 '15 at 14:06
  • From my limited understanding (i.e. [this paper](http://www.esajournals.org/doi/abs/10.1890/10-0340.1)), untransformed t test < arcsine < multilevel logistic model. – Eoin Jun 30 '15 at 14:44
  • In the linked question the analysis is a little different: they're not analysing proportions with a t test, but using the binary outcome as a predictor in one. See my edit to the question. – Eoin Jun 30 '15 at 14:46
  • There are some old simulation studies from the 1950s showing that ANOVA on binary data works OK. Of course, it probably depends on the proportions and sample size. – David Lane Mar 19 '17 at 18:23
  • Does this answer your question? [Why use a z test rather than a t test with proportional data?](https://stats.stackexchange.com/questions/90893/why-use-a-z-test-rather-than-a-t-test-with-proportional-data) – kjetil b halvorsen Apr 29 '20 at 00:25

0 Answers