Background: In psychology, and probably a number of other disciplines, it's common practice to test between-groups effects on a binary variable, such as accuracy, by aggregating data within participants, and then running a t test on the aggregates.
library(dplyr)

# One accuracy proportion per participant per condition
aggregate.data = data %>%
  group_by(subject_nr, condition) %>%
  summarise(accuracy=mean(accuracy))
# Paired t test on the per-participant proportions
t.test(aggregate.data[aggregate.data$condition==0,]$accuracy,
       aggregate.data[aggregate.data$condition==1,]$accuracy,
       paired=TRUE)
We all know at this stage that it's wrong to analyse proportion data like this using t tests/ANOVA. Researchers should at least be applying the arcsine transform to normalize the data (which I've never seen done in a psychology journal), but ideally should use multilevel logistic regression:
library(lme4)
glmer(accuracy ~ condition + (1|subject_nr), data=data, family=binomial)
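(For concreteness, the arcsine route would just mean transforming the per-participant proportions before the paired t test; a minimal sketch, reusing aggregate.data from above, with accuracy.asin as a purely illustrative column name:)

# Arcsine-square-root transform of the per-participant proportions
aggregate.data$accuracy.asin = asin(sqrt(aggregate.data$accuracy))
t.test(aggregate.data[aggregate.data$condition==0,]$accuracy.asin,
       aggregate.data[aggregate.data$condition==1,]$accuracy.asin,
       paired=TRUE)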
By way of an example, I was just reading this study, with 61 participants, which reports,
A large decrease in the proportion of base-rate responses was evident for incongruent relative to congruent items, t(60) = 11.66, SE = .04, p < .001, d = 1.49.
Question: We all know this is bad practice, but how bad of a practice is it?
It's hard to know whether the matter is a minor statistical squabble, a problem for t tests which are only just significant (say .01 < p < .05), or something which casts doubt on the results of thousands of studies.
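One way I could imagine quantifying how bad it is would be a small simulation: generate trial-level binary data from a known logistic model and compare what the aggregated t test and the mixed logistic regression say. A minimal sketch (purely illustrative; the subject count, trial count, effect size, and random-intercept SD below are made-up values):

library(dplyr)
library(lme4)

# Simulate one experiment: binary accuracy from a logistic model with
# a by-participant random intercept and a fixed condition effect (beta)
simulate.experiment = function(n.subj=30, n.trials=40, beta=0.5, sd.subj=1) {
  design = expand.grid(subject_nr=1:n.subj, condition=0:1, trial=1:n.trials)
  intercepts = rnorm(n.subj, 0, sd.subj)
  p = plogis(intercepts[design$subject_nr] + beta * design$condition)
  design$accuracy = rbinom(nrow(design), 1, p)
  design
}

sim.data = simulate.experiment()

# The aggregate-then-t-test approach
sim.agg = sim.data %>%
  group_by(subject_nr, condition) %>%
  summarise(accuracy=mean(accuracy))
t.test(sim.agg[sim.agg$condition==0,]$accuracy,
       sim.agg[sim.agg$condition==1,]$accuracy,
       paired=TRUE)

# The multilevel logistic regression on the raw trial-level data
summary(glmer(accuracy ~ condition + (1|subject_nr), data=sim.data, family=binomial))

Running something like this repeatedly (and with beta=0 to look at false positives) would presumably give a feel for how often the two analyses actually disagree, but I haven't seen this worked through anywhere.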
More practically, when analyzing my own data, I know that the logistic mixed model is the right tool for the job, but I see untransformed t tests used in all of the top journals. Am I actually hurting my chances of publication by using the less well-known analysis?