
This question has been asked many times before, but the typical answer is usually 'do a power/sample-size calculation'.

The problem I can't grasp is that in most well-known papers the samples used are ridiculously small. For instance, one of the most famous papers in behavioral economics ('Cooperation and Punishment in Public Goods Experiments', more than 3,600 citations in Google Scholar) uses 112 participants in total, divided into 5 sessions with groups of 4; in some sessions there was random matching (participants were shuffled between rounds), so each group was no longer an independent observation.

I believe I fundamentally misunderstand the whole concept of sample size: how can such a small sample be used for statistical analysis?

kjetil b halvorsen
David Pekker
  • Perhaps it provides adequate power for the observed effect? Even sample sizes of one can be used for statistical analysis: see https://stats.stackexchange.com/a/1836/919. Also see Sharan B. Merriam, *What Can You Tell From an N of 1?: Issues of Validity and Reliability in Qualitative Research.* PAACE J. Lifelong Learning, Vol. 4, 1995, 51-60. – whuber Jun 20 '17 at 18:51
  • well, but how can we know the observed effect _ex ante_? (Thank you for the reference to Merriam by the way!) – David Pekker Jun 20 '17 at 19:03
  • The problem is you cannot know the effect size *before* the study, so you have to guess it in a power analysis. After the study, though, just consult the report! Large effect sizes are often obvious from their astronomically small p-values or absolutely large magnitude (relative to theoretical predictions or previous experience). – whuber Jun 20 '17 at 19:06
  • Does that mean we are free to do a kind of p-hacking (maybe not the most correct term)? I mean, we don't know how large the difference between control and treatment will be before the experiment. We can do a pilot test with a small sample, see the difference, and then adjust our sample size to the extent that the difference becomes significant? (Sounds like not the most honest approach.) – David Pekker Jun 20 '17 at 19:22
  • That's not only perfectly honest, it's standard--provided you don't include the preliminary results in the final dataset. If you do want to include them, there are various adaptive and sequential methods that can be employed to account for the optional stopping in the experiment. This, however, doesn't seem to have much to do with your original question. – whuber Jun 20 '17 at 19:24
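The power/sample-size calculation the comments refer to can be sketched in a few lines. Below is a minimal illustration (an assumption on my part: a two-sided, two-sample comparison of means with equal group sizes under the normal approximation, not the actual design of the cited paper), showing how the required sample size depends on the effect size you guess in advance:

```python
from math import ceil
from statistics import NormalDist  # standard library, Python 3.8+

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    effect_size is Cohen's d: the mean difference in standard-deviation
    units, which must be guessed (or piloted) before the experiment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Cohen's conventional benchmarks: small 0.2, medium 0.5, large 0.8
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

With these conventions, a medium effect (d = 0.5) needs about 63 participants per group, while a large effect (d = 0.8) needs about 25, which is one way a total N in the low hundreds can be adequate: the studies are powered only to detect large effects.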

0 Answers