Does the effect size mean the expected difference between the means?

Question

I am designing an experiment and want to determine the sample size. To do this, I am setting my significance level to 0.05 and the power to 0.8. My alternative hypothesis says the two means should differ from each other by at least 15%. Does this 15% correspond to the effect size? I had a look at this and this, but I am still not sure how to interpret the 15% from my alternative hypothesis in the context of power analysis.

I tried to calculate the sample size on the assumption that the 15% is the effect size:

from statsmodels.stats.power import TTestIndPower
# parameters for power analysis
effect = 0.15
alpha = 0.05
power = 0.9
# perform power analysis
analysis = TTestIndPower()
result = analysis.solve_power(effect, power=power, nobs1=None, ratio=1.0, alpha=alpha)
print('Sample Size: %.3f' % result)

The output is Sample Size: 934.954

This does not seem reasonable. I am not sure if I am doing it the right way.

Can someone help here?

the_scheining · Accepted Answer · 2018-09-16T09:35:59.627

2

I think you're referring to measuring effect size in Cohen's D. If so, the effect size would be the difference in means, divided by the pooled standard deviation.

So, let's say you're looking at groups' comprehension scores on a test. Let's say your control group scores an average of 50, standard deviation 10. And let's say your treatment group scores an average of 57.5, standard deviation 5.

Then the Cohen's D would be (57.5 - 50) / sqrt((10^2 + 5^2) / 2) = 7.5 / 7.91 ≈ 0.95

(Note that this assumes equal-sized control and treatment groups, in which case the pooled standard deviation is the square root of the average of the two variances, not the average of the two standard deviations. If the two groups have different sample sizes, each variance is weighted by its degrees of freedom. See the Wikipedia page on Cohen's d, for example.)
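
As a rough illustration of the pooled standard deviation with possibly unequal group sizes, here is a minimal sketch. The function name `cohen_d` and the group sizes (30, 20, 60) are made up for the example; only the means and standard deviations come from the comprehension-test scenario. It weights each group's variance (not its standard deviation) by its degrees of freedom.

```python
import math

def cohen_d(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    # Weight each group's variance by its degrees of freedom (n - 1).
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)

# Treatment vs. control from the example; n = 30 per group is a made-up value.
d_equal = cohen_d(57.5, 50, 5, 10, 30, 30)
# Same means and SDs, but hypothetical unequal group sizes.
d_unequal = cohen_d(57.5, 50, 5, 10, 20, 60)
print(f"equal n: d = {d_equal:.3f}, unequal n: d = {d_unequal:.3f}")
```

With equal group sizes this comes out at roughly 0.95. When the group with the larger standard deviation is also the larger group, the pooled standard deviation grows and d shrinks.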

In your code, 0.15 refers to the minimum effect size (in Cohen's D) that you want to be powered to detect. What this means for your data depends on your samples' standard deviation(s). You haven't collected the data yet, so you'll have to make assumptions here based on similar past experiments, the literature, your judgement, etc. Researchers often run multiple power calculations using different assumptions and create a table that shows the various sample sizes required given these different assumptions. For example, you could look at the sample sizes needed given various effect sizes, and/or given various standard deviation(s).
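
A table like that could be sketched as follows, reusing `TTestIndPower` from the question. The helper name `sample_size_table`, the 7.5-point difference, and the list of assumed standard deviations are illustrative assumptions, not values from a real experiment.

```python
from statsmodels.stats.power import TTestIndPower

def sample_size_table(effect_sizes, alpha=0.05, power=0.8):
    """Return {Cohen's d: required observations per group} for a two-sided t-test."""
    analysis = TTestIndPower()
    return {
        d: float(analysis.solve_power(effect_size=d, alpha=alpha, power=power,
                                      ratio=1.0, alternative="two-sided"))
        for d in effect_sizes
    }

# Hypothetical scenario: a 7.5-point difference under several assumed SDs.
mean_difference = 7.5
assumed_sds = [5, 10, 20, 50]
table = sample_size_table([mean_difference / sd for sd in assumed_sds])
for sd, (d, n) in zip(assumed_sds, table.items()):
    print(f"SD = {sd:>2}: d = {d:.2f}, n per group = {n:.1f}")
```

The same absolute difference needs far more observations as the assumed standard deviation grows, which is why the assumption about spread matters as much as the 15% itself.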

edited Sep 16 '18 at 09:35

answered Sep 15 '18 at 11:55

the_scheining

Does that mean the effect size should be measured based on Cohen's D? Then what does the 15% represent in this context? – owise Sep 15 '18 at 13:30
Yes, you can measure effect size in Cohen's D. In my example, 15% was the difference between treatment (mean = 57.5) and control (mean = 50), where 7.5/50 = 15% difference between the groups. When doing a power calculation, you need to specify the groups' means in units AND the standard deviation(s) of the groups. Or you can use Cohen's D, where you don't have to specify the standard deviation(s). Does that make sense? – the_scheining Sep 15 '18 at 19:05
In your code, 0.15 almost definitely represents Cohen's D. What this 'means' for your data depends on your standard deviation(s). Let's slightly tweak my example of a comprehension test: for simplicity, say the standard deviation of both groups is 10. Then a Cohen's D of 0.15 means an effect size (in correctly answered questions) of 1.5 questions, i.e. if your control group gets a score of 50 on average, your code is asking what sample size you would need to detect a treatment that raises their score by 1.5 points (where the standard deviation is 10 for both groups). – the_scheining Sep 15 '18 at 19:11
It occurs to me that I've assumed you're talking about achieving a 15% difference in means between groups on some continuous variable. Can you confirm that this is the case? The power calculation, and my comments, change if we're talking about a binary variable, i.e. 50% of Group A does a thing, whereas 57.5% of Group B does the thing. In that case you need to use a proportion test, not a t-test. – the_scheining Sep 15 '18 at 19:31
It is a continuous variable. Everything you said makes sense if we are measuring the power, but in this case I want to identify the sample size before conducting the experiment. That is why we cannot calculate the standard deviation, and thus we cannot measure the effect size in Cohen's D. That is what I could not understand. What would be the minimum sample size given that the difference between the means is expected to be 15% and the significance level is 0.05? – owise Sep 15 '18 at 23:51
The output of your power calculation is the sample size needed to be sufficiently powered to detect the effect size you specified (0.15). If you expect a 15% difference between means, then I recommend you convert that figure into the absolute difference in means (in whatever units are relevant - in my example, the units were 'correctly answered questions') and then divide by the expected standard deviation. That will give your expected effect size in Cohen's D, which you can use in your code. – the_scheining Sep 16 '18 at 09:28
We cannot calculate the standard deviation before we conduct the experiment, and thus we cannot measure the effect size in Cohen's D. How would we calculate the expected standard deviation? – owise Sep 16 '18 at 18:00
Give some more details of the experiment, please. Basically, you need to make some assumptions about your data, even if you haven't collected it yet, to calculate the sample size necessary to achieve a specific level of power. – the_scheining Sep 16 '18 at 20:36
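
The conversion discussed in these comments (percentage difference, then absolute difference, then Cohen's d, then sample size) can be sketched with the standard library alone. This is only a rough sketch: it uses the normal approximation n ≈ 2 (z_{1-alpha/2} + z_{power})^2 / d^2 instead of the t-test calculation in `TTestIndPower`, so it comes out about one observation per group lower. The function name `approx_n_per_group`, the control mean of 50, and the standard deviation of 10 are assumptions taken from the comprehension-test example, not from the asker's experiment.

```python
from statistics import NormalDist

def approx_n_per_group(pct_difference, control_mean, assumed_sd,
                       alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    # Step 1: relative difference -> absolute difference in the outcome's units.
    abs_difference = pct_difference * control_mean
    # Step 2: absolute difference -> Cohen's d, using an ASSUMED standard deviation.
    d = abs_difference / assumed_sd
    # Step 3: n per group ~= 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return d, 2 * (z_alpha + z_power) ** 2 / d ** 2

d, n = approx_n_per_group(0.15, control_mean=50, assumed_sd=10)
print(f"d = {d:.2f}, approx. n per group = {n:.1f}")
```

Because n scales with 1 / d^2, entering 0.15 directly as Cohen's d (instead of the 0.75 implied by these assumptions) multiplies the required sample size by 25.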

score 2 · Answer 2 · edited Sep 15 '18 at 17:05

Different power analysis programs will allow different measures of effect size for different types of statistical analyses. "Means will vary by 15%" is certainly an effect size, but it may not be usable by your software, in which case you might need to convert it to something else.

EDIT: In your question you say you set power to 0.8, but your code has power at 0.9.

Also, you have effect = 0.15. What does that mean? You need to figure that out from the documentation for TTestIndPower. But it cannot be a % difference.

edited Sep 15 '18 at 17:05

kjetil b halvorsen

answered Sep 15 '18 at 12:08

Peter Flom

I added my python code to the question. What do you mean by it might not be useable by the software? – owise Sep 15 '18 at 13:35
I mean that each type of power analysis software (R packages, SAS PROC POWER, G*Power, whatever) will want to have the input in a certain form. – Peter Flom Sep 15 '18 at 13:40
