
I've just tried to calculate Cohen's d in R with the following code:

```r
mean1 <- mean(Meaningfulness[df$age<=22],na.rm=TRUE)
mean2 <- mean(Meaningfulness[df$age>=22],na.rm=TRUE)
sd1 <- sd(Meaningfulness[df$age<=22],na.rm=TRUE)
sd2 <- sd(Meaningfulness[df$age>=22],na.rm=TRUE)

d <- (mean1-mean2)/(c(sd1,sd2))
```

Interestingly, I get two outputs. Why? My second confusion: when I compare the results to an online "Cohen's d calculator", my value matches the calculator's for one data set but not for another.

Any suggestions on what might be wrong with my code? Any help is appreciated! :)

Newby
  • Purely programming questions are better asked on Stack Overflow. We focus on statistics questions here. But the problem is that `c(sd1,sd2)` is not doing what you think it's doing. You'll need to calculate the pooled standard deviation and divide by that instead. (Or divide by the standard deviation of the "control group", depending on the particular interpretation of Cohen's d you want.) – David Luke Thiessen Aug 31 '21 at 15:24
  • Any reason you are concatenating `sd1` and `sd2` in a vector instead of e.g. taking their weighted mean? – B.Liu Aug 31 '21 at 15:24
  • Maybe it's also worth mentioning that the degrees of freedom in the t-test output are decimal numbers. I guess this could have to do with an unequal number of observations per group? Might this cause the two outputs, and if so, which output should I rely on? – Newby Aug 31 '21 at 15:27
  • @DavidLukeThiessen, thanks for the hint! And sorry for posting this question on the wrong site... I get what you're saying! Do you by chance know the code to pool the standard deviations? I actually assumed I was doing just that :) – Newby Aug 31 '21 at 15:30
  • There are two slightly different definitions of the pooled SD for Cohen's d. I'd suggest reading through the answers to these two questions and perhaps the Wikipedia article on effect sizes. You should find it pretty easy to write the code yourself once you see the formula. https://stats.stackexchange.com/questions/66956/whats-the-difference-between-hedges-g-and-cohens-d https://stats.stackexchange.com/questions/1850/difference-between-cohens-d-and-hedges-g-for-effect-size-metrics – David Luke Thiessen Aug 31 '21 at 15:46
  • Thanks to your help, @DavidLukeThiessen, I updated the code as follows: mean1 =22],na.rm=TRUE) sd1 =22],na.rm=TRUE) n1 =22]) pooled – Newby Aug 31 '21 at 16:30
  • If the difference between your value and the online calculator is small I wouldn't worry too much about it. But I think that if you have `NA` values in `Meaningfulness` you should also remove them from the calculation of `n1` and `n2`. – David Luke Thiessen Aug 31 '21 at 16:53
  • Thank you @DavidLukeThiessen! – Newby Sep 01 '21 at 09:35
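Two points raised in the comments can be checked directly in R. The sketch below uses made-up numbers (not the question's data). First, dividing by `c(sd1, sd2)` divides the mean difference by each SD separately, which is where the two outputs come from. Second, the decimal degrees of freedom come from `t.test()` defaulting to Welch's test (`var.equal = FALSE`), whose Welch-Satterthwaite df depend on both group variances and sizes; they are unrelated to the two outputs. The variables `x` and `y` are hypothetical.

```r
# Hypothetical summary statistics, for illustration only
mean1 <- 5
mean2 <- 4
sd1 <- 2
sd2 <- 1

# Dividing one number by a length-2 vector divides it by each element,
# so this yields two values (0.5 and 1), not a single effect size
d_two <- (mean1 - mean2) / c(sd1, sd2)
print(d_two)

# Decimal degrees of freedom come from t.test()'s default Welch test
# (var.equal = FALSE), not from the effect-size code
set.seed(1)
x <- rnorm(10)          # hypothetical group 1, n = 10
y <- rnorm(15, sd = 2)  # hypothetical group 2, n = 15
welch_df   <- unname(t.test(x, y)$parameter)                    # generally not a whole number
student_df <- unname(t.test(x, y, var.equal = TRUE)$parameter)  # n1 + n2 - 2 = 23
print(c(welch = welch_df, student = student_df))
```

Neither t-test df is "the" Cohen's d; the effect size needs a single denominator, as discussed in the answers.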

2 Answers


I'm pretty sure there are packages for effect sizes (ESs) that include Cohen's d. I would look at what David said: either your idea of the formula is wrong, or your code isn't doing what you want it to. In Jacob Cohen's original book Statistical Power Analysis for the Behavioral Sciences, the denominator is the "common within-population standard deviation", i.e. the pooled SD. Here's a useful source for coding ESs: https://bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/effects.html#effect-sizes-in-control-group-designs
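Cohen's common within-population SD is commonly estimated from a sample by pooling the two group variances, each weighted by its degrees of freedom. Below is a minimal sketch of that one definition (the comments note another exists). `cohens_d` is a hypothetical helper name, not a package function; it drops `NA` values before counting group sizes, as suggested in the comments.

```r
# Cohen's d with the n-weighted pooled SD:
# s_p = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
cohens_d <- function(x, y) {
  # Drop missing values first so n1 and n2 match the data actually used
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  # Pooled variance: each group's variance weighted by n - 1
  pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Usage with the question's objects, assuming 22-year-olds belong to the
# younger group so no observation is counted twice:
# d <- cohens_d(Meaningfulness[df$age <= 22], Meaningfulness[df$age > 22])
```

Weighting by `n - 1` means a larger group contributes more to the pooled SD, which matters when the age groups differ in size.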

Dylan A

Thanks for all your help! With your advice I figured it out. This is the final code I used:

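```r
# Group means (age 22 and younger vs. older than 22)
mean1 <- mean(Meaningfulness[df$age <= 22], na.rm = TRUE)
mean2 <- mean(Meaningfulness[df$age > 22], na.rm = TRUE)

# Group standard deviations
sd1 <- sd(Meaningfulness[df$age <= 22], na.rm = TRUE)
sd2 <- sd(Meaningfulness[df$age > 22], na.rm = TRUE)

# Pooled SD as the root mean of the two variances (unweighted; it matches
# the n-weighted pooled SD when both groups are the same size)
pooled <- sqrt((sd1^2 + sd2^2) / 2)

d <- (mean1 - mean2) / pooled
```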
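The final code divides by `sqrt((sd1^2 + sd2^2) / 2)`, an unweighted average of the two variances. It equals the n-weighted pooled SD only when the groups have the same size or the two SDs happen to be equal. The sketch below uses hypothetical data with unequal group sizes to show the two versions giving different d values. If the online calculator used the weighted formula (an assumption; its implementation isn't stated), that could explain why it agreed on one data set and not another.

```r
# Hypothetical data: two groups of unequal size
g1 <- c(4, 5, 6, 7, 8, 9)  # n1 = 6, variance 3.5
g2 <- c(2, 4, 6)           # n2 = 3, variance 4

n1 <- length(g1)
n2 <- length(g2)
s1 <- sd(g1)
s2 <- sd(g2)
mean_diff <- mean(g1) - mean(g2)

# Unweighted version, as in the final code above
pooled_unweighted <- sqrt((s1^2 + s2^2) / 2)

# n-weighted version: each variance weighted by its degrees of freedom
pooled_weighted <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

d_unweighted <- mean_diff / pooled_unweighted
d_weighted   <- mean_diff / pooled_weighted
print(c(unweighted = d_unweighted, weighted = d_weighted))
```

With equal group sizes the two pooled SDs coincide, so a mismatch that appears only for some data sets is consistent with a weighting difference.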
Newby