2

Can anyone tell me why the calculated birthday match probability is a slightly biased estimator when simulated? Taking a group of 30 people, theory tells us that the probability of at least 2 having the same birthday is 1-(364/365)^(30C2) = 69.68%.

Simulating it in R with the following code:

Nsim <- 1000
Nprop <- 100
Nper <- 30
match.birthday <- numeric(Nprop)
result <- numeric(Nsim)

for (j in 1:Nsim) {
  for (i in 1:Nprop) {
    birthdays <- sample(1:365, size=Nper, replace=T)
    match.birthday[i] <- length(unique(birthdays))
  }
  result[j] <- length(match.birthday[match.birthday < 30])/Nprop
}

hist(result)
SE <- sd(result)/sqrt(Nsim)
mean(result)
mean(result) + 2*SE
mean(result) - 2*SE

The mean is consistently about 70.5% or so with a SE of about 0.15%. Consistently when I run it the probability calculates at about 0.6-0.9% higher than the theoretical value. Anyone any idea why this is? Have I something wrong with my code?

Joe H
  • 66
  • 5
  • 1
    To what "theory" are you referring? The formula is not correct. This is obvious, because when applied to a sample of 366 it still gives an answer less than $1,$ although it is certain in such cases that two people will share a birthday. See https://stats.stackexchange.com/questions/22009/what-is-the-real-answer-to-the-birthday-question/22012#22012. – whuber Nov 07 '18 at 18:24
  • FYI there is a `pbirthday` function in R, which calculated a prob of 70.6% – Dave2e Nov 07 '18 at 18:25
  • @whuber - many thanks!! excuse my error but I was misled by this website https://betterexplained.com/articles/understanding-the-birthday-paradox/ and thought it had the theory right. I never thought of trying it with 366 persons. :-( Of course the answer is 70.63% and my simulation is bang on with the correct answer nearly always within +/- 2*SE of the estimate. – Joe H Nov 07 '18 at 20:46
  • 1
    My jaw dropped when I read that page, but it (partially) rescued itself at the section beginning "Remember how we assumed birthdays are independent? Well, they aren’t." Take a look at what follows. – whuber Nov 07 '18 at 20:49

1 Answers1

1

It was answered by whuber above. The theory I had was wrong and the theoretical value is 1-(365P30)/365^30 = 70.63%. The estimate through simulation is unbiased!!

Joe H
  • 66
  • 5
  • 2
    Please note that this is not an "estimate" in the usual statistical sense: it is purely a probability calculation. It requires no data. – whuber Nov 07 '18 at 20:53
  • 2
    @whuber the 'estimate' was a reference to the simulation not the calculation. The simulation does use simulated data to calculate the statistic. Is that not an estimate? – Joe H Nov 07 '18 at 20:59