Can anyone tell me why the calculated birthday match probability is a slightly biased estimator when simulated? Taking a group of 30 people, theory tells us that the probability of at least 2 having the same birthday is 1-(364/365)^(30C2) = 69.68%.
Simulating it in R with the following code:
Nsim <- 1000
Nprop <- 100
Nper <- 30
match.birthday <- numeric(Nprop)
result <- numeric(Nsim)
for (j in 1:Nsim) {
for (i in 1:Nprop) {
birthdays <- sample(1:365, size=Nper, replace=T)
match.birthday[i] <- length(unique(birthdays))
}
result[j] <- length(match.birthday[match.birthday < 30])/Nprop
}
hist(result)
SE <- sd(result)/sqrt(Nsim)
mean(result)
mean(result) + 2*SE
mean(result) - 2*SE
The mean is consistently about 70.5% or so with a SE of about 0.15%. Consistently when I run it the probability calculates at about 0.6-0.9% higher than the theoretical value. Anyone any idea why this is? Have I something wrong with my code?