
Let's say I have a sample of one million observations that each get a score of 0-1. I would like to use the average score in the sample as an estimate for its true value in the population.

To get the 95% confidence interval, I would use a formula like:

[image: formula for the 95% confidence interval]

Where the sample standard deviation (sigma) is equal to:

[image: formula for the sample standard deviation]

Isn't it strange that the confidence interval is the same regardless of whether the true population size is 1,000,001 or one billion? My gut says the 95% confidence interval should be wider if we know the population is one billion rather than 1,000,001. But the formulas above don't account for this.

Is there a solution to this paradox?

zthomas.nc
    The implicit assumption here is that the process can be sampled from indefinitely, so we apply the CLT to get a confidence interval for the mean. There exist finite population corrections for the standard error, but generally modelling the problem as hypergeometric rather than binomial should solve this issue. See [here](https://stats.stackexchange.com/questions/5158/explanation-of-finite-correction-factor) for more. – Demetri Pananos May 07 '21 at 23:57

1 Answer


The formula you have given is for estimation of a mean parameter for an infinite population, when the variance is known. That is the wrong formula for what you want here. The formula for the confidence interval for the mean of a finite population (see e.g., O'Neill 2014, p. 286) is given by:

$$\text{CI}(1-\alpha) = \Bigg[ \bar{x}_n \pm \underbrace{\sqrt{\frac{N-n}{N}}}_\text{FPC term} \cdot \frac{t_{n-1, \alpha/2}}{\sqrt{n}} \cdot s_n \Bigg],$$

where $n$ is the sample size and $N$ is the population size. The FPC term shown in the formula is the "finite population correction" term, which adjusts the interval to take account of the population size. For inference about the mean parameter for an infinite population you have $N \rightarrow \infty$ and this term approaches one (and so it can be removed as a multiplicative term in the formula).
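
As a rough illustration of how much the FPC term matters for the numbers in the question, here is a minimal Python sketch. The function name `fpc_half_width` and the sample standard deviation of 0.5 are hypothetical choices made only for this example, and the code uses the standard normal critical value in place of $t_{n-1, \alpha/2}$, which is a close approximation when $n$ is one million.

```python
import math
from statistics import NormalDist


def fpc_half_width(s, n, N, alpha=0.05):
    """Half-width of the finite-population confidence interval for the mean.

    s     -- sample standard deviation (assumed value in the demo below)
    n     -- sample size
    N     -- population size (must be at least n)
    alpha -- 1 minus the confidence level

    Assumption: the normal quantile stands in for t_{n-1, alpha/2},
    which is reasonable only for large n.
    """
    if not 1 < n <= N:
        raise ValueError("need 1 < n <= N")
    fpc = math.sqrt((N - n) / N)                # finite population correction
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # approx. critical value
    return fpc * crit * s / math.sqrt(n)


if __name__ == "__main__":
    n = 1_000_000
    s = 0.5  # hypothetical sample standard deviation
    for N in (1_000_001, 1_000_000_000):
        print(f"N = {N:>13,}: half-width = {fpc_half_width(s, n, N):.3e}")
```

With these assumed inputs, the interval for a population of 1,000,001 comes out roughly a thousand times narrower than the one for a population of one billion, and the latter is almost identical to the infinite-population interval. So the population size does enter the calculation, and the interval widens as $N$ grows relative to $n$.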

Ben