
Consider the urn problem where a hypothetical urn contains a finite number $m$ of balls, $r$ of which are black and $m-r$ white. We draw a random sample of $n$ balls without replacement and observe that $k$ of them are black. The number of black balls $K$ follows the hypergeometric distribution with mass $$P(K=k) = \frac{{r \choose k}{m-r \choose n-k}}{{m \choose n}}$$
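As a minimal sketch (Python standard library only; the name `hypergeom_pmf` is just an illustrative choice), the mass function $\binom{r}{k}\binom{m-r}{n-k}/\binom{m}{n}$ can be evaluated exactly with `math.comb` and `fractions.Fraction`. Exact arithmetic also makes it easy to confirm that the masses over $k = 0, \dots, n$ sum to one:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k: int, m: int, r: int, n: int) -> Fraction:
    """Exact P(K = k): n balls drawn without replacement from an urn of
    m balls, r of which are black (assumes 0 <= r <= m and 0 <= n <= m)."""
    if k < 0 or k > n:
        return Fraction(0)
    # comb(a, b) is 0 when b > a, so impossible outcomes (e.g. k > r) get mass 0
    return Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))

# The masses over all possible k sum to exactly 1 (Vandermonde's identity).
print(sum(hypergeom_pmf(k, 20, 7, 5) for k in range(6)))  # 1
print(hypergeom_pmf(1, 10, 4, 3))  # 1/2
```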

I am interested in the "best" integer approximation of $r$ given known $m,n$ and observed $k$.

It can be shown (since $E[K] = nr/m$) that $$\hat{r} = \frac{m}{n}k$$ is an unbiased, consistent estimator of $r$. However, it can take non-integer values and is thus an "implausible" point estimate of $r$ for a finite population. I am wary of simply rounding $\hat{r}$ to the nearest integer, given the asymmetry of the confidence intervals around such an estimate.
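To see the unbiasedness concretely, here is a small exact check (a sketch; `expected_r_hat` is a hypothetical helper name). It sums $\frac{m}{n}k \cdot P(K=k)$ over all $k$ and compares the result with $r$, and it lists the only values $\hat r$ can take for a given $m$ and $n$, most of which are not integers:

```python
from fractions import Fraction
from math import comb

def expected_r_hat(m: int, r: int, n: int) -> Fraction:
    """Exact E[(m/n) K] when K is hypergeometric (m balls, r black, n drawn)."""
    total = Fraction(0)
    for k in range(n + 1):
        # P(K = k); comb returns 0 for impossible k, e.g. k > r
        p = Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))
        total += Fraction(m * k, n) * p
    return total

# E[r_hat] equals r for every possible r, as unbiasedness requires.
m, n = 10, 3
print(all(expected_r_hat(m, r, n) == r for r in range(m + 1)))  # True
# The possible values of r_hat are (m/n) k for k = 0..n.
print([str(Fraction(m * k, n)) for k in range(n + 1)])  # ['0', '10/3', '20/3', '10']
```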

Is there a recommended method for obtaining an integer estimate $$\tilde{r}\approx r,\qquad \tilde{r}\in\mathbb{Z}^+?$$ Is it reasonable to choose $\tilde{r}\in\{\lfloor \hat{r}\rfloor,\lceil \hat{r} \rceil\}$ such that $$P(K=k|m=m,n=n,r=\tilde{r})$$ is maximized?
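A minimal sketch of the procedure proposed here, choosing between $\lfloor \hat r\rfloor$ and $\lceil \hat r\rceil$ by likelihood. The helper names are hypothetical, and exact `fractions.Fraction` arithmetic is an assumption made so that ties are detected exactly rather than lost to floating point:

```python
from fractions import Fraction
from math import ceil, comb, floor

def likelihood(r: int, m: int, n: int, k: int) -> Fraction:
    """P(K = k | m, n, r); zero when r cannot yield k black and n - k white."""
    if r < k or m - r < n - k:
        return Fraction(0)
    return Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))

def round_by_likelihood(m: int, n: int, k: int) -> int:
    """Pick floor(r_hat) or ceil(r_hat), whichever has the larger likelihood."""
    r_hat = Fraction(m * k, n)
    candidates = sorted({floor(r_hat), ceil(r_hat)})
    # max returns the first maximal item, so ties go to the smaller candidate
    return max(candidates, key=lambda r: likelihood(r, m, n, k))

print(round_by_likelihood(10, 3, 2))  # 7: r_hat = 20/3, and L(7) = 63/120 > L(6) = 60/120
print(round_by_likelihood(10, 3, 1))  # 3: r_hat = 10/3, and L(3) = 63/120 > L(4) = 60/120
```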

Richard Border

1 Answer


Basically,

$$ P(K=k|m=m,n=n,r=\tilde{r}) $$

viewed as a function of $\tilde{r}$, is the likelihood function, so you could just as well run a grid search over all integers in the valid range of $r$, namely $k \le r \le m-n+k$, to maximize the likelihood. Starting from the rounded values returned by the estimator just bounds that search.
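The grid search is short to write down. The sketch below (hypothetical names, exact arithmetic) evaluates the likelihood at every $r$ with $k \le r \le m-n+k$, the only values that give positive probability, and returns every maximizer, since ties can occur:

```python
from fractions import Fraction
from math import comb

def likelihood(r: int, m: int, n: int, k: int) -> Fraction:
    """P(K = k | m, n, r) for the urn model in the question."""
    return Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))

def mle_grid(m: int, n: int, k: int) -> list[int]:
    """All r maximizing the likelihood, found by exhaustive search."""
    # r must leave room for k black and n - k white balls in the sample
    valid = range(k, m - n + k + 1)
    values = {r: likelihood(r, m, n, k) for r in valid}
    best = max(values.values())
    return [r for r, v in values.items() if v == best]

print(mle_grid(10, 3, 2))  # [7]
print(mle_grid(11, 3, 1))  # [3, 4]: a tie
```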

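However, the MLE in this setting is discussed in the paper *A Note About Maximum Likelihood Estimator in Hypergeometric Distribution* by Hanwen Zhang, and it has a closed form, so no search is needed. For $k < r \le m-n+k$, the ratio of consecutive likelihoods is

$$\frac{L(r)}{L(r-1)} = \frac{r\,(m-r-n+k+1)}{(r-k)\,(m-r+1)},$$

which is at least $1$ exactly when $r \le \frac{(m+1)k}{n}$. The likelihood therefore increases up to $\lfloor \frac{(m+1)k}{n} \rfloor$ and decreases afterwards, so the MLE is $\tilde{r} = \min\left(\lfloor \frac{(m+1)k}{n} \rfloor, m\right)$; when $0<k<n$ and $\frac{(m+1)k}{n}$ is an integer, that value minus one attains the same likelihood. This is not always $\lfloor \frac{m}{n}k \rfloor$ (for $m=10$, $n=3$, $k=2$ it is $7$, not $6$), but because $\frac{(m+1)k}{n} = \hat{r} + \frac{k}{n}$ with $0 \le \frac{k}{n} \le 1$, it always lies in $\{\lfloor \hat{r}\rfloor, \lceil \hat{r}\rceil\}$. So the procedure proposed in the question does recover the MLE.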
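A brute-force cross-check of a closed form, as a sketch under the assumption that the MLE is $\min\left(\lfloor \frac{(m+1)k}{n} \rfloor, m\right)$. That formula follows from the consecutive-likelihood ratio $\frac{L(r)}{L(r-1)} = \frac{r\,(m-r-n+k+1)}{(r-k)\,(m-r+1)}$, which is at least $1$ exactly when $r \le \frac{(m+1)k}{n}$. The helper names are hypothetical. For every small urn, the code compares the closed form with an exhaustive search, checks that it lies in $\{\lfloor \hat r\rfloor, \lceil \hat r\rceil\}$, and counts how often $\lfloor \frac{m}{n}k \rfloor$ fails to be a maximizer:

```python
from fractions import Fraction
from math import ceil, comb, floor

def likelihood(r: int, m: int, n: int, k: int) -> Fraction:
    """P(K = k | m, n, r) for the urn model in the question."""
    return Fraction(comb(r, k) * comb(m - r, n - k), comb(m, n))

def mle_closed_form(m: int, n: int, k: int) -> int:
    # Within k < r <= m - n + k, L(r) >= L(r - 1) exactly when r * n <= k * (m + 1),
    # so the likelihood rises up to floor(k (m + 1) / n) and falls afterwards;
    # the cap at m handles k == n
    return min((m + 1) * k // n, m)

def check(m_max: int = 30) -> int:
    """Verify the closed form on every urn with m <= m_max; return how many
    cases have floor(m k / n) strictly below the maximum likelihood."""
    differs = 0
    for m in range(1, m_max + 1):
        for n in range(1, m + 1):
            for k in range(n + 1):
                best = max(likelihood(r, m, n, k) for r in range(k, m - n + k + 1))
                r_star = mle_closed_form(m, n, k)
                assert likelihood(r_star, m, n, k) == best
                r_hat = Fraction(m * k, n)
                assert floor(r_hat) <= r_star <= ceil(r_hat)
                if likelihood(m * k // n, m, n, k) < best:
                    differs += 1
    return differs

print(mle_closed_form(10, 3, 2))  # 7, whereas floor(10 * 2 / 3) = 6
print(check())  # number of (m, n, k) cases where floor(m k / n) is not a maximizer
```

Exact fractions are used so that the equality test against the grid-search maximum is not affected by rounding.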

Tim