
Given a sample of size $k$ drawn (without replacement) from a binomial distribution with known probability parameter $\pi$, is there a function that gives the distribution of the likely population size $n$ from which these $k$ items were sampled? For instance, say we have $k=315$ items randomly selected with known probability $\pi=0.34$ from a population of $n$ items. Here the most likely value is $\hat{n}=926$ (since $k/\pi = 315/0.34 \approx 926$), but what is the probability distribution for $n$? Is there a distribution that gives $p(n)$?

I know that $p(\pi \mid k,n)$ is given by the beta distribution and that $p(k \mid \pi, n)$ is the binomial distribution. I'm looking for that third creature, $p(n \mid \pi, k)$, properly normalized of course, such that $\sum_{n=k}^{\infty} p(n)=1$.

first "attempt" at this, given the normal approximation to binomial distribution is $p(k|\pi, n)=\mathcal{N}(k/\pi,k\pi(1-\pi))$, is that $p(n|\pi,k)\approx\mathcal{N}(k/\pi,k\pi(1-\pi))$?

  • Since the population appears to be finite, we need to know how the sample was taken: was it with or without replacement? – whuber Oct 15 '20 at 20:59
  • @whuber, thank you for the question; the samples were taken without replacement. – phdmba7of12 Oct 16 '20 at 12:45
  • Please add this new information as an edit to the post, not only as a comment! Not everybody reads comments. Also, sampling without replacement from a finite population leads to a hypergeometric, not binomial, distribution. See also https://stats.stackexchange.com/questions/123367/estimating-parameters-for-a-binomial/123748#123748 – kjetil b halvorsen Oct 16 '20 at 16:06
  • @kjetilbhalvorsen Added to the text of the original question; I will try to rewrite it for greater clarity. Any thoughts on an answer aside from the with/without replacement distinction? I'd be open to an answer for either case. – phdmba7of12 Oct 16 '20 at 17:59
  • $$p(n \mid \pi, k) = \frac{p(k \mid \pi, n)\, p(n \mid \pi)}{p(k \mid \pi)}$$ Are you looking for a reasonable prior $p(n \mid \pi)$? (When $\pi$ is the parameter, instead of $n$, we often use a beta distribution.) – Sextus Empiricus Oct 22 '20 at 18:46

2 Answers


Let's start with this sum:

$\sum_{n=k}^{\infty}{{n \choose k}\pi^k(1-\pi)^{n-k}} $

Hopefully this is self-explanatory, but as an intuition you can see this as a brute-force way of calculating your distribution: for each candidate population size $n$, take the binomial probability of drawing exactly $k$ successes, and sum over all $n$. You would then need to divide by some constant $C$ such that $\sum_{n=k}^{\infty}{\frac{{n \choose k}\pi^k(1-\pi)^{n-k}}{C}} = 1$ to get a PMF. So if we can figure out what $C$ is, we've got our distribution (even if we don't yet know its EV or variance).

WolframAlpha suggests that $C$ is equal to $1/\pi$, and this can be confirmed directly (see the derivation below). With that constant, the model simplifies to:

${n \choose k}\pi^{k+1}(1-\pi)^{n-k} $
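To confirm the constant: substituting $m = n - k$ and using the negative binomial series $\sum_{m=0}^{\infty}\binom{m+k}{k}x^{m} = (1-x)^{-(k+1)}$ for $|x|<1$,

$$\sum_{n=k}^{\infty}\binom{n}{k}\pi^{k}(1-\pi)^{n-k} = \pi^{k}\sum_{m=0}^{\infty}\binom{m+k}{k}(1-\pi)^{m} = \pi^{k}\,\pi^{-(k+1)} = \frac{1}{\pi},$$

so $C = 1/\pi$ indeed. Note also that the resulting PMF says $n-k$ follows a negative binomial distribution with parameters $k+1$ and $\pi$ (the number of failures before the $(k+1)$-th success).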

I evaluated this PMF for the case where we have observed $k=1$ success with success probability $\pi=0.5$:

pmf <- function(n, k, pi) {
  # PMF of population size n given k observed successes with probability pi
  choose(n, k) * pi^(k + 1) * (1 - pi)^(n - k)
}

# evaluate for k = 1, pi = 0.5 over n = 1, ..., 100
graph <- pmf(1:100, k = 1, pi = 0.5)

plot(graph)
sum(graph)
# [1] 1   (the probabilities sum to 1, confirming C = 1/pi)

(Plot of the PMF for $k=1$, $\pi=0.5$ over $n = 1, \dots, 100$.)

Hopefully this distribution "rings true": given that we observe $k=1$ success with a 50% chance of success, 25% of the time the population was $n=1$, 25% of the time it was $n=2$, and the probability trails off geometrically from there. The expected value of our distribution is given by:

$E[n] = \sum_{n=k}^{\infty}{n\,{n \choose k}\pi^{k+1}(1-\pi)^{n-k}} $

and variance:

$ \operatorname{Var}(n) = E\left[(n - E[n])^{2}\right] $

Unfortunately I don't currently have the time to solve those, but I challenge someone here to do so.
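In the meantime, here is a minimal numerical sketch that approximates both by truncating the infinite sums at a large $n$ (the truncation point and the example values $k=1$, $\pi=0.5$ are arbitrary choices):

k <- 1
pi <- 0.5
n <- k:10000                                 # truncation point; the tail is negligible by then
p <- choose(n, k) * pi^(k + 1) * (1 - pi)^(n - k)
ev <- sum(n * p)                             # expected value: 3 for this example
vr <- sum((n - ev)^2 * p)                    # variance: 4 for this example

These values agree with the negative binomial identification noted above, which gives $E[n] = k + (k+1)(1-\pi)/\pi$ and $\operatorname{Var}(n) = (k+1)(1-\pi)/\pi^2$ in closed form.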

Edit: others have suggested a Bayesian solution for this problem. My bone to pick with those is that they assume that $n$ has a prior distribution. Your question seems to assume that $n$ is distributed only insofar as it depends on the binomial distribution(s) governed by $\pi$.

Tanner Phillips

Bayes' theorem tells us:

$$ p(n \mid k, \pi) = \frac{p(k \mid n, \pi) p(n \mid \pi)}{\sum_{m=k}^\infty p(k \mid m, \pi) p(m \mid \pi)} $$

We know $p(k \mid n, \pi)$: that's the binomial distribution. However, we don't know the form of the prior, $p(n \mid \pi)$, and any choice of prior for $n$ gives an answer to the question. Therefore the question as posed is underspecified. You would get a different answer if $n$ were Poisson distributed (with different parameters), or negative binomial distributed, or even distributed from a different binomial distribution. But regardless, if you knew the prior distribution, you would simply calculate the above expression.

As to your comment that we have $p(\pi \mid k,n)$ in the form of the beta distribution: that has a similar issue to $n$. That is, we require a prior over $\pi$ to determine a posterior distribution for $\pi$. For example, a uniform prior gives a posterior of $\text{Beta}(k+1,\, n-k+1)$ with expected value $(k+1)/(n+2)$, whereas a frequentist would estimate $\pi$ as $k/n$. (For the question's numbers, $k=315$ and $n=926$, these give $316/928 \approx 0.3405$ and $315/926 \approx 0.3402$, respectively.) Neither is "wrong"; it just depends on your assumptions.

$k$ is different from $n$ or $\pi$ in that knowing the other two quantities gives you the information you need to know the distribution over $k$ exactly. The same is not true for either $n$ or $\pi.$


If you want to try an example for an assumed prior on $n$, let's suppose $n$ is Poisson distributed with mean $\lambda$. Keep in mind that this is only an example; the point above still holds: there is no definite answer to your question, because it depends on the prior distribution of $n$.

If $n$ is distributed as $\text{Poisson}(\lambda)$ then the evidence function (the denominator) is,

$$ \begin{split} p(k) &= \sum_{m=k}^\infty {m \choose k} \pi^k (1-\pi)^{m-k} \frac{\lambda^m e^{-\lambda}}{m!} \\ &= \frac{ \left( \pi \lambda \right)^k } {k!} e^{- \lambda} \sum_{m=k}^\infty \frac{ \left[ \lambda (1-\pi) \right]^{m-k} } {(m-k)!} \\ &= \frac{ \left( \pi \lambda \right)^k } {k!} e^{-\lambda} e^{\lambda (1-\pi)} \\ &= \frac{ \left( \pi \lambda \right)^k } {k!} e^{-\lambda \pi}. \end{split} $$

It's interesting to note that the evidence function $p(k)$ is the Poisson distribution with mean $\lambda \pi$. This makes intuitive sense: thinning a Poisson-distributed count by keeping each item independently with probability $\pi$ yields another Poisson-distributed count, with mean $\lambda \pi$.

The numerator is simply the summand of the first line of the above expression, with $n$ replacing $m$. Dividing it by the evidence, we get,

$$ p(n \mid k, \pi) = \frac{ \left[ \lambda (1-\pi) \right]^{n-k} } {(n-k)!} e^{-\lambda (1-\pi)}. $$ Thus $n$ would be distributed such that $n-k$ is Poisson distributed with mean $\lambda (1 - \pi).$
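As a sanity check, here is a minimal R sketch that computes this posterior by brute force on a grid and compares it with the closed form above (the values of $k$, $\pi$, and $\lambda$ are arbitrary choices for illustration):

k <- 315
pi <- 0.34
lambda <- 900

n <- k:3000                                  # grid wide enough to contain all the mass
prior <- dpois(n, lambda)                    # assumed Poisson(lambda) prior on n
lik <- dbinom(k, n, pi)                      # binomial likelihood p(k | n, pi)
posterior <- prior * lik / sum(prior * lik)  # Bayes' theorem, normalized on the grid
closed <- dpois(n - k, lambda * (1 - pi))    # the closed form above

max(abs(posterior - closed))                 # effectively zero: agreement up to rounding

The two agree up to floating point error, and for these numbers the posterior mode lands near $k + \lambda(1-\pi) = 909$.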

Bridgeburners