
I am experimenting with some new MCMC software, and before I dive into more complicated models I wanted to run some simple simulations. This is a very simple simulation, not meant to be tricky at all: I generate some random data from known parameters and then check whether the software can actually recover those parameters with its MCMC sampler, conjugacy, etc.

While looking around for very simple demos, I noticed that there is not a wide variety of them. I could find the coin-flip example, where the data are distributed as a Bernoulli variable with a Beta prior on the success probability. Easy enough.

The next level of sophistication would be a binomial random variable, and I could not find suitable priors for this model. So that is my question: what are appropriate prior distributions for a Binomial model with unknown $n, p$? Here is the setup. The data are just the number of successes in $n$ trials. For my demo I used 10 trials and a probability of success of $p = 0.2$, and I generated 100 draws from this distribution, which looks like a vector $[4, 9, 5, 7, 1, 2, \ldots]$.
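
To be concrete, the data-generating step is just something like this (a NumPy sketch; the seed is arbitrary and the particular software I am testing does not matter for this part):

```python
import numpy as np

rng = np.random.default_rng(1)               # arbitrary seed
y = rng.binomial(n=10, p=0.2, size=100)      # 100 success counts, like the vector above
```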

My problem was that I could not figure out how to parameterize the prior for $n$ in this model.

$$
\begin{aligned}
n &\sim \;? \\
p &\sim \operatorname{Beta}(1, 1) \\
\text{data}_i &\sim \operatorname{Binomial}(n, p)
\end{aligned}
$$

I checked both the McElreath and Gelman et al. books, but did not find a way to reason about what prior to use for $n$. I know that $n$ has to be an integer, and that the sum of $a, b$ for the Beta distribution should equal $n$. I can't use a continuous distribution for $n$, otherwise I get an error when trying to compute Binomial(n, p).
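
As a sanity check outside the MCMC software, since $p$ is conjugate given $n$, I can integrate $p$ out analytically and just enumerate a truncated grid of integer values of $n$. This is only a rough sketch (NumPy/SciPy, with an assumed flat prior on $n$ over the grid and an arbitrary cap at 50, reusing the simulated `y` from above), not the parameterization I am asking about:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_marg_lik(n, y, a=1.0, b=1.0):
    """log p(y | n) with p integrated out under its Beta(a, b) prior."""
    # log of prod_i C(n, y_i)
    log_choose = np.sum(gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1))
    # Beta-function ratio from the conjugate update of p
    return log_choose + betaln(a + y.sum(), b + (n - y).sum()) - betaln(a, b)

n_grid = np.arange(y.max(), 51)   # n cannot be below the largest observed count; 50 is an arbitrary cap
log_post = np.array([log_marg_lik(n, y) for n in n_grid])   # flat prior on n over the grid
post = np.exp(log_post - log_post.max())                    # normalize in a numerically stable way
post /= post.sum()
print("posterior mode for n:", n_grid[post.argmax()])
```

This gives me something to compare the MCMC output against, but it does not tell me what a principled prior for $n$ is.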

Back in the MCMC software, the closest I figured out was using Binomial(floor(n), p) with a continuous prior over $n$. This does work, but I get a lot of numerical warnings from the sampler :). I actually obtained pretty reasonable estimates for the parameters using NUTS(0.65).

I imagine that there should be a better theory for such a simple problem. Does anyone know a better parameterization?

krishnab
  • The reason you can't find a parametrization for $n$ is that $n$ is not a random variable. When you change $n$, you are changing to a completely new distribution. In other words, $n$ must be set to a constant integer. If both $n$ and $p$ are unknown, you can't converge to a solution; it's basically a giant meaningless equation at that point. – Tanner Phillips Jun 05 '21 at 18:17
  • @TannerPhillips yeah, that makes sense. I mean, I could pick the maximum value of the data array and set that as $n$. I was just wondering what the best way to do this is if I do not know $n$. Like, if I defined $n$ to be a random variable, is there some more analytic approach to solving it? Haha, it is funny because most of the textbooks show these very simple cases as the examples for conjugacy, and then the more complicated models are the examples for MCMC. But then you have issues like this one. – krishnab Jun 05 '21 at 18:22
  • Possibly relevant: https://www.researchgate.net/publication/264708609_Maximum_Entropy_Reconstruction_for_Discrete_Distributions_with_Unbounded_Support – user3716267 Jun 05 '21 at 18:24
  • @user3716267 thanks for the suggestion. I just started to look at it and it seems like an interesting paper. – krishnab Jun 05 '21 at 18:28
  • Jeffreys' prior on $n$ is $\pi(n)\propto 1/n$ and [our textbook](http://amzn.to/2kxP1vO) covers this setting (in the Capture-Recapture chapter). – Xi'an Jun 05 '21 at 19:35
  • @Xi'an Oh yes, that makes a lot of sense. I see the logic there: for each integer $n$, its probability is proportional to $1/n$. I will take a look at your book; I think I have it. Thanks for all of your help again. (I've added a quick sketch with this prior below the comments.) – krishnab Jun 05 '21 at 20:11
  • Qs with relevant answers, from which you can write your own: https://stats.stackexchange.com/questions/123367/estimating-parameters-for-a-binomial/123748#123748, https://stats.stackexchange.com/questions/502124/numbers-of-draws-on-a-modified-bernouilli-process/521043#521043 – kjetil b halvorsen Jun 05 '21 at 21:00
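
Update, following Xi'an's comment about the Jeffreys-type prior $\pi(n) \propto 1/n$: on the truncated grid from the sketch above, that prior only changes the reweighting step before normalizing (again just a sketch, reusing `n_grid` and `log_post` from the earlier snippet):

```python
# Jeffreys-type prior pi(n) proportional to 1/n (per Xi'an's comment), on the same truncated grid
log_post_1n = log_post - np.log(n_grid)
post_1n = np.exp(log_post_1n - log_post_1n.max())
post_1n /= post_1n.sum()
print("posterior mode for n under the 1/n prior:", n_grid[post_1n.argmax()])
```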

0 Answers