
I've just started building models in Stan; to build familiarity with the tool, I'm working through some of the exercises in *Bayesian Data Analysis* (2nd ed.). The Waterbuck exercise supposes that the data $n \sim \text{binomial}(N, \theta)$, with $(N, \theta)$ unknown. Since Hamiltonian Monte Carlo doesn't permit discrete parameters, I've declared $N$ as a real parameter $\in [72, \infty)$ and coded a real-valued binomial distribution using the lbeta function.
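In case it helps to see the trick concretely, here is a sketch of the real-valued binomial log-density (in Python rather than Stan, with made-up numbers, and a hypothetical function name), using the identity $\log\binom{N}{n} = -\log(N+1) - \text{lbeta}(n+1,\, N-n+1)$, which lets $N$ take non-integer values:

```python
import math

def log_binom_real(n, N, theta):
    """Log-density of a 'continuous binomial': the binomial pmf with the
    binomial coefficient written via the log-beta function, so N may be real.
    Uses log C(N, n) = -log(N + 1) - lbeta(n + 1, N - n + 1)."""
    # lbeta(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    lbeta = math.lgamma(n + 1) + math.lgamma(N - n + 1) - math.lgamma(N + 2)
    log_choose = -math.log(N + 1) - lbeta
    return log_choose + n * math.log(theta) + (N - n) * math.log1p(-theta)

# At integer N this reduces to the ordinary binomial log-pmf:
N, n, theta = 100, 53, 0.5
exact = math.log(math.comb(N, n)) + n * math.log(theta) + (N - n) * math.log1p(-theta)
print(log_binom_real(n, N, theta), exact)  # the two agree to floating-point precision
```

The same expression works unchanged at, say, $N = 100.7$, which is exactly what the sampler needs when $N$ is declared as a real parameter.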

A histogram of the results looks virtually identical to what I found by computing the posterior density directly. However, I'm concerned that there may be subtle reasons not to trust these results in general: the real-valued inference on $N$ assigns positive density to non-integer values, which we know are impossible, since fractional waterbuck don't exist in reality. On the other hand, the results appear to be fine, so the simplification seems to have no effect on inference in this case.

Are there any guiding principles or rules of thumb for modeling in this way, or is this method of "promoting" a discrete parameter to a real-valued one bad practice?

Sycorax
    Actually, it's done all the time, when the value of the discrete parameter is "large" and the spread of the reasonable values it could take on is also "large" (but perhaps a different "large", "large" not being well-defined.) You more commonly see this when approximating discrete variables ("fraction of people who will vote for candidate X", which is drawn from a finite set) with continuous variables. It seems to me that with $N \geq 72$ you are likely well within the range for which a continuous approximation is fine, unless $N\theta$ is close to 0 or $N$. – jbowman Sep 19 '13 at 17:51
  • Great, that totally makes sense. It sounds like essentially the same caveats are in order as in the case of a z-test of proportions for $\hat \theta$ near 0 or 1. – Sycorax Sep 19 '13 at 17:59

1 Answer


Firstly, feel free to ask questions like this on our users' list (http://mc-stan.org/mailing-lists.html), where we discuss not only issues related to Stan implementations, optimizations, etc., but also practical statistical and modeling questions.

As to your question, it's absolutely a fine approach. There are many ways to justify it more rigorously (for example, by looking at the divergence between the discrete CDF and its continuous approximation), but basically, so long as your variance is larger than a few times unity, ignoring the discretization won't really have any effect on subsequent inferences.
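To make that rule of thumb tangible, here is a small illustration (my own, not from the answer above) comparing a binomial CDF to a continuity-corrected normal CDF; the maximum gap between the two is already tiny once the standard deviation is a couple of units, and shrinks further as it grows:

```python
import math

def binom_cdf(k, N, p):
    """Exact binomial CDF P(X <= k) by direct summation."""
    return sum(math.comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def max_cdf_gap(N, p):
    """Largest discrepancy between the discrete CDF and its
    continuity-corrected continuous approximation."""
    mu, sigma = N * p, math.sqrt(N * p * (1 - p))
    return max(abs(binom_cdf(k, N, p) - normal_cdf(k + 0.5, mu, sigma))
               for k in range(N + 1))

print(max_cdf_gap(20, 0.5))   # sd ≈ 2.2: gap already well under 1%
print(max_cdf_gap(500, 0.5))  # sd ≈ 11: gap smaller still
```

This is the same logic that licenses treating $N$ as real in the waterbuck model: with $N \geq 72$, the discrete and continuous versions of the likelihood are nearly indistinguishable.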

This kind of approximation is ubiquitous, a common example being the approximation of a multinomial distribution as a product of independent Poisson distributions, which are then approximated as Gaussian distributions.
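The Poisson-to-Gaussian step can be checked numerically in a few lines (a sketch of my own, using a made-up rate): for a large mean $\lambda$, the Poisson log-pmf is very close to the log-density of $\mathcal{N}(\lambda, \lambda)$.

```python
import math

def poisson_logpmf(k, lam):
    """Exact Poisson log-pmf."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def normal_logpdf(x, mu, sigma):
    """Gaussian log-density."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# For a large rate, N(lam, lam) tracks the Poisson closely across the bulk:
lam = 400.0
for k in (380, 400, 420):
    print(k, poisson_logpmf(k, lam), normal_logpdf(k, lam, math.sqrt(lam)))
```

The agreement degrades in the far tails and for small $\lambda$, which mirrors the caveat in the comments: the continuous approximation is safe away from the boundary, i.e. when $N\theta$ is not close to $0$ or $N$.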

  • That moment when, a year later, you realize that **the** Michael Betancourt posted an answer to your question... – Sycorax Sep 03 '14 at 23:09