
Consider the following setup: Let $\Omega$ be a finite (but humongous) state space and $\pi:\Omega\to[0,1]$ be a probability mass function. It seems to me that when people want to "sample" in this setting, their main motivation is to estimate the expected value of a function $f:\Omega\to X$ on $\Omega$ under $\pi$, i.e.

$$\mathbb{E}_\pi(f)=\sum_{\omega\in\Omega}\pi(\omega)f(\omega).$$

This is the case for asymptotic counting, $p$-value estimation in hypothesis testing, and many more (it seems to me the whole MCMC business is about just that). To approximate $\mathbb{E}_\pi(f)$, one tries to get as many samples $\omega_1,\ldots,\omega_n\in\Omega$ from $\pi$ as possible, because the law of large numbers ensures that $\frac{1}{n}\sum_{i=1}^nf(\omega_i)$ converges almost surely to $\mathbb{E}_\pi(f)$. If sampling from $\pi$ is hard, then one constructs a Markov chain $(X_t)_{t\in\mathbb{N}}$ on $\Omega$ whose stationary distribution is $\pi$ (for example with the Metropolis-Hastings algorithm).
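To fix ideas, here is a minimal sketch of the procedure described above: a random-walk Metropolis chain on a small toy state space $\Omega=\{0,\ldots,K-1\}$, whose empirical average approximates $\mathbb{E}_\pi(f)$. The function name, the cyclic $\pm 1$ proposal, and the toy target are my own illustrative choices, not part of the question:

```python
import random

def metropolis_expectation(pi_unnorm, f, n_states, n_samples, seed=0):
    """Estimate E_pi(f) on Omega = {0, ..., n_states-1} using a
    random-walk Metropolis chain with a symmetric cyclic +/-1 proposal.
    Only an unnormalized version of pi is needed."""
    rng = random.Random(seed)
    x = 0
    total = 0.0
    for _ in range(n_samples):
        y = (x + rng.choice((-1, 1))) % n_states  # symmetric proposal
        # Metropolis acceptance: the normalizing constant of pi cancels
        if rng.random() < min(1.0, pi_unnorm(y) / pi_unnorm(x)):
            x = y
        total += f(x)
    return total / n_samples

# Toy target: pi(k) proportional to k+1 on {0,...,9}, f = identity.
# The exact value is sum_k k*(k+1)/55 = 6, so the estimate should be close.
est = metropolis_expectation(lambda k: k + 1, lambda k: k, 10, 200_000)
```

Note that the chain's running average converges by the ergodic theorem even though the $\omega_i$ are not independent; the number of steps $n$ must be large, which is exactly the point of the question.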

In this scenario, the number of samples $n$ is supposed to be large!

Now, my question is the following: Are there situations or applications in statistics (different from approximating an expected value!) where only one single sample $x\in\Omega$ from the distribution $\pi$ is needed?

EDIT: Triggered by the discussion in the comments of this post, let me be more precise: Does anybody know of applications in statistics where the task is to find only one single (say uniform) sample from the lattice points of a polytope, that is, a set that looks like

$$ \Omega=\{u\in\mathbb{Z}^d: Au\le b\}? $$
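For concreteness, one standard way to draw such a sample (when enumeration of $\Omega$ is infeasible) is again a Markov chain: a lazy random walk on the lattice points that proposes a $\pm 1$ move in a random coordinate and rejects moves leaving the polytope. The sketch below is illustrative only; the function name is hypothetical, and the final state is close to uniform only if the number of steps exceeds the (generally unknown) mixing time:

```python
import random

def lattice_walk_sample(A, b, start, n_steps, seed=0):
    """Random walk on Omega = {u in Z^d : Au <= b}: propose a +/-1 step
    in one coordinate, reject proposals outside the polytope (staying
    put on rejection preserves the uniform stationary distribution,
    since the proposal is symmetric and the target is uniform)."""
    rng = random.Random(seed)
    d = len(start)

    def inside(u):
        return all(sum(A[i][j] * u[j] for j in range(d)) <= b[i]
                   for i in range(len(A)))

    assert inside(start), "start must be a lattice point of the polytope"
    u = list(start)
    for _ in range(n_steps):
        j = rng.randrange(d)
        step = rng.choice((-1, 1))
        u[j] += step
        if not inside(u):
            u[j] -= step  # reject: stay at the current lattice point
    return tuple(u)

# Example: lattice points of the box [0,3]^2, written as Au <= b
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [3, 0, 3, 0]
sample = lattice_walk_sample(A, b, (0, 0), 5000)
```

This produces exactly one state of the chain as output, which is the setting the question asks about: here the entire trajectory is a means to an end, and only the single final sample is kept.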

kjetil b halvorsen
Tobias Windisch
  • This is an extremely open question and so it's hard to answer. As an attempt to answer, consider a common MCMC sampler, the Metropolis-Hastings algorithm. At each step, it requires a proposal sample. This is often obtained by taking a single draw from a normal distribution. – Cliff AB Apr 17 '16 at 15:51
  • Small side note: probability *density* functions map to $[0, \infty)$, not $[0,1]$. Probability *mass* functions map to $[0,1]$. – Cliff AB Apr 17 '16 at 15:53
  • @CliffAB: Thanks for your answer. I have changed "probability density function" to "probability mass function". Transferring your MCMC example to the discrete setup, the proposal sample at each step comes from a state space, say $\Omega'$, over which one has "good control" (e.g. we can enumerate it). I'm particularly interested in cases where the state space is way too large to get enumerated. This is essential and hence I added this to my question. – Tobias Windisch Apr 17 '16 at 16:07
  • One example is described in a [thread on the Student t-test](http://stats.stackexchange.com/a/1836/919). I cannot tell whether this answers your question, though, until you clarify for us what you mean by "needed." Textbooks are full of examples of estimators based on just one sample! – whuber Apr 17 '16 at 16:57
  • @whuber: Thanks for your comment! Maybe here is an attempt to make the "needed" more precise: I'm looking for a statistical problem (like estimating the expected value of some function) which can be "solved" by drawing one single sample according to a probability mass function from a huge sample space. I really would like to give you an example for what I mean, but stating an example would already answer my question. – Tobias Windisch Apr 17 '16 at 17:12
  • Almost *any* estimation procedure works with a single sample. That's why I'm looking for clarification. Additional clarification is needed concerning the nature of that sample, because it's perfectly fine to view a very large sample as being a *single* observation for vector-valued $f$. In fact, that's a standard theoretical viewpoint: *all* estimators are functions of just one sample! – whuber Apr 17 '16 at 17:17
  • @whuber: I see your point. Clearly the problem "draw $k$ samples from $\Omega$" can be rephrased as "draw one sample from $\Omega^k$". I don't know right now how to rule this out. Maybe by saying that the sample size is not part of the input of the problem? – Tobias Windisch Apr 17 '16 at 17:34
  • I'm unsure, because I still don't have a good grasp of where you're going with this question. Is there perhaps a specific problem or setting that has motivated it? – whuber Apr 18 '16 at 13:52
  • @whuber: I want to understand what statisticians typically do with their samples. It seems to me that their main motivation for sampling is to make use of the law of large numbers. Thus, they in fact want a lot of samples and use them all together to obtain their result. – Tobias Windisch Apr 18 '16 at 15:36
  • I can see how you might arrive at that impression, but it's not correct. Sampling is motivated by practical and economic constraints: either it's impossible (or meaningless) to conduct a census or it's prohibitively expensive. The amount of effort to expend on a sample can be optimized by balancing how well the sample results meet the objectives against their cost. This has *absolutely nothing* to do with laws of large numbers! In some cases, asymptotic theory provides some guidance concerning how to find a solution. Does that help you clarify your post? – whuber Apr 18 '16 at 16:50

0 Answers