4

I was looking up the Wikipedia page on Sampling distribution and the first paragraph makes the claim:

allow analytical considerations to be based on the sampling distribution of a statistic, rather than on the joint probability distribution of all the individual sample values

I have read this over and over, and though I think I understand what it is saying, I'd like a more rigorous statement and, if possible, some notes on practical considerations.

Thanks in advance.

The full text of the first paragraph is:

In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference. More specifically, they allow analytical considerations to be based on the sampling distribution of a statistic, rather than on the joint probability distribution of all the individual sample values.

chl
  • 50,972
  • 18
  • 205
  • 364
user1172468
  • 1,505
  • 5
  • 21
  • 36

3 Answers

3

In parametric statistics, you usually start with a sample, let us say an iid sample, $X_1,\ldots,X_n$, distributed as $$ \prod_{i=1}^n f_\theta(x_i), $$ and you have to draw inference on $\theta$ using this distribution, which may be troublesome.

If, instead, for one reason or another, you decide to use only a specific transform of the sample, $\Psi(X_1,\ldots,X_n)$, for instance of the same dimension as $\theta$, and if this new random variable has a closed-form/analytic distribution, $$ \Psi(X_1,\ldots,X_n) \sim g_{n,\theta}(\psi) $$ then it is much easier to draw inference using this known distribution.
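As a concrete illustration of this point (my sketch, not part of the answer): for an iid $N(\theta, 1)$ sample, take $\Psi$ to be the sample mean, whose sampling distribution has the closed form $g_{n,\theta} = N(\theta, 1/n)$. A quick simulation confirms that inference can work from this one-dimensional distribution alone.

```python
# Sketch: the sample mean of an iid N(theta, 1) sample has the known
# sampling distribution N(theta, 1/n); theta = 2.0 and n = 25 are
# arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 25, 10_000

# Empirical sampling distribution of the mean over many repeated samples
means = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

# Compare with the closed form N(theta, 1/n)
print(means.mean())   # close to theta = 2.0
print(means.var())    # close to 1/n = 0.04
```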

Of course, this is hiding under the carpet the fact that the transform $\Psi$ has to be chosen in the first place, so I am not so convinced of the relevance of this Wikipedia sentence!

Xi'an
  • 90,397
  • 9
  • 157
  • 575
2

Suppose that you want to know how many likely voters plan to vote for the incumbent in your city's race for mayor this year, so you take a simple random sample of likely voters and ask them whether they plan to vote for the incumbent or the challenger. The sampling distribution tells us the relationship between the proportion in our sample and the true proportion in the entire city. Because of the sampling distribution, we can draw inferences (hypothesis tests, confidence intervals) based only on the proportion who said "incumbent" in our sample and the sample size. If we used the joint distribution instead, we would have to work with how each individual answered the question rather than just the summary information, which is a lot simpler.
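A small simulation of the polling scenario above (my sketch, with made-up numbers): with $n = 400$ voters and true support $p = 0.52$, the sampling distribution of the sample proportion is approximately $N(p,\, p(1-p)/n)$, so the count of "incumbent" answers and $n$ are all the inference needs.

```python
# Sketch: sampling distribution of the sample proportion in a poll.
# p_true = 0.52 and n = 400 are hypothetical illustrative values.
import numpy as np

rng = np.random.default_rng(1)
p_true, n, reps = 0.52, 400, 10_000

# Each simulated survey is summarized by a single number: the sample proportion
p_hat = rng.binomial(n, p_true, size=reps) / n

print(p_hat.mean())   # close to p_true = 0.52
print(p_hat.std())    # close to sqrt(p(1-p)/n), about 0.025
```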

Greg Snow
  • 46,563
  • 2
  • 90
  • 159
2

It's because all of the information in the data, given an assumed model, is captured by a multiple of the likelihood, and that is all you need (or, many would argue, all you should use) in inference, tentatively taking the model as given. When the likelihood is driven by just a few summary statistics, there is a tremendous simplification.

This is perhaps most easily seen from the Bayesian perspective by noting that

$$ \text{posterior} \propto \text{prior} \times \text{data model}, \quad\text{and so}\quad \text{data model} \propto \frac{\text{posterior}}{\text{prior}}. $$

(The ratio posterior/prior is a relative probability, belief after the data relative to before the data, called the relative belief ratio; it is a multiple of the likelihood function.)

Approximate Bayesian Computation (ABC) could be a convenient way to visualize these things.
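To make the ABC suggestion concrete (my toy sketch, not from the answer): estimate a coin's success probability $p$ from 10 Bernoulli draws by keeping prior draws whose simulated data match (a) the exact yes/no sequence versus (b) only the proportion of yeses. Both acceptance rules approximate the same Beta posterior, which is the point about summary statistics.

```python
# Toy ABC sketch: uniform prior on p, 10 observed Bernoulli draws (7 yeses).
# Conditioning on the full sequence and conditioning on just the count both
# approximate the Beta(8, 4) posterior; the data are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
data = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # observed sequence, 7 yeses
n, reps = len(data), 200_000

p = rng.uniform(size=reps)                   # draws from the uniform prior
sims = rng.random((reps, n)) < p[:, None]    # one simulated dataset per draw

exact = p[(sims == data).all(axis=1)]        # accept: match the full sequence
summary = p[sims.sum(axis=1) == data.sum()]  # accept: match only the count

# Both accepted samples target the same Beta(8, 4) posterior
print(exact.mean(), summary.mean())          # both near 8/12, about 0.667
```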

A technical paper, for any who might be interested: http://www.utstat.utoronto.ca/mikevans/papers/surprise.pdf

phaneron
  • 1,252
  • 7
  • 7
  • I do not see why ABC gets involved in this answer! I am not even sure you need to mention the Bayesian perspective. – Xi'an Oct 16 '12 at 16:43
  • @Xi'an: I am not claiming one needs to mention the Bayesian perspective, but rather that I find it transparent to do so. This gives a bit of an expansion on why I think that http://stats.stackexchange.com/questions/7455/the-connection-between-bayesian-statistics-and-generative-modeling/40524#40524 But as David Cox once said transparency is in the mind of the beholder. – phaneron Oct 16 '12 at 20:15
  • @Xi'an: This might be a better clarification. In the accepted answer above, the specific transform is accepted as being the sample mean (proportion) for targeting the population proportion. You chose to hide that under the carpet below, while I was pointing to ABC as a way to visualize its acceptability as the transform. That is, do a small toy ABC example, conditioning on the actual sequence of yeses and noes, and then conditioning on just the proportion of yeses. The posterior will be approximately the same, and divided by the prior it will approximate c * likelihood. – phaneron Oct 17 '12 at 14:37