I am looking at problem 5.01, relating to Galton's data on adult heights of fathers and sons, from Chapter 5 of Statistical Modeling: A Fresh Approach.

Link to the book: http://www.mosaic-web.org/go/StatisticalModeling/index.html

Can anyone explain to me why the confidence interval is so different from the coverage interval?

Prob 5.01. The mean of the adult children in Galton’s data is

> mean( height, data=Galton ) 
[1] 66.76069

Had Galton selected a different sample of kids, he would likely have gotten a slightly different result. The confidence interval indicates a likely range of possible results around the actual result.

Use bootstrapping to calculate the 95% confidence interval on the mean height of the adult children in Galton’s data. The following statement will generate 500 bootstrapping trials.

trials = do(500) * mean(height, data=resample(Galton) ) 

(a) What is the 95% confidence interval on the mean height?

A. 66.5 to 67.0 inches.
B. 66.1 to 67.3 inches.
C. 61.3 to 72 inches.
D. 65.3 to 66.9 inches.

(b) A 95% coverage interval on the individual children’s height can be calculated like this:

qdata(c(0.025,0.975), height, data=Galton) 
2.5% 97.5% 
 60   73

Q: Explain why the 95% coverage interval of individual children’s heights is so different from the 95% confidence interval on the mean height of all children?

Glen_b
Fawad
  • "Chapter 5" of what? – Glen_b Mar 18 '14 at 01:50
  • ok if that helps, chapter 5 of statistical modeling book – Fawad Mar 18 '14 at 01:53
  • Could you *name* the "statistical modeling book" in question? – gung - Reinstate Monica Mar 18 '14 at 02:10
  • Yes this is the book Statistical Modeling: A Fresh approach by Daniel Kaplan Second Edition. Link to this book http://www.mosaic-web.org/go/StatisticalModeling/index.html – Fawad Mar 18 '14 at 02:13
  • Thanks Fawad. It would seem rather odd to choose to explicitly mention the chapter number of an unnamed book, as if we would understand the relevance of the chapter reference. One or two of us are clever, but I don't think even whuber and cardinal can read minds. – Glen_b Mar 18 '14 at 02:21
  • @Glen_b I see. I actually thought that I had given you the question and answer, and the only thing I'm asking is to explain why there is so much difference in the answers, so not mentioning the book name didn't seem important to me. Sorry about that. – Fawad Mar 18 '14 at 02:23
  • But if the book *isn't* important, surely it would make *no sense at all* to include the Chapter and problem number *in the title*. The fact that you chose to do so in the very *title* implies they are important. Which then leads us to wonder *why* those matter so much, which raises the question of the book title. Try to understand how your question reads to someone who *doesn't* know what's in your mind. – Glen_b Mar 18 '14 at 02:25
  • @Glen_b totally makes sense. Why the hell did I even mention the chapter name if I'm not mentioning the book name? So stupid of me -_-. Sorry, I guess I'm tired. – Fawad Mar 18 '14 at 02:26
  • I'm a bit slow; I've only just realized (because I edited your question to make it readable, and once I was done, it was obvious) that your initial question is actually just a slightly rephrased version of the last part of the second sub-question out of the book that you quoted at the end -- straight bookwork. You should have included the `self-study` [tag](http://stats.stackexchange.com/tags/self-study/info), and followed the recommendations there (and I should have simply given hints). It's a bit late now, but please try to meet community expectations on these kinds of questions. – Glen_b Mar 18 '14 at 03:34
  • @Glen_b Alright. I'm not sure what the self-study tag is; this was my first question here. Thanks for the clean-up. – Fawad Mar 19 '14 at 03:49
  • If you do it next time, there's no problem. (I should have edited before I answered, that's my fault.) – Glen_b Mar 19 '14 at 04:10

1 Answer

I'll try to address this in fairly general terms, without restricting the ideas to bootstrapping.

The width of an interval is affected by the variability (or uncertainty) of the quantity it is an interval for: the more variable the quantity, the wider the interval.

The question comes down to this: why is the distribution of sample means less variable than the distribution of individuals from which those means were drawn?

This is the important distinction, because if the distribution of sample means weren't less variable than the distribution of individuals, the confidence intervals wouldn't tend to be narrower.

Variances have some interesting properties. Two relevant ones are:

(i) $\text{Var}(kX) = k^2 \text{Var}(X)$

(ii) $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2 \text{Cov}(X,Y)$

When $X$ and $Y$ are independent (or even only uncorrelated), the covariance is zero, leading to:

$\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\,$.

The two facts lead us to:

If observations are independent with variance $\sigma^2$, sample means have variance $\sigma^2/n$.
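To spell that step out, write $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ for the sample mean (notation introduced here for convenience). Applying (i) with $k = 1/n$, and then the independent case of (ii) repeatedly, gives:

$$\text{Var}(\bar X) = \text{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\,\text{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$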

So the spread of sample means (as measured by say the standard deviation of the distribution of sample means) tends to get smaller with sample size -- in proportion to $1/\sqrt n$.
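A quick way to check the $1/\sqrt n$ behaviour is by simulation. The sketch below uses base R only. The population is an assumption made purely for illustration (normal, with mean 66.8 and standard deviation 3.6); it is not Galton's data.

```r
# Simulated illustration: sd of sample means versus sigma / sqrt(n).
# ASSUMPTION: a normal population with mean 66.8 and sd 3.6, chosen only
# for illustration; this is not Galton's data.
set.seed(1)
sigma <- 3.6
sample_sizes <- c(4, 16, 64, 256)

# For each n, draw 5000 samples of size n and take the sd of their means.
sd_of_means <- sapply(sample_sizes, function(n) {
  means <- replicate(5000, mean(rnorm(n, mean = 66.8, sd = sigma)))
  sd(means)
})

result <- data.frame(
  n           = sample_sizes,
  simulated   = sd_of_means,
  theoretical = sigma / sqrt(sample_sizes)
)
print(result)
```

The `simulated` and `theoretical` columns should agree closely, and each fourfold increase in `n` should roughly halve the spread of the sample means.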

Here are samples of different sizes from a shared distribution (points plotted in grey), and their means (red "+" symbols). The shape of the distribution of individuals is a density marked in grey (the same for each sample size). The shape of the distribution of sample means is a density marked in red; as you can see, each red "+" symbol looks like it could plausibly have been sampled from the red distribution. As the sample size increases, the sample means become less spread out.

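[Figure: samples of increasing size (grey points) and their means (red "+"), with the density of individuals in grey and the density of sample means in red.]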

Since sample means are less spread than individual observations, confidence intervals for them will generally be narrower, and as sample sizes become large, will tend to be ever more narrow.
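The same contrast can be seen by computing both kinds of interval on one sample. This is a minimal base-R sketch rather than the book's mosaic code: `sample(..., replace = TRUE)` stands in for `resample()`, `quantile()` stands in for `qdata()`, and the heights are simulated under assumed values (900 individuals, mean 66.8, sd 3.6), not taken from Galton's data.

```r
# Coverage interval for individuals vs bootstrap confidence interval for the mean.
# ASSUMPTION: simulated heights (n = 900, mean 66.8, sd 3.6), not Galton's data.
set.seed(2)
n <- 900
heights <- rnorm(n, mean = 66.8, sd = 3.6)

# 95% coverage interval: the range holding the middle 95% of individuals.
coverage <- quantile(heights, c(0.025, 0.975))

# Bootstrap: resample the data with replacement and recompute the mean,
# 500 times; the middle 95% of those means is the confidence interval.
boot_means <- replicate(500, mean(sample(heights, replace = TRUE)))
confidence <- quantile(boot_means, c(0.025, 0.975))

width_coverage   <- unname(diff(coverage))
width_confidence <- unname(diff(confidence))

print(coverage)
print(confidence)
print(width_coverage / width_confidence)
```

The coverage interval reflects the spread of individuals (governed by $\sigma$), while the confidence interval reflects the spread of sample means (governed by $\sigma/\sqrt n$), so the ratio of the two widths should be on the order of $\sqrt n$.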

Glen_b