Variance of sample proportion decreases with n but of a count increases with n - why?

Question

I've got an intuitive block with this. For a binomial problem, the standard deviation of a count is $\sqrt{np(1-p)}$, which increases with $n$. Conversely, the standard deviation of the sample proportion decreases with increasing $n$ and is $\sqrt{\frac{p(1-p)}{n}}$. I can do the division by $n$ but I don't have a feel for why the two standard deviations move in opposite directions.

Two things: (a) proportion = $\frac{1}{n} \cdot \text{count}$ and (b) $\text{sd}(cX) = c \cdot \text{sd}(X)$ for $c > 0$. Clearly $c = \frac{1}{n}$ here, and $\frac{1}{n}\sqrt n = \frac{1}{\sqrt n}$. — Glen_b, Feb 07 '14 at 23:42
Yes, this is the issue - I can see the math and do the division by n but it's the intuitive aspect that is weird. If asked how to get a more precise estimate for a parameter I'd say take a larger sample. This gives me a better estimate for the proportion (OK) but a wider spread for counts, and the more counts I add, the weaker the conclusion I can draw. — user39707, Feb 08 '14 at 08:03
When you work with counts, what population quantity are you calculating a standard deviation/interval for? — Glen_b, Feb 08 '14 at 08:10
An example (Helsinki Heart Study) from a book (Moore & McCabe) is where I am coming unstuck. Probability(heart attack)=0.04 & n=2000. SD for expected number of heart attacks works out as 8.76. Fine. There were 84 heart attacks in placebo group and 56 in treated group. Z=3.19 & unlikely by chance. If there were 10,000 in the trial, SD(counts) would be ~20 and the difference between the 2 groups would no longer be significant. But how can more data give me less discrimination? — user39707, Feb 08 '14 at 10:24
Are the two groups of equal size? Does the number of heart attacks stay the same when the sample increases? — dimitriy, Feb 08 '14 at 19:21
thank you - I'd neglected that! You have put your finger on it - the counts will increase linearly while the SD will increase only as the square root. Just as it should do. — user39707, Feb 08 '14 at 23:20
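
The resolution in the last comment can be checked with a little arithmetic. The sketch below assumes, as the comments appear to, that the z statistic is the difference in counts divided by the binomial SD of a single group's count, and that a trial five times larger would see five times as many heart attacks in each group. The function names and the scaling factor are illustrative, not taken from the book.

```python
import math

def count_sd(n, p):
    """Binomial standard deviation of a count: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))

def z_for_trial(n, p, placebo_events, treated_events):
    """Difference in event counts, in units of the binomial SD of one count."""
    return (placebo_events - treated_events) / count_sd(n, p)

p = 0.04

# Figures from the comment: n = 2000, 84 vs 56 heart attacks.
z_small = z_for_trial(2000, p, 84, 56)

# Hypothetical trial five times larger with the same event rates,
# so the counts scale by 5 as well (420 vs 280).
scale = 5
z_large = z_for_trial(2000 * scale, p, 84 * scale, 56 * scale)

print(f"n=2000:  SD={count_sd(2000, p):.2f}, z={z_small:.2f}")
print(f"n=10000: SD={count_sd(10000, p):.2f}, z={z_large:.2f}")
```

The difference in counts grows by a factor of 5 while the SD grows only by $\sqrt 5 \approx 2.24$, so under these assumptions z grows by a factor of $\sqrt 5$: more data gives more discrimination, not less.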

dimitriy · Accepted Answer · 2014-02-08T01:13:50.303

Very roughly, imagine that we are tossing a fair coin. Success is defined as heads. If we toss the coin once $(n=1)$, you will count either $1$ success or $0$ successes. Both have an equal probability of happening $(1/2)$. Now imagine we toss the coin $10$ times ($n=10$). You can still get $0$ or $1$ successes (though both are now much less likely), but you can also get anything from $2$ through $10$ (and the values near $5$ are far more likely). If variance measures how far a set of numbers is spread out, you can see that with $10$ tosses the spread is wider than with $1$ toss or trial. This explains why the variance of the number of successes increases with $n$.

With the proportion (number of successes divided by number of tosses), you are trying to approximate the true value of $p$. As you get more information with more trials, your uncertainty about $p$ goes down, and so that variance shrinks. With one toss that comes up heads, you don't know very much (only that $p \ne 0$). With $10$ tosses that all turn out to be heads, you're pretty sure that $p$ is near one.
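
A small simulation makes both effects visible at once. This is only a sketch using the standard library: the function name `simulate`, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def simulate(n, p=0.5, reps=2000, seed=0):
    """Run `reps` experiments of n tosses each.

    Returns (SD of the success count, SD of the success proportion).
    """
    rng = random.Random(seed)
    counts = [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]
    props = [c / n for c in counts]
    return statistics.pstdev(counts), statistics.pstdev(props)

results = {n: simulate(n) for n in (1, 10, 100, 1000)}
for n, (sd_count, sd_prop) in results.items():
    print(f"n={n:5d}  SD(count)={sd_count:7.3f}  SD(proportion)={sd_prop:.4f}")
```

As $n$ grows, the first column should climb roughly like $\sqrt{n}/2$ and the second should fall roughly like $1/(2\sqrt{n})$, matching the two formulas in the question for $p = 1/2$.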

I went back to the textbook and it looks like I still don't quite get it, I'm afraid. The comment I made above about the Helsinki Heart Study sums up where it seems a little paradoxical to me right now. — user39707, Feb 08 '14 at 16:17

Underminer · Answer 2 · 2020-04-17T13:44:29.737

Let's start by assuming the binomial distribution standard deviation is correct (it is). This is the standard deviation of the distribution of the number of successes out of $n$ trials given a constant probability of success $p$. Call the number of successes $X$.

So $Var(X) = np(1-p)$, which is what you have (standard deviation squared).

Since a proportion is the number of successes over the number of trials, we have:

$Var(\frac{X}{n}) = \frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$.

And thus standard deviation is of course $\sqrt{\frac{p(1-p)}{n}}$.
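
The two variance formulas can also be checked exactly, without simulation, by summing over the binomial probability mass function. The values $n = 20$ and $p = 0.3$ and the helper names below are arbitrary choices for this sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def variance(values, probs):
    """Variance of a discrete distribution given its values and probabilities."""
    mean = sum(v * q for v, q in zip(values, probs))
    return sum((v - mean) ** 2 * q for v, q in zip(values, probs))

n, p = 20, 0.3
ks = range(n + 1)
probs = [binomial_pmf(k, n, p) for k in ks]

var_count = variance(ks, probs)                 # Var(X)
var_prop = variance([k / n for k in ks], probs) # Var(X / n)

print("Var(X)   =", var_count, " formula:", n * p * (1 - p))
print("Var(X/n) =", var_prop, " formula:", p * (1 - p) / n)
```

Up to floating-point rounding, the enumerated variances agree with $np(1-p)$ and $\frac{p(1-p)}{n}$, and the second is the first divided by $n^2$.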

In one case you are looking at counts, in the other you are looking at counts divided by sample size.

Intuitively, you can imagine that the counts of successes range over a much wider set of values ($X = 0, 1, 2, \ldots, n$) than the sample proportion does ($0 \leq \frac{X}{n} \leq 1$). As $n$ increases, $X$ can take many more (and larger) integer values and so has more variability; the proportion $\frac{X}{n}$, on the other hand, is always restricted to lie between 0 and 1, and its possible values pack more tightly around $p$.

how did you get $Var(\frac{X}{n}) = \frac{Var(X)}{n^2}$? Why is the denominator $n^2$? — user490895, Apr 17 '20 at 00:12
$Var(X)=E(X^2)-[E(X)]^2$, so $Var(cX) = E(c^2X^2)-[cE(X)]^2 = c^2E(X^2)-c^2[E(X)]^2 = c^2\left(E(X^2)-[E(X)]^2\right) = c^2Var(X)$. Here, $c = 1/n$. I did make a typo in the answer's third equality that I will fix now. — Underminer, Apr 17 '20 at 13:41

score 0 · Answer 3 · answered May 01 '17 at 19:48

Okay! I'll make it very easy.

When using the standard deviation and variance, you are USUALLY looking backwards: trying to see what is going on and then projecting into the future. As you look backwards, more trials usually give you MORE info. More and more trials help narrow down what happened, and your estimate settles more tightly around the mean, so you get closer and closer to what will happen.

Binomial is different! We already know what's up: we know the probability. So looking backwards isn't as useful because, well, we already know the probability. More and more trials don't help us understand better and better how things vary around the mean; they just give us a wider and wider distribution of counts. Increasing the number of trials really only gives more room for variance.

Imagine two scenarios. In the first, you want to know how tall everyone in a room is. More measurements = closer to the real average height in the room, and you are thankful for every new measurement.

In the second, you have a coin. You already know what the average is: it's 50/50, so at that point you are done. So let's pretend you start flipping. Every new flip is only more room for error. You flip 10 times and get all 10 heads, and you say to your friend, "What the heck! What were the odds of that? That's so dumb!" If you had only flipped it once, you would have had only one chance for a crazy outlier. More flips don't really give you more info; they just give more room for crazy results.

0 math and 0 formulas, hope that helps.

score 0 · Answer 4 · answered Apr 17 '20 at 14:11

If you're looking for some intuition on this result, ask yourself which of the following things is more variable:

... the proportion of females in a household, or the proportion of females in a whole country?
... the number of females in a household, or the number of females in a whole country?
