Let $a,b,c,d$ be independent normally distributed random variables. I'm aware that the following distributions:

$$c a + d b$$

$$(c+d)a$$

both have the same standard deviation ($=\sqrt{2}$ if $a,b,c,d$ are standard normal, i.e. have zero mean and unit variance).

I'm interested in the statistics generated by the following process: Let $a,b$ represent vectors containing a large number of samples, while $c,d$ each represent a single sample. Then what is the expected standard deviation of the two distributions above? Using the MATLAB code below, I get mean standard deviations of approximately $1.25$ and $1.125$, respectively.

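    N = 100;          % number of samples in each vector
    M = 100000;       % number of repetitions
    x = zeros(M,1);   % preallocate the results
    y = zeros(M,1);
    for m = 1:M
        a = randn(N,1);
        b = randn(N,1);
        c = randn(1);   % single scalar sample
        d = randn(1);

        x(m) = std(c*a + d*b);
        y(m) = std((c+d)*a);
    end

    mean(x)
    mean(y)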

If possible, I'd like to characterize the distribution of standard deviations fully, but a way of deriving the mean would be helpful too.

My apologies if this question is too easy (I hope so!).

  • It is unclear what you are doing. What does it mean for a vector to "contain ... samples" and, when $a$, $b$, $c$, and $d$ are vectors *of different lengths,* what do their products mean? In any event, whatever you are doing looks like it might possibly be answered at http://stats.stackexchange.com/questions/51699, which shows how to compute moments of inner products of multinormally distributed variates. (Yes, I read the `R` code, but it's still unclear what you're trying to do, because it seems to be evaluating *standard deviations* of *components* of vectors, not variances at all.) – whuber Mar 15 '13 at 18:40
  • @whuber Thanks for the comments. I've rephrased the question in terms of standard deviation instead of variance. I agree that I don't know the appropriate vocabulary to describe my question, which is part of why I can't find the answer myself, and is also the reason that I've included code -- to make the meaning clear. The products you refer to are just scalar times vector, so I think the meaning is clear. Let me know if it's still not. – David Ketcheson Mar 15 '13 at 18:47
  • To be clear, there are no inner products here, so the link in your comment is unrelated. All the products are scalar x vector. – David Ketcheson Mar 15 '13 at 18:53
  • Thanks. But what distinction are you making between "standard deviation" and "expected standard deviation"? The code does not clarify that. BTW, there definitely *are* inner products here; you just have to see them. For instance, the sums and sums of squares involved in computing `std(c*a)` can be expressed as the inner product of `rep(c,100)` with `a` and the inner product of `rep(c^2,100)` with `a^2`; quite possibly the latter could be dispensed with by considering the square of the former. Under some interpretations of your question, the latter isn't needed at all. – whuber Mar 15 '13 at 18:57

1 Answer

It looks like in the first case, if $N$ is large, the mean of the standard deviations should be about $\sqrt\frac{\pi}{2} \approx 1.25$ and in the second case it should be $\frac{2}{\sqrt{\pi}} \approx 1.128.$

This can be calculated using the sampling theory of the normal distribution. In the first case, treating $c$ and $d$ as fixed and writing $\mathrm{Var}$ for the sample variance, $\mathrm{Var}(ca + db) \approx c^2 \mathrm{Var}(a) + d^2 \mathrm{Var}(b)$; the neglected cross term is $2cd$ times the sample covariance of $a$ and $b$, which tends to zero as $N$ grows. Now, $(N-1)$ times the variance of a sample of size $N$ from the standard normal has a chi-square distribution with $N-1$ degrees of freedom, so $\mathrm{Var}(a)$ and $\mathrm{Var}(b)$ are $\chi^2_{N-1}/(N-1)$ random variables. In your case, $\chi^2_{99}/99$. If you are interested in $N \rightarrow \infty$, then the variance of the chi-square random variable divided by $N-1$ will tend to zero and $\mathrm{Var}(a)$ and $\mathrm{Var}(b)$ will be approximately $1$. You are left with $$sd(ca+db) \sim \sqrt{c^2+d^2}$$ where $c$ and $d$ are standard normals. This is a chi distribution with $2$ degrees of freedom. Wikipedia says that its mean is $\sqrt{2}\,\Gamma(3/2)/\Gamma(1) = \sqrt{2}(1/2)\sqrt{\pi} = \sqrt{\frac{\pi}{2}}$, which is about $1.25$.

The second case is very similar. The same sort of argument gives an approximate distribution for $sd((c+d)a)$: $$sd((c+d)a) \sim \sqrt{(c+d)^2} = |c+d|$$ But $(c+d)$ is normal with standard deviation $\sqrt{2}$, so $\frac{(c+d)}{\sqrt{2}}$ is a standard normal. Therefore, $\frac{1}{\sqrt{2}}\sqrt{(c+d)^2}$ is a $\chi_1$ random variable and its mean is $\sqrt{2}\,\frac{\Gamma(1)}{\Gamma(1/2)} = \sqrt{2}\frac{1}{\Gamma(1/2)}$. Since $\Gamma(1/2) = \sqrt{\pi}$, it follows that the mean of $\sqrt{(c+d)^2}$ is $\sqrt{2}\cdot\sqrt{2}/\sqrt{\pi} = \frac{2}{\sqrt{\pi}}$, which is approximately $1.128$.
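For finite $N$ the large-$N$ limit can probably be avoided. What follows is a sketch under my own reading of the setup, not part of the argument above. Conditional on $c,d$, the entries of $ca+db$ are i.i.d. $N(0,c^2+d^2)$ and the entries of $(c+d)a$ are i.i.d. $N(0,(c+d)^2)$. Each sample standard deviation should therefore factor into a scale term times an independent $S_N \sim \chi_{N-1}/\sqrt{N-1}$, where the symbol $S_N$ is introduced here for the sample standard deviation of $N$ standard normals:

$$sd(ca+db) \overset{d}{=} \sqrt{c^2+d^2}\; S_N, \qquad sd((c+d)a) \overset{d}{=} |c+d|\; S_N,$$

$$E[S_N] = \sqrt{\frac{2}{N-1}}\,\frac{\Gamma(N/2)}{\Gamma((N-1)/2)} \approx 1 - \frac{1}{4N}.$$

If this is right, the two means are $\sqrt{\pi/2}\,E[S_N]$ and $\frac{2}{\sqrt{\pi}}\,E[S_N]$. For $N=100$ the factor is about $0.9975$, giving roughly $1.250$ and $1.1255$, which would account for the simulated $1.125$ being slightly below $1.128$.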


tl;dr If $N$ is large, the first standard deviation is approximately $\chi_2$ distributed with mean $\sqrt{\frac{\pi}{2}}$, and the second is approximately $\sqrt{2}$ times a $\chi_1$ with mean $\frac{2}{\sqrt{\pi}}$.
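A quick numerical check of these means, written in the same style as the code in the question. This is only a sketch: the variable names (`c4`, `pred_x`, `pred_y`) are illustrative, and the factor `c4` is an assumption added here, not part of the derivation above. It is the standard expected value of the sample standard deviation of $N$ i.i.d. standard normals, and it equals $1$ in the large-$N$ limit.

```matlab
% Compare simulated mean standard deviations with the chi-based predictions.
N = 100;                 % samples per vector
M = 20000;               % repetitions (fewer than in the question, for speed)
x = zeros(M,1);
y = zeros(M,1);
for m = 1:M
    a = randn(N,1);
    b = randn(N,1);
    c = randn;           % scalar samples
    d = randn;
    x(m) = std(c*a + d*b);
    y(m) = std((c+d)*a);
end

% Assumed finite-N factor: E[std of N iid N(0,1)]
%   = sqrt(2/(N-1)) * Gamma(N/2) / Gamma((N-1)/2)
% gammaln is used to avoid overflow in the gamma function for large N.
c4 = sqrt(2/(N-1)) * exp(gammaln(N/2) - gammaln((N-1)/2));

pred_x = sqrt(pi/2) * c4;    % mean of chi_2, times the finite-N factor
pred_y = 2/sqrt(pi) * c4;    % mean of sqrt(2)*chi_1, times the finite-N factor

fprintf('case 1: simulated %.4f, predicted %.4f\n', mean(x), pred_x);
fprintf('case 2: simulated %.4f, predicted %.4f\n', mean(y), pred_y);
```

Setting `c4 = 1` recovers the pure large-$N$ values from the tl;dr. The simulated means fluctuate from run to run, so agreement should only be expected to two or three decimal places.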

Flounderer