
Please, prove that if we have two variables (equal sample size) $X$ and $Y$ and the variance in $X$ is greater than in $Y$, then the sum of squared differences (i.e., squared Euclidean distances) between data points within $X$ is also greater than that within $Y$.

gung - Reinstate Monica
ttnphns
  • 1
    Please clarify: When you say *variance*, do you mean *sample variance*? When you say *sum of squared differences* do you mean $\sum_{i,j} (x_i - x_j)^2$? – cardinal Dec 21 '11 at 14:30
  • 9
    Assuming the foregoing: $$ \sum_{i,j} (x_i - x_j)^2 = \sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2 n \sum_{i=1}^n (x_i - \bar{x})^2 \> , $$ by carefully accounting for elements in the cross term. I imagine you can fill in the (small gaps). The result then follows trivially. – cardinal Dec 21 '11 at 14:50
  • For a more extensive discussion of this relationship and its applications, visit http://en.wikipedia.org/wiki/Variogram#Empirical_variogram. – whuber Dec 21 '11 at 16:39
  • 2
    There is also a way to do this "without" any computation by considering the fact that if $X_1$ and $X_2$ are iid from $F$ (with a well-defined variance), then $\mathbb E (X_1 - X_2)^2 = 2 \mathrm{Var}(X_1)$. It requires a slightly firmer grasp on probability concepts, though. – cardinal Dec 21 '11 at 17:33
  • 1
    For a related question, I used a visualization of what's going on here in a reply at http://stats.stackexchange.com/a/18200: the squared differences are areas of squares. – whuber Dec 21 '11 at 17:47
  • 1
    @whuber: Very nice. Somehow I had missed this answer of yours along the way. – cardinal Dec 21 '11 at 17:53
  • @cardinal Why is the foregoing true? I fail to understand why $\sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2n \sum_{i=1}^n (x_i - \bar{x})^2$. – yupbank Sep 21 '21 at 21:15
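The identity in these comments is easy to check numerically. The sketch below (NumPy; the sample values are arbitrary illustrations, not from the thread) compares the brute-force double sum with $2n\sum_i (x_i - \bar x)^2$:

```python
import numpy as np

# Numeric check of the identity from the comments:
#   sum over all pairs (i, j) of (x_i - x_j)^2  ==  2 * n * sum_i (x_i - xbar)^2
rng = np.random.default_rng(0)  # arbitrary sample for illustration
x = rng.normal(size=50)
n = len(x)

# Brute-force double sum over all ordered pairs (i, j).
lhs = sum((xi - xj) ** 2 for xi in x for xj in x)

# Closed form: 2n times the centered sum of squares.
rhs = 2 * n * np.sum((x - x.mean()) ** 2)

assert np.isclose(lhs, rhs)
```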

1 Answer


Just to provide an "official" answer, to supplement the solutions sketched in the comments, notice

  1. None of $\operatorname{Var} ((X_i))$, $\operatorname{Var} ((Y_i))$, $\sum_{i,j}(X_i-X_j)^2$, or $\sum_{i,j} (Y_i-Y_j)^2$ is changed by shifting all $X_i$ uniformly to $X_i-\mu$ for some constant $\mu$ or shifting all $Y_i$ to $Y_i-\nu$ for some constant $\nu$. Thus we may assume such shifts have been performed to make $\sum X_i = \sum Y_i = 0$, whence $\operatorname{Var}((X_i))$ is proportional to $\sum X_i^2$ and $\operatorname{Var}((Y_i))$ to $\sum Y_i^2$, with the same constant of proportionality because the sample sizes are equal.

  2. After clearing common factors from each side and using (1), the question reduces to showing that $\sum X_i^2 \ge \sum Y_i^2$ implies $\sum_{i,j} (X_i-X_j)^2 \ge \sum_{i,j} (Y_i-Y_j)^2$.

  3. Simple expansion of the squares and rearrangement of the sums give $$\sum_{i,j}(X_i-X_j)^2 = 2n\sum X_i^2 - 2\left(\sum X_i\right)\left(\sum X_j\right) = 2n\sum X_i^2,$$ which by (1) is proportional to $\operatorname{Var}((X_i))$; a similar result holds for the $Y$'s.

Combining (2) and (3), the result is immediate.
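The chain of steps above can also be verified numerically. The sketch below (a NumPy illustration, not part of the original answer; the arrays are arbitrary examples) checks the pairwise-sum identity and confirms that the larger-variance sample has the larger pairwise sum:

```python
import numpy as np

def pairwise_sq_sum(v):
    """Sum of (v_i - v_j)^2 over all ordered pairs (i, j), via broadcasting."""
    diffs = v[:, None] - v[None, :]
    return np.sum(diffs ** 2)

n = 40
x = np.linspace(-3.0, 3.0, n)  # wider spread, hence larger variance
y = np.linspace(-1.0, 1.0, n)  # narrower spread

# The identity from step (3): pairwise sum = 2n * centered sum of squares.
for v in (x, y):
    assert np.isclose(pairwise_sq_sum(v), 2 * n * np.sum((v - v.mean()) ** 2))

# Hence the larger variance forces the larger pairwise sum.
assert np.var(x) > np.var(y)
assert pairwise_sq_sum(x) > pairwise_sq_sum(y)
```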

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Why is $2(\sum X_i)(\sum X_j) = 0$ in point 3, when rearranging the sums? – yupbank Sep 21 '21 at 21:17
  • @yupbank Please read *all* of this answer, especially the part beginning "we may assume such shifts have been performed." Then substitute the values: $2\left(\sum X_i\right)\left(\sum X_j\right) = 2(0)(0)=0.$ – whuber Sep 21 '21 at 21:19
  • 1
    Ah... i see, assuming a zero mean transformation make sense, sorry about that – yupbank Sep 21 '21 at 21:22