Please prove that if we have two variables $X$ and $Y$ with equal sample sizes, and the variance of $X$ is greater than that of $Y$, then the sum of squared differences (i.e., squared Euclidean distances) between data points within $X$ is also greater than that within $Y$.
- Please clarify: When you say *variance*, do you mean *sample variance*? When you say *sum of squared differences*, do you mean $\sum_{i,j} (x_i - x_j)^2$? – cardinal Dec 21 '11 at 14:30
- Assuming the foregoing: $$ \sum_{i,j} (x_i - x_j)^2 = \sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2 n \sum_{i=1}^n (x_i - \bar{x})^2 \> , $$ by carefully accounting for elements in the cross term. I imagine you can fill in the (small) gaps. The result then follows trivially. – cardinal Dec 21 '11 at 14:50
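(To fill in those gaps: write $z_i = x_i - \bar{x}$, so that $\sum_i z_i = 0$. Accounting over the ordered pairs with $i \neq j$ gives $$\sum_{i \neq j} (z_i - z_j)^2 = 2(n-1)\sum_{i=1}^n z_i^2 - 2\sum_{i \neq j} z_i z_j = 2 n \sum_{i=1}^n z_i^2 \>, $$ because $\sum_{i \neq j} z_i z_j = \left(\sum_i z_i\right)^2 - \sum_i z_i^2 = -\sum_i z_i^2$.)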
- For a more extensive discussion of this relationship and its applications, visit http://en.wikipedia.org/wiki/Variogram#Empirical_variogram. – whuber Dec 21 '11 at 16:39
- There is also a way to do this "without" any computation by considering the fact that if $X_1$ and $X_2$ are iid from $F$ (with a well-defined variance), then $\mathbb E (X_1 - X_2)^2 = 2 \mathrm{Var}(X_1)$. It requires a slightly firmer grasp on probability concepts, though. – cardinal Dec 21 '11 at 17:33
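(For completeness, that identity is a one-line expansion: independence gives $\mathbb E (X_1 - X_2)^2 = \mathbb E X_1^2 - 2\, \mathbb E X_1 \, \mathbb E X_2 + \mathbb E X_2^2 = 2\left(\mathbb E X_1^2 - (\mathbb E X_1)^2\right) = 2 \mathrm{Var}(X_1)$, the last two steps using that $X_1$ and $X_2$ are identically distributed.)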
- For a related question, I used a visualization of what's going on here in a reply at http://stats.stackexchange.com/a/18200: the squared differences are areas of squares. – whuber Dec 21 '11 at 17:47
- @whuber: Very nice. Somehow I had missed this answer of yours along the way. – cardinal Dec 21 '11 at 17:53
- @cardinal Why is the foregoing true? I fail to understand why $\sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2n \sum_{i=1}^n (x_i - \bar{x})^2$. – yupbank Sep 21 '21 at 21:15
1 Answer
Just to provide an "official" answer, to supplement the solutions sketched in the comments, notice that:
1. None of $\operatorname{Var} ((X_i))$, $\operatorname{Var} ((Y_i))$, $\sum_{i,j}(X_i-X_j)^2$, or $\sum_{i,j} (Y_i-Y_j)^2$ are changed by shifting all $X_i$ uniformly to $X_i-\mu$ for some constant $\mu$ or shifting all $Y_i$ to $Y_i-\nu$ for some constant $\nu$. Thus we may assume such shifts have been performed to make $\sum X_i = \sum Y_i = 0$, whence (up to the normalizing constant $1/(n-1)$ or $1/n$, which is common to both sides because the sample sizes are equal) $\operatorname{Var}((X_i)) = \sum X_i^2$ and $\operatorname{Var}((Y_i)) = \sum Y_i^2$.
2. After clearing common factors from each side and using (1), the question asks to show that $\sum X_i^2 \ge \sum Y_i^2$ implies $\sum_{i,j} (X_i-X_j)^2 \ge \sum_{i,j} (Y_i-Y_j)^2$.
3. Simple expansion of the squares and rearranging the sums give $$\sum_{i,j}(X_i-X_j)^2 = 2n\sum X_i^2 - 2\left(\sum X_i\right)\left(\sum X_j\right) = 2n\sum X_i^2 = 2n\operatorname{Var}((X_i)),$$ the cross term vanishing because the sums were centered to zero in (1), with a similar result for the $Y$'s.
The proof is immediate.
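As a sanity check (not part of the original answer), here is a minimal NumPy sketch, with arbitrary sample values and seed, verifying both the identity $\sum_{i,j}(X_i-X_j)^2 = 2n^2\,\operatorname{Var}_{\mathrm{pop}}((X_i))$ (point 3 with the population normalization $1/n$ put back) and the monotonicity claim:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(0.0, 2.0, n)  # larger spread
Y = rng.normal(5.0, 1.0, n)  # smaller spread; the different mean is irrelevant (point 1)

def sum_sq_pairwise(v):
    """Sum of (v_i - v_j)^2 over all ordered pairs (i, j)."""
    diffs = v[:, None] - v[None, :]
    return float(np.sum(diffs ** 2))

# Identity: sum_{i,j} (v_i - v_j)^2 == 2 * n^2 * (population variance of v).
for v in (X, Y):
    assert np.isclose(sum_sq_pairwise(v), 2 * n**2 * np.var(v))

# Monotonicity: the larger variance yields the larger sum of squared differences.
assert (np.var(X) > np.var(Y)) == (sum_sq_pairwise(X) > sum_sq_pairwise(Y))
print("identity and monotonicity check out")
```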

- Why is $2\left(\sum X_i\right)\left(\sum X_j\right) = 0$ in point 3, when rearranging the sums? – yupbank Sep 21 '21 at 21:17
- @yupbank Please read *all* of this answer, especially the part beginning "we may assume such shifts have been performed." Then substitute the values: $2\left(\sum X_i\right)\left(\sum X_j\right) = 2(0)(0)=0.$ – whuber Sep 21 '21 at 21:19
- Ah... I see, assuming a zero-mean transformation makes sense, sorry about that. – yupbank Sep 21 '21 at 21:22