
Please, prove that if we have two variables (equal sample size) $X$ and $Y$ and the variance in $X$ is greater than in $Y$, then the sum of squared differences (i.e., squared Euclidean distances) between data points within $X$ is also greater than that within $Y$.

gung - Reinstate Monica
ttnphns
  • 1
    Please clarify: When you say *variance*, do you mean *sample variance*? When you say *sum of squared differences* do you mean $\sum_{i,j} (x_i - x_j)^2$? – cardinal Dec 21 '11 at 14:30
  • 9
    Assuming the foregoing: $$ \sum_{i,j} (x_i - x_j)^2 = \sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2 n \sum_{i=1}^n (x_i - \bar{x})^2 \> , $$ by carefully accounting for elements in the cross term. I imagine you can fill in the (small gaps). The result then follows trivially. – cardinal Dec 21 '11 at 14:50
  • For a more extensive discussion of this relationship and its applications, visit http://en.wikipedia.org/wiki/Variogram#Empirical_variogram. – whuber Dec 21 '11 at 16:39
  • 2
    There is also a way to do this "without" any computation by considering the fact that if $X_1$ and $X_2$ are iid from $F$ (with a well-defined variance), then $\mathbb E (X_1 - X_2)^2 = 2 \mathrm{Var}(X_1)$. It requires a slightly firmer grasp on probability concepts, though. – cardinal Dec 21 '11 at 17:33
  • 1
    For a related question, I used a visualization of what's going on here in a reply at http://stats.stackexchange.com/a/18200: the squared differences are areas of squares. – whuber Dec 21 '11 at 17:47
  • 1
    @whuber: Very nice. Somehow I had missed this answer of yours along the way. – cardinal Dec 21 '11 at 17:53
  • @cardinal Why is the foregoing true? I fail to understand why $\sum_{i \neq j} ((x_i - \bar{x}) - (x_j - \bar{x}))^2 = 2n \sum_{i=1}^n (x_i - \bar{x})^2$. – yupbank Sep 21 '21 at 21:15
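The identity in these comments is easy to check numerically. The sketch below (NumPy; the sample values are arbitrary illustrations, not from the thread) compares the brute-force double sum with $2n\sum_i (x_i - \bar x)^2$:

```python
import numpy as np

# Numeric check of the identity from the comments:
#   sum over all pairs (i, j) of (x_i - x_j)^2  ==  2 * n * sum_i (x_i - xbar)^2
rng = np.random.default_rng(0)  # arbitrary sample for illustration
x = rng.normal(size=50)
n = len(x)

# Brute-force double sum over all ordered pairs (i, j).
lhs = sum((xi - xj) ** 2 for xi in x for xj in x)

# Closed form: 2n times the centered sum of squares.
rhs = 2 * n * np.sum((x - x.mean()) ** 2)

assert np.isclose(lhs, rhs)
```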

1 Answer


Just to provide an "official" answer, to supplement the solutions sketched in the comments, notice

  1. None of $\operatorname{Var} ((X_i))$, $\operatorname{Var} ((Y_i))$, $\sum_{i,j}(X_i-X_j)^2$, or $\sum_{i,j} (Y_i-Y_j)^2$ is changed by shifting all $X_i$ uniformly to $X_i-\mu$ for some constant $\mu$ or shifting all $Y_i$ to $Y_i-\nu$ for some constant $\nu$. Thus we may assume such shifts have been performed to make $\sum X_i = \sum Y_i = 0$, whence $\operatorname{Var}((X_i))$ is proportional to $\sum X_i^2$ and $\operatorname{Var}((Y_i))$ to $\sum Y_i^2$, with the same constant of proportionality because the sample sizes are equal.

  2. After clearing common factors from each side and using (1), the question reduces to showing that $\sum X_i^2 \ge \sum Y_i^2$ implies $\sum_{i,j} (X_i-X_j)^2 \ge \sum_{i,j} (Y_i-Y_j)^2$.

  3. Simple expansion of the squares and rearrangement of the sums give $$\sum_{i,j}(X_i-X_j)^2 = 2n\sum X_i^2 - 2\left(\sum X_i\right)\left(\sum X_j\right) = 2n\sum X_i^2,$$ which by (1) is proportional to $\operatorname{Var}((X_i))$; a similar result holds for the $Y$'s.

Combining (2) and (3), the result is immediate.
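The chain of steps above can also be verified numerically. The sketch below (a NumPy illustration, not part of the original answer; the arrays are arbitrary examples) checks the pairwise-sum identity and confirms that the larger-variance sample has the larger pairwise sum:

```python
import numpy as np

def pairwise_sq_sum(v):
    """Sum of (v_i - v_j)^2 over all ordered pairs (i, j), via broadcasting."""
    diffs = v[:, None] - v[None, :]
    return np.sum(diffs ** 2)

n = 40
x = np.linspace(-3.0, 3.0, n)  # wider spread, hence larger variance
y = np.linspace(-1.0, 1.0, n)  # narrower spread

# The identity from step (3): pairwise sum = 2n * centered sum of squares.
for v in (x, y):
    assert np.isclose(pairwise_sq_sum(v), 2 * n * np.sum((v - v.mean()) ** 2))

# Hence the larger variance forces the larger pairwise sum.
assert np.var(x) > np.var(y)
assert pairwise_sq_sum(x) > pairwise_sq_sum(y)
```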

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Why is $2(\sum X_i)(\sum X_j) = 0$ in point 3, when rearranging the sums? – yupbank Sep 21 '21 at 21:17
  • @yupbank Please read *all* of this answer, especially the part beginning "we may assume such shifts have been performed." Then substitute the values: $2\left(\sum X_i\right)\left(\sum X_j\right) = 2(0)(0)=0.$ – whuber Sep 21 '21 at 21:19
  • 1
    Ah... i see, assuming a zero mean transformation make sense, sorry about that – yupbank Sep 21 '21 at 21:22