
Given two vectors $X$ and $Y$ (length $n$, sampled from random variables), what is the name of the following quantity:

$$ \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n(x_i-y_j)^2 $$

I 'came up' with this formula to quantify the variance between two vectors, and I suspect it is either nonsense or, given its triviality, a well-known quantity. I know it is not the covariance between $X$ and $Y$, but what is it instead?

Edit: obviously, in the context of predictions with e.g. data $X$ and predictions $Y$, this would seem to correspond to the mean squared error (except for the normalisation constant, which would be $\frac{1}{n}$); see, however, the comment by Stephan Kolassa below. I still wonder whether it has a name (and a meaning) in the context of statistics.
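
For concreteness, here is a minimal NumPy sketch of the quantity (the helper name `cross_mean_sq_diff` and the simulated data are just for illustration); it also shows that its value generally differs from the paired MSE:

```python
import numpy as np

def cross_mean_sq_diff(x, y):
    # (1 / (len(x) * len(y))) * sum_i sum_j (x_i - y_j)^2, via broadcasting
    diffs = x[:, None] - y[None, :]   # all cross differences x_i - y_j
    return np.mean(diffs ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(loc=1.0, size=100)

print(cross_mean_sq_diff(x, y))       # the double-sum quantity above
print(np.mean((x - y) ** 2))          # paired MSE: generally a different number
```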

  • No, this would not be the mean squared error, because for the MSE you would have *paired* data (to each prediction there corresponds one actual), and you would take the MSE only *within* each pair. Here, you are combining *each* $x$ with *each* $y$. – Stephan Kolassa Oct 10 '20 at 11:44
  • Thanks, good point! – monade Oct 10 '20 at 11:46
  • This is an empirical version of $\mathbb E[(X-Y)^2]$, empirical in the sense of the empirical distribution of the pair $(X,Y)$ under an assumption of independence. – Xi'an Oct 10 '20 at 12:54
  • It looks like you really want to be considering some multiple of $$\sum_{i=1}^n\sum_{j=1}^n (x_i-x_j)(y_i-y_j).$$ I illustrate this and describe its interpretation at https://stats.stackexchange.com/a/18200/919. – whuber Oct 10 '20 at 14:33
  • It's an L2 norm... – Mithridates the Great Oct 10 '20 at 20:34
  • @whuber: Thanks! To clarify: I'm not interested in the degree to which $X$ and $Y$ co-vary, but rather in the variance of the pooled data of $X$ and $Y$, considering only cross-pairs of the data. The original motivation came from a [different problem](https://stats.stackexchange.com/questions/491349/precision-of-parameter-fits-in-computational-models/491376): I wondered how I could compute the variance between vectors $Y_1$ and $Y_2$, which correspond to the output variables of a model when run with two different values of a parameter. – monade Oct 11 '20 at 08:33

1 Answer


This is a measure of squared dispersion between two sets of values but not between paired values. I doubt it has a name.

Indeed, you do not need to have the same number of $x$ and $y$ values, and using the $\frac1n$ form of the variance (dividing by $n$ rather than $n-1$), you can write:

$$\frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n(x_i-y_j)^2 = (\bar x - \bar y)^2 + \widehat{\text{Var}}(x) + \widehat{\text{Var}}(y)$$

so it is a combination of the squared distance between the centres of the two sets and the squared dispersions of the individual sets.
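
A quick numerical check of this decomposition (a minimal NumPy sketch with made-up data; `np.var` uses the $\frac1n$ convention by default, matching the formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=80)   # m = 80 values
y = rng.normal(loc=2.0, scale=3.0, size=50)   # n = 50 values

# Left-hand side: mean of all m*n squared cross differences
lhs = np.mean((x[:, None] - y[None, :]) ** 2)

# Right-hand side: squared distance between the means plus the two 1/n-style variances
rhs = (x.mean() - y.mean()) ** 2 + np.var(x) + np.var(y)

print(lhs, rhs)   # the two values agree up to floating-point rounding
```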

  • Thanks, this makes sense! I wonder, given that the variance of a variable $X$ is the mean squared difference between all $(x_i, x_j)$, would it not make sense to call this quantity something like the between-variable variance? – monade Oct 10 '20 at 13:09
  • 1
    @monade You might then want to divide it by $2$ since $\frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n(x_i-x_j)^2= 2{\text{Var}}(x)$ – Henry Oct 10 '20 at 14:24
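
A quick numerical check of the factor-of-2 identity in Henry's comment (again a minimal NumPy sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)

lhs = np.mean((x[:, None] - x[None, :]) ** 2)   # (1/n^2) * sum_i sum_j (x_i - x_j)^2
rhs = 2 * np.var(x)                             # 2 * Var(x), with the 1/n convention

print(lhs, rhs)   # equal up to floating-point rounding
```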