
Given two vectors $X$ and $Y$ (length $n$, sampled from random variables), what is the name of the following quantity:

$$ \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n(x_i-y_j)^2 $$

I 'came up' with this formula to quantify the variance between two vectors, and I suspect it is either nonsense or, given its triviality, a well-known quantity. I know it is not the covariance between $X$ and $Y$, but what is it instead?

Edit: obviously, in the context of predictions with e.g. data $X$ and predictions $Y$, this would seem to correspond to the mean squared error (except for the normalisation constant, which would be $\frac{1}{n}$); see, however, the comment by Stephan Kolassa below. I still wonder whether it has a name (and a meaning) in the context of statistics.
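
For concreteness, here is a minimal NumPy sketch of the quantity (the helper name `cross_mean_sq_diff` and the simulated data are just for illustration); it also shows that its value generally differs from the paired MSE:

```python
import numpy as np

def cross_mean_sq_diff(x, y):
    # (1 / (len(x) * len(y))) * sum_i sum_j (x_i - y_j)^2, via broadcasting
    diffs = x[:, None] - y[None, :]   # all cross differences x_i - y_j
    return np.mean(diffs ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(loc=1.0, size=100)

print(cross_mean_sq_diff(x, y))       # the double-sum quantity above
print(np.mean((x - y) ** 2))          # paired MSE: generally a different number
```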

  • No, this would not be the mean squared error, because for the MSE you would have *paired* data (to each prediction there corresponds one actual), and you would take the MSE only *within* each pair. Here, you are combining *each* $x$ with *each* $y$. – Stephan Kolassa Oct 10 '20 at 11:44
  • Thanks, good point! – monade Oct 10 '20 at 11:46
  • This is an empirical version of $\mathbb E[(X-Y)^2]$, empirical in the sense of the empirical distribution of the pair $(X,Y)$ under an assumption of independence. – Xi'an Oct 10 '20 at 12:54
  • It looks like you really want to be considering some multiple of $$\sum_{i=1}^n\sum_{j=1}^n (x_i-x_j)(y_i-y_j).$$ I illustrate this and describe its interpretation at https://stats.stackexchange.com/a/18200/919. – whuber Oct 10 '20 at 14:33
  • It's an L2 norm... – Mithridates the Great Oct 10 '20 at 20:34
  • @whuber: Thanks! To clarify: I'm not interested in the degree to which $X$ and $Y$ co-vary, but rather in the variance of the pooled data of $X$ and $Y$, considering only cross-pairs of the data. The original motivation came from a [different problem](https://stats.stackexchange.com/questions/491349/precision-of-parameter-fits-in-computational-models/491376): I wondered how I could compute the variance between vectors $Y_1$ and $Y_2$, which correspond to the output variables of a model when run with two different values of a parameter. – monade Oct 11 '20 at 08:33

1 Answer


This is a measure of squared dispersion between two sets of values but not between paired values. I doubt it has a name.

Indeed, you do not need to have the same number of $x$ and $y$ values, and using the $\frac1n$ form of the variance (dividing by $n$ rather than $n-1$), you can write:

$$\frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n(x_i-y_j)^2 = (\bar x - \bar y)^2 + \widehat{\text{Var}}(x) + \widehat{\text{Var}}(y)$$

so it is a combination of the squared distance between the centres of the two sets and the squared dispersions of the individual sets.
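
A quick numerical check of this decomposition (a minimal NumPy sketch with made-up data; `np.var` uses the $\frac1n$ convention by default, matching the formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=80)   # m = 80 values
y = rng.normal(loc=2.0, scale=3.0, size=50)   # n = 50 values

# Left-hand side: mean of all m*n squared cross differences
lhs = np.mean((x[:, None] - y[None, :]) ** 2)

# Right-hand side: squared distance between the means plus the two 1/n-style variances
rhs = (x.mean() - y.mean()) ** 2 + np.var(x) + np.var(y)

print(lhs, rhs)   # the two values agree up to floating-point rounding
```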

  • Thanks, this makes sense! I wonder, given that the variance of a variable $X$ is the mean squared difference between all $(x_i, x_j)$, would it not make sense to call this quantity something like the between-variable variance? – monade Oct 10 '20 at 13:09
  • 1
    @monade You might then want to divide it by $2$ since $\frac{1}{n^2}\sum\limits_{i=1}^n\sum\limits_{j=1}^n(x_i-x_j)^2= 2{\text{Var}}(x)$ – Henry Oct 10 '20 at 14:24
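
A quick numerical check of the factor-of-2 identity in Henry's comment (again a minimal NumPy sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)

lhs = np.mean((x[:, None] - x[None, :]) ** 2)   # (1/n^2) * sum_i sum_j (x_i - x_j)^2
rhs = 2 * np.var(x)                             # 2 * Var(x), with the 1/n convention

print(lhs, rhs)   # equal up to floating-point rounding
```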