Suppose that we have a set of points on a line. In this case, the amount of dispersion can be measured by the standard deviation.

My question is: is there something similar for higher dimensions? For example, given the 100 points $(i, j)$ with $0 \le i \le 9,\ 0 \le j \le 9$, we would want to say that the "area" covered by these points is roughly 100.

I have two ideas, but neither is very good.

  1. The area or volume of the convex hull. However, this value can be affected too much by outliers.

  2. The product of the dispersion (standard deviation) of the x-coordinates and that of the y-coordinates. However, imagine points on a diagonal line: we want a small value in this case, but the product will be large.
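To make the second concern concrete, here is a minimal sketch using NumPy. The arrays `grid` and `diagonal` and the helper `std_product` are hypothetical names made up for this illustration; `diagonal` assumes 100 evenly spaced points on the line y = x over the same coordinate range.

```python
import numpy as np

# Hypothetical data: the 10 x 10 integer grid from the example above
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)

# Hypothetical data: 100 points on the diagonal y = x, same coordinate range
t = np.linspace(0, 9, 100)
diagonal = np.column_stack([t, t])

def std_product(points):
    """Product of the per-coordinate (population) standard deviations."""
    return float(np.prod(points.std(axis=0)))

print("grid:    ", std_product(grid))
print("diagonal:", std_product(diagonal))
```

The two values come out at roughly 8.25 and 6.89, so the product barely distinguishes the filled square from the thin diagonal line.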

gung - Reinstate Monica
guest

1 Answer

In situations like this, people often use the variance-covariance matrix. The main diagonal holds the variance of each dimension. The $(i, j)$th off-diagonal element (where $i \ne j$) is the covariance of variables $i$ and $j$. In this way, every aspect of the dispersion is listed separately.

On the other hand, if you need a single number for simple comparisons, the determinant of that matrix (sometimes called the generalized variance) is sometimes used.
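As a rough sketch of both suggestions, the snippet below computes the variance-covariance matrix and its determinant with NumPy for the question's grid and for points on a diagonal line. The variable names and the `diag_pts` data are assumptions made for this illustration; `np.cov` uses the sample (n - 1) normalization by default.

```python
import numpy as np

# The 10 x 10 integer grid from the question, one point per row
grid_pts = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)

# Assumed comparison data: 100 evenly spaced points on the line y = x
s = np.linspace(0, 9, 100)
diag_pts = np.column_stack([s, s])

# rowvar=False: each column is a variable, each row an observation
cov_grid = np.cov(grid_pts, rowvar=False)
cov_diag = np.cov(diag_pts, rowvar=False)

# Determinant of the covariance matrix as a single-number summary
det_grid = float(np.linalg.det(cov_grid))
det_diag = float(np.linalg.det(cov_diag))

print(cov_grid)
print("det (grid):    ", det_grid)
print("det (diagonal):", det_diag)
```

The determinant for the diagonal points is essentially zero, which is the behaviour the question asks for. If the points are assumed to be spread uniformly over a rectangle, $12\sqrt{\det}$ approximates its area; for the grid this gives about 100.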

gung - Reinstate Monica
  • Also related, [2D analog of standard deviation?](http://stats.stackexchange.com/a/13274/1036) - if you want to assume the distribution is isotropic around a centroid (which would result in small covariances off the diagonal in your suggestion). – Andy W Jan 04 '16 at 13:07
  • Wouldn't the largest eigenvalue of the covariance matrix be a better choice than the determinant? The determinant can become zero if the points lie on the same line. – fdermishin Oct 20 '18 at 10:23
  • That could be used also. It isn't very common to have perfectly collinear data, so it isn't a contingency that I'd typically be worried about. – gung - Reinstate Monica Oct 21 '18 at 17:07
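The difference raised in the comments can be checked with a small sketch, assuming perfectly collinear points on the line y = x (the array `line_pts` is made-up data for illustration). `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

# Assumed data: 100 perfectly collinear points on y = x
u = np.linspace(0, 9, 100)
line_pts = np.column_stack([u, u])

cov_line = np.cov(line_pts, rowvar=False)

# Eigenvalues of the symmetric covariance matrix, smallest first
eigenvalues = np.linalg.eigvalsh(cov_line)
largest_eig = float(eigenvalues[-1])
det_line = float(np.linalg.det(cov_line))

print("largest eigenvalue:", largest_eig)
print("determinant:       ", det_line)
```

Here the determinant collapses to (numerically) zero, while the largest eigenvalue stays clearly positive because it measures the variance along the line itself. Which behaviour is preferable depends on whether collinear data should count as having zero spread.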