Statistical test for comparing total variance of multivariate data

Question

I have data (~50,000 data points) consisting of measurements of two variables.

I want to see how "spread out" the scatterplot of each sample is, i.e. the variance in both dimensions taken together. You can see that the scatter of Sample 2 is broader. If I simply compare the univariate variances, I may not see any difference between the two samples, since each univariate variance only describes the projection of the points onto one dimension. For context, these data points denote the activity of a protein in two different tasks, so each dot is a combination of the two traits. I want to know how diverse these combinations are, i.e. the area of "activity space" covered by each sample.

This post, A measure of "variance" from the covariance matrix?, suggests different metrics such as the trace or the $k$-th root of the determinant of the covariance matrix. I was also considering using the determinant itself (the product of the eigenvalues), as it would roughly represent the total area covered by the data.
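To make the difference between these candidate metrics concrete, here is a minimal sketch for the two-variable case. The function names and the two toy samples are made up for illustration and are not my real data. Both samples have identical univariate variances (and hence identical trace), yet the determinant separates them because one sample lies entirely on a line.

```python
import math


def covariance_2d(points):
    """Return (var_x, var_y, cov_xy) of a list of (x, y) pairs, using n - 1."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, var_y, cov_xy


def spread_measures(points):
    """Trace, determinant and k-th root of the determinant (k = 2 here)."""
    var_x, var_y, cov_xy = covariance_2d(points)
    det = var_x * var_y - cov_xy ** 2
    return {
        "trace": var_x + var_y,                 # sum of the univariate variances
        "det": det,                             # product of the eigenvalues
        "det_root": math.sqrt(max(det, 0.0)),   # guard against tiny negative rounding
    }


# Hypothetical toy samples with the same marginal variances (2/3 each).
correlated = [(-1, -1), (1, 1), (0, 0), (0, 0)]      # all points on the line y = x
uncorrelated = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # points spread over the plane

print(spread_measures(correlated))
print(spread_measures(uncorrelated))
```

The trace is the same for both samples, while the determinant is (numerically) zero for the collinear sample and positive for the other, which is the behaviour I would expect from an "area covered" measure.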

If I do use $|\Sigma|^{1/k}$, what would be an appropriate statistical test to compare two samples (analogous to the F-test)?
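One possibility I can think of is a permutation test on a determinant-based statistic. The sketch below is only an illustration of that idea, not an established answer: it assumes that centring each sample on its own mean and then shuffling the pooled points is a reasonable null model, it assumes the covariance matrices are non-degenerate, and the function names and simulated samples are hypothetical. It compares the logs of the determinants, which is equivalent to comparing any fixed power of $|\Sigma|$ up to a constant factor.

```python
import math
import random


def log_det_cov(points):
    """Log-determinant of the 2x2 sample covariance matrix of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Assumption: the determinant is strictly positive (non-degenerate data).
    return math.log(var_x * var_y - cov_xy ** 2)


def center(points):
    """Subtract the sample mean so that a location shift does not affect the test."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(p[0] - mean_x, p[1] - mean_y) for p in points]


def permutation_test_log_det(sample_a, sample_b, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in log|Sigma|."""
    rng = random.Random(seed)
    a_centered, b_centered = center(sample_a), center(sample_b)
    observed = abs(log_det_cov(a_centered) - log_det_cov(b_centered))
    pooled = a_centered + b_centered
    n_a = len(a_centered)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of points to the two groups
        stat = abs(log_det_cov(pooled[:n_a]) - log_det_cov(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Simulated stand-ins for the two samples (not real measurements).
data_rng = random.Random(1)
narrow = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
wide = [(data_rng.gauss(3, 5), data_rng.gauss(-2, 5)) for _ in range(200)]

observed, p_value = permutation_test_log_det(narrow, wide, n_perm=199)
print(observed, p_value)
```

With ~50,000 points per sample a pure-Python loop like this would be slow, so a vectorised version would be preferable in practice. I am also unsure whether shuffling centred points is fully justified here, since it tests equality of the whole centred distribution rather than of the determinant alone.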

How do you intend to interpret or use your measure of "spread"? That ought to determine the answer. Anything else would just be abstract mathematics, which may be interesting but could be useless or misleading. — whuber, Jul 25 '19 at 12:36
@whuber the two variables denote two different "activities" of a protein which denote two different traits. There is some correlation between the two activities. The spread would tell me how diverse the traits are in a given population. Perhaps, I can add a picture to explain it properly. I'll do that. — WYSIWYG, Jul 25 '19 at 12:42
This context admits many possible solutions. For instance, after constructing a numerical measure of *difference* between the two traits within any individual, you could express the population diversity by means of any appropriate summary statistic of those differences, such as their standard deviation, variance, IQR, etc. That already gives you two very large families of choices (difference metric and summary statistic). Please, then, provide some information to select good options within those families. — whuber, Jul 25 '19 at 12:47
@whuber I edited. Perhaps it is clear now. In this case I just want to see how diverse the combination of traits are. — WYSIWYG, Jul 25 '19 at 13:02
"Diverse" has myriad meanings and a great many possibilities for quantitative expression. It would help for you to clarify what you mean by "diverse." — whuber, Jul 25 '19 at 15:34
@whuber I consider each point in the data as a possible combination of traits (activities). Therefore, the diversity, in this case would be the area of the activity space covered by each sample. — WYSIWYG, Jul 25 '19 at 16:23
