Covariance matrix and its dependence on the dataset size

Asked Apr 17 '21 at 15:53

Active Apr 17 '21 at 15:58

I am new to analysing covariance matrices and plotting the corresponding ellipses. Here is a summary of what confuses me:

Suppose I have a dataset with N rows.

I compute its covariance matrix and get some values.

I split the dataset into two halves of N/2 rows each.

I compute the covariance matrix of each half and then add the two matrices (for simplicity, assume the two halves are identical, and hence so are their covariance matrices):

Cov1+Cov2=Cov_T

I see that the resulting matrix is half the original matrix produced from the dataset of size N; the plot illustrates this.

Green corresponds to the covariance matrix computed from twice the data; red corresponds to the sum of two covariance matrices, each computed from a dataset half the size of the original.

So my question is: is this the expected behavior? Why does computing a single covariance matrix from the whole dataset give a different result from splitting the dataset in half and adding the corresponding matrices?
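
For anyone who wants to reproduce the comparison, the procedure described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it uses NumPy, a synthetic two-column Gaussian sample in place of the actual dataset (which is not shown here), and the population normalisation (`bias=True`). The names `half`, `data`, `cov_full`, `cov_sum` and `ratio` are illustrative. It compares the matrices element-wise and does not reproduce the plotted ellipses.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Hypothetical stand-in for the real data: N rows, 2 correlated columns.
N = 1000
assumed_cov = np.array([[2.0, 0.6],
                        [0.6, 1.0]])
half = rng.multivariate_normal(mean=[0.0, 0.0], cov=assumed_cov, size=N // 2)

# Two identical halves stacked into one dataset of N rows,
# matching the simplifying assumption in the question.
data = np.vstack([half, half])

# rowvar=False: each column is a variable, each row an observation.
# bias=True: divide by the number of rows (assumption of this sketch).
cov_full = np.cov(data, rowvar=False, bias=True)
cov1 = np.cov(half, rowvar=False, bias=True)
cov2 = np.cov(half, rowvar=False, bias=True)

# Cov1 + Cov2 = Cov_T, as in the question.
cov_sum = cov1 + cov2

# Element-wise ratio between the summed matrix and the full-data matrix.
ratio = cov_sum / cov_full
print(ratio)
```

Under these assumptions the covariance of each half equals the covariance of the stacked data, so the printed ratio is 2 in every entry. Whether that matches the plot depends on how the plotted quantities were computed.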

edited Apr 17 '21 at 15:58

Ayan Mitra

What exactly does the plot show? How do you calculate the values that are shown? I’ve never seen a covariance matrix plotted like this. What do the axes mean? – Tim Apr 17 '21 at 16:06

Adding the two matrices is not meaningful. See https://stats.stackexchange.com/questions/51622. – whuber Apr 17 '21 at 16:21
@whuber thanks it was helpful. – Ayan Mitra Apr 19 '21 at 20:04
