I have $K$ datasets, each with $N$ variables and $M$ samples (they are in fact EEG time series, but I discard time and treat them as $K$ iid multivariate samples), and I assume they all come from the same multivariate normal distribution.
I am interested in estimating the covariance matrix. Now it can be done in two ways:
- Concatenating the datasets together, and calculating the covariance matrix. Its sampling distribution would be Wishart, given the assumptions of multinormality of samples.
- Calculating the covariances separately for each dataset and averaging those matrices (with arithmetic mean) to form one total covariance.
The first method is straightforward and has well-established properties, but the second is in most cases much more feasible in my environment.
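The two estimators described above can be sketched as follows. This is only a minimal illustration, assuming NumPy and synthetic Gaussian data: the function names `pooled_covariance` and `averaged_covariance`, the sizes `K`, `N`, `M`, and the matrix `true_cov` are all made up for the example and do not come from the EEG data.

```python
import numpy as np

def pooled_covariance(datasets):
    """Method 1: concatenate all samples, then compute a single covariance."""
    stacked = np.concatenate(datasets, axis=0)   # shape (K*M, N)
    return np.cov(stacked, rowvar=False)         # unbiased, divides by K*M - 1

def averaged_covariance(datasets):
    """Method 2: one covariance per dataset, then the arithmetic mean."""
    per_dataset = [np.cov(d, rowvar=False) for d in datasets]  # each divides by M - 1
    return np.mean(per_dataset, axis=0)

# Synthetic stand-in for the K datasets (illustrative values only).
rng = np.random.default_rng(0)
K, N, M = 5, 3, 200
A = rng.standard_normal((N, N))
true_cov = A @ A.T + N * np.eye(N)               # an arbitrary positive-definite matrix
datasets = [rng.multivariate_normal(np.zeros(N), true_cov, size=M) for _ in range(K)]

C_pooled = pooled_covariance(datasets)
C_avg = averaged_covariance(datasets)

# Largest element-wise discrepancy between the two estimates.
max_abs_diff = np.max(np.abs(C_pooled - C_avg))
print(max_abs_diff)
```

One source of the discrepancy is the centring: the concatenated estimate subtracts the grand mean, whereas each per-dataset estimate subtracts its own mean, so the concatenated estimate also picks up the scatter of the dataset means around the grand mean.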
From the variance of the Wishart distribution (the S.E. of element $C_{i\,j}$ of the covariance matrix $C$ equals $\sqrt{\frac{C_{i\,j}^2 + C_{i\,i} C_{j\,j}}{M-1}}$) and the CLT (Central Limit Theorem), I can see that both the expected value and the standard error of the covariance estimate should agree for both methods.
Yet (obviously) the two methods do not produce numerically identical covariances.
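The standard-error claim can be probed with a small Monte Carlo sketch. Everything here is an assumption made for illustration: NumPy, the sizes `K`, `N`, `M`, the matrix `true_cov`, the number of replications `n_rep`, and the helper `wishart_se`, which simply evaluates the formula quoted above for a given number of degrees of freedom. The degrees of freedom used for the averaged estimator, $K(M-1)$, rely on the fact that a sum of $K$ independent Wishart matrices with $M-1$ degrees of freedom and a common scale matrix is Wishart with $K(M-1)$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, M = 4, 2, 50                       # illustrative sizes, not from real data
true_cov = np.array([[2.0, 0.6],
                     [0.6, 1.0]])        # illustrative true covariance
n_rep = 2000                             # number of simulated experiments

pooled = np.empty((n_rep, N, N))
averaged = np.empty((n_rep, N, N))
for r in range(n_rep):
    # x has shape (K, M, N): K datasets of M samples each
    x = rng.multivariate_normal(np.zeros(N), true_cov, size=(K, M))
    pooled[r] = np.cov(x.reshape(K * M, N), rowvar=False)
    averaged[r] = np.mean([np.cov(x[k], rowvar=False) for k in range(K)], axis=0)

def wishart_se(C, dof):
    """Element-wise S.E. sqrt((C_ij^2 + C_ii C_jj) / dof) of a sample covariance."""
    d = np.diag(C)
    return np.sqrt((C**2 + np.outer(d, d)) / dof)

# Empirical standard errors across replications
se_pooled_emp = pooled.std(axis=0, ddof=1)
se_avg_emp = averaged.std(axis=0, ddof=1)

# Theoretical values: K*M - 1 degrees of freedom when concatenating,
# K*(M - 1) when averaging K independent per-dataset estimates
se_pooled_theory = wishart_se(true_cov, K * M - 1)
se_avg_theory = wishart_se(true_cov, K * (M - 1))

print(se_pooled_emp / se_pooled_theory)
print(se_avg_emp / se_avg_theory)
```

Under these assumptions the two theoretical standard errors differ only by the factor $\sqrt{(KM-1)/(K(M-1))}$, which approaches 1 as $M$ grows.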
- Is it really true that both ways of estimating the covariance matrix have the same standard error?
- Can the Wishart distribution be approximated by a normal distribution as the sample-size parameter goes to infinity (just as we do for the chi-squared distribution)? If so, under what conditions is the approximation reasonably good?
- I bet that if the matrix elements of a Wishart distribution can be approximated by a normal distribution, then the validity of the second method depends on the validity of this approximation. Please correct me if I am wrong.
I need these answers to justify (if it is justifiable at all :-) ) the interchangeable use of both estimators in an article about the performance of some algorithms for joint diagonalization of $C$.