
In the general case of Gaussian Discriminant Analysis, we learn the Gaussian parameters for each class. I understand that if each class distribution has the same covariance matrix, then the learned separation boundary will be linear. But how is this restriction enforced when learning from the data? Each class will have its own data, from which we learn MLEs for the mean and covariance matrices. It seems to me that this will certainly not give us the same covariance matrix for each class, unless the data is perfectly cooperative.

How is this done in practice?

Fequish
As @curiosity has said in their answer, the weighted average covariance matrix is computed; it is called the pooled within-class covariance matrix. It is then used at the extraction and classification stages of LDA. At the classification stage, when assessing posterior probabilities, it is also possible to use separate covariance matrices for each class instead of the pooled one. LDA classification is explained in [this](http://stats.stackexchange.com/a/31384/3277) answer. – ttnphns Oct 05 '15 at 07:28

1 Answer


I think that usually, rather than calculating a separate covariance for each Gaussian, you calculate one pooled covariance that sums the within-class scatter over all classes, namely: $$ \hat{\Sigma} = \frac{1}{N-C}\sum_c \sum_{x_i \in c} (x_i-\hat{\mu}_c)(x_i-\hat{\mu}_c)^T$$ where there are $N$ samples, $C$ classes, the $x_i$ are the data vectors, and $\hat{\mu}_c$ is the sample mean of class $c$ (each class's scatter is taken around its own mean, not a global one). Of course, this means that you can calculate the scatter matrix for each class separately and then simply sum them, dividing by $N-C$ at the end, for the same result.
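As a concrete illustration, here is a minimal NumPy sketch of that estimator (the function name `pooled_covariance` and the `X`, `y` array layout are my own choices, not anything from the thread):

```python
import numpy as np

def pooled_covariance(X, y):
    """Pooled within-class covariance: each class's scatter is taken
    around that class's own mean, summed, and divided by N - C."""
    classes = np.unique(y)
    N, d = X.shape
    C = len(classes)
    scatter = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]                # samples belonging to class c
        diff = Xc - Xc.mean(axis=0)   # center on the class mean
        scatter += diff.T @ diff      # accumulate the class scatter
    return scatter / (N - C)
```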

I think this is done mostly because, in LDA, we are already assuming that the covariance matrices are identical, so pooling lets us put all of our data toward estimating that "common" covariance.
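As for how this looks in practice: in scikit-learn, for example, `LinearDiscriminantAnalysis` fits exactly this kind of pooled within-class covariance (up to normalization details), while `QuadraticDiscriminantAnalysis` drops the shared-covariance assumption and fits one covariance per class. A quick sketch on synthetic data (the means, covariances, and seed below are arbitrary choices of mine):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Two classes with deliberately different covariances
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X1 = rng.multivariate_normal([2, 2], [[2.0, -0.5], [-0.5, 0.5]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

print(lda.covariance_)     # one pooled covariance -> linear boundary
print(qda.covariance_[0])  # per-class covariances -> quadratic boundary
print(qda.covariance_[1])
```

So the restriction is not "enforced" on the data at all; it is built into the estimator you choose, and choosing per-class estimates instead simply gives you QDA with its quadratic boundary.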

curiosity_delivers