In the general case of Gaussian Discriminant Analysis, we learn the Gaussian parameters for each class. I understand that if every class distribution shares the same covariance matrix, then the learned decision boundary will be linear. But how is this restriction enforced when learning from the data? Each class has its own data, from which we learn MLEs of the mean and covariance matrix. It seems to me that this will almost never give us the same covariance matrix for each class, unless the data is perfectly cooperative.
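For concreteness, here is a minimal sketch of what I mean (toy data I made up, numpy only): even when the two classes are drawn from Gaussians with an identical true covariance, the per-class MLEs come out different.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes drawn from Gaussians that happen to share a covariance
# (hypothetical toy data, just to illustrate the point).
shared_cov = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
X0 = rng.multivariate_normal(mean=[0, 0], cov=shared_cov, size=50)
X1 = rng.multivariate_normal(mean=[2, 2], cov=shared_cov, size=50)

# Per-class MLEs: each class gets its own mean and covariance estimate.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S0 = np.cov(X0, rowvar=False, bias=True)  # MLE uses 1/n, hence bias=True
S1 = np.cov(X1, rowvar=False, bias=True)

print(S0)
print(S1)  # differs from S0, even though the true covariances are identical
```

The two printed sample covariances differ, so fitting each class separately clearly doesn't give the shared covariance that the linear-boundary result assumes.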
How is this done in practice?