But I'm still not sure if I understand what $\Sigma$ is. Does this mean to take the covariance of the entire data set with all classes mixed in, to take the covariance matrix for one class and assume it's the same for the others (even if it isn't), or to average the covariance matrices for the different classes?
I'm playing around with the Iris data set and getting a 98% classification rate with QDA and an 84% classification rate with LDA. I determined $\Sigma$ by calling cov(X) (Matlab) on the entire data set, but now I'm questioning whether I understood correctly.
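
For concreteness, the three candidates can be written out as below. The notation is an assumption introduced here for illustration: $x_i$ are the $n$ samples with labels $y_i$, $\bar{x}$ is the overall mean, $\bar{x}_k$ and $n_k$ are the mean and sample count of class $k$, and $K$ is the number of classes. The weighted ("pooled") form is only one way to read "average the covariance matrices"; a plain unweighted mean would be another.

$$
\begin{aligned}
\text{(1) all classes mixed:}\quad & \Sigma_{\text{total}} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top} \\
\text{(2) a single class } k\text{:}\quad & \Sigma_{k} = \frac{1}{n_k-1}\sum_{i:\,y_i=k}(x_i-\bar{x}_k)(x_i-\bar{x}_k)^{\top} \\
\text{(3) average over classes:}\quad & \Sigma_{\text{pooled}} = \frac{1}{n-K}\sum_{k=1}^{K}(n_k-1)\,\Sigma_{k}
\end{aligned}
$$

Calling cov(X) on the full data matrix computes (1). When every class has the same number of samples, as in Iris, the weights in (3) are all equal, so it reduces to the simple mean of the $\Sigma_k$.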