Can a quadratic discriminant function give a lower ECM than a linear discriminant function even when covariance matrices are not equal?

Question

We were trying to classify the counties in California as Trump-voting or Clinton-voting based on characteristics of those counties, and to compare the classification with the results of the recent election. We used a misclassification approach for this discriminant analysis. We clearly see that the within-class correlation coefficients of the two classes (Trump and Clinton) are different (see Fig 11 in the first attached screenshot). Bartlett's test for homogeneity suggests the same (Fig 12 in the first attached screenshot). Hence, we expected the quadratic discriminant function to have a better ECM than the linear discriminant function.

However, surprisingly, we ended up with a higher expected cost of misclassification (ECM) and also a higher number of misclassifications for the quadratic discriminant function than for the linear discriminant function (see Table 1 in the second attached screenshot). How could one possibly explain this? Shouldn't a quadratic discriminant function account for the given variables better than a linear discriminant function when the classes differ significantly? Or is it that the results are too exceptional to be predicted by discriminant analysis? How can I interpret this?

You should spell out ECM= expected cost of misclassification??? As an edit to the original post! — kjetil b halvorsen, Dec 02 '16 at 08:51
Number of misclassifications is NOT a proper scoring rule! Search this site for proper scoring rule. And, you should probably not be using discriminant analysis here. It would be better to try to predict the vote proportions directly, maybe with beta regression, or some other regression method adapted for continuous responses in $(0,1)$. — kjetil b halvorsen, Dec 02 '16 at 09:02
http://stats.stackexchange.com/questions/95247/logistic-regression-vs-lda-as-two-class-classifiers/95274#95274 — kjetil b halvorsen, Dec 02 '16 at 11:39

kjetil b halvorsen · Answer 1 · 2017-02-19T22:33:05.097

Well, as you have seen, it is perfectly possible! Count of misclassifications is not a proper score function ... search this site for "proper score function".
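To illustrate the point about scoring rules, here is a minimal sketch with made-up numbers (not your data). The two hypothetical models `model_a` and `model_b` misclassify exactly the same cases, so the misclassification count cannot tell them apart, while the Brier score and the log loss (both proper scoring rules) penalize the overconfident one.

```python
import math

def misclassification_count(probs, labels, threshold=0.5):
    # Counts cases where the thresholded prediction disagrees with the label.
    return sum((p > threshold) != bool(y) for p, y in zip(probs, labels))

def brier_score(probs, labels):
    # Mean squared difference between predicted probability and outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels):
    # Mean negative log-likelihood of the observed outcomes.
    return -sum(math.log(p if y == 1 else 1 - p)
                for p, y in zip(probs, labels)) / len(labels)

# Hypothetical outcomes and predicted probabilities (illustration only).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.90, 0.80, 0.40, 0.10, 0.20, 0.60]  # wrong twice, but only mildly
model_b = [0.99, 0.99, 0.01, 0.01, 0.01, 0.99]  # wrong twice, and very confident

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name,
          "errors:", misclassification_count(probs, labels),
          "Brier:", round(brier_score(probs, labels), 4),
          "log loss:", round(log_loss(probs, labels), 4))
```

Both models get the same error count, but `model_b` scores much worse on the Brier score and the log loss because it was confidently wrong.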

You say "We clearly see that the within class correlation coefficients of the 2 classes (Trump and Clinton) are not different (see Fig 11 in 1st screenshot attached). Barlett's test for homogenity suggests the same ... " but your output doesn't agree: the p-value is given as <.0001! So you should go back and read the output more carefully, and maybe review the theory.

But as for your data analysis question, given as "we were trying to classify the counties in California under Trump voted and Clinton voted based on characteristics of those counties and tried to match with the results obtained in the recent elections.", I doubt that discriminant analysis is a good answer to that question. You would be better off trying to build a predictive model for the vote shares, maybe with a logistic regression (using some overdispersion correction, as there certainly will be overdispersion). If you post a link to the data, maybe we could have a look ...

In fact, in most cases where some sort of discriminant analysis is used, some other approach would be better. One reason (in this case) is that with discriminant analysis you are only focusing on whether the vote share is above or below 50%; trying to predict the vote share will give more information and use the information in the data better. This is because when only looking at above/below 50% (discriminant analysis), there is no difference between 51% and 60%. But making an error when the truth is 51% should be less bad than making an error when the truth is 60%: when the truth is 51%, the result is basically a toss-up. By trying to predict the vote share (logistic regression), we focus on the prediction error ($p-\hat{p}$) and not only on whether $p > 0.5$ or $p < 0.5$. That uses the data more efficiently.
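A small numerical sketch of the 51% versus 60% point, using two hypothetical counties with made-up shares (`county_A`, `county_B` and all numbers are assumptions for illustration only): a predicted share of 49% counts as one misclassification in both cases, while the share error $p-\hat{p}$ shows that one miss is much smaller than the other.

```python
# Hypothetical counties (made-up numbers, not the poster's data).
true_share = {"county_A": 0.51, "county_B": 0.60}  # true vote share p
pred_share = {"county_A": 0.49, "county_B": 0.49}  # predicted share p_hat

def class_is_wrong(p, p_hat, threshold=0.5):
    # Above/below-50% view: only the side of the threshold matters.
    return (p > threshold) != (p_hat > threshold)

def share_error(p, p_hat):
    # Vote-share view: the size of the miss matters.
    return p - p_hat

for county in true_share:
    p, p_hat = true_share[county], pred_share[county]
    print(county,
          "misclassified:", class_is_wrong(p, p_hat),
          "share error:", round(share_error(p, p_hat), 2))
```

Both counties are misclassified, but the share error is about 0.02 for the near toss-up county and about 0.11 for the other one.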

In the meantime, you could search this site for posts about logistic regression; there are many good ones ...

We see that the within class correlation coefficients of the 2 classes (Trump and Clinton) are 'different', my bad. — Bharat Ram Ammu, Dec 09 '16 at 15:03
Thanks about the suggestion, but can you please re explain one reason why when some sort of discriminant analysis is used, some other approach would be better? Sorry I couldn't get why 'trying to predict the vote share will give more information, and use the information in the data better'? — Bharat Ram Ammu, Dec 09 '16 at 15:06
I now understand why logistic regression (which was not in our course of Multivariate Statistics) can serve better here than discriminant analysis. Thank you for your input. — Bharat Ram Ammu, Feb 19 '17 at 22:28
