Multivariate normality in Discriminant Analysis when using dummy variables

Question

I've studied statistics now for almost two years and I'm starting to believe I have missed something very fundamental.

I'm doing discriminant analysis where, as I understand it, I can use dummy variables as independent variables. However, one assumption seems to be that the data are drawn from a multivariate normal distribution. Further, the marginal distributions of a multivariate normal distribution are normal (but not necessarily the other way around).

Now, if I use a dichotomous variable (yes/no, etc.) as an independent variable, how can the multivariate distribution be normal, since a dichotomous variable certainly is not?

What do you mean by using a dummy variable "as discriminant"? — amoeba, Feb 18 '15 at 21:47
The dichotomous variable in your example is not assumed normal. The variables used to estimate the discriminant vector (a set of independent covariates) are assumed to be multivariate normal. If you are using a dummy to calculate the discriminant vector itself, then multivariate normality is violated and you can use another model (such as a multinomial logit) instead. — Zachary Blumenfeld, Feb 18 '15 at 22:25
zach: I'm starting to think I have bigger problems than I initially thought... So if I have a function F(x,y)=w1*x+w2*y where x is normally distributed and y is a dummy variable, can F(x,y) be multivariate normal even though y is not? — Skrilovach, Feb 18 '15 at 23:25
You can still use LDA if the data are not normal, but it is not guaranteed to be optimal. See here: [Linear Discriminant Analysis and non-normal distributed data](http://stats.stackexchange.com/questions/110908). — amoeba, Feb 18 '15 at 23:39
F(x,y) would be a normal mixture, not normal, but it may be good enough. Personally I prefer multinomial logit/probit models, which accomplish the same thing as LDA but don't assume multivariate normal covariates (though they come with their own assumptions). — Zachary Blumenfeld, Feb 19 '15 at 03:37
@amoeba is correct in saying LDA will not be optimal, but it could also be really close and more efficient than probit/logit depending on your data. You should use your best judgement. Additional resources: http://mrvar.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf http://stats.stackexchange.com/questions/95247/logistic-regression-vs-lda-as-two-class-classifiers — Zachary Blumenfeld, Feb 19 '15 at 03:37
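
The "normal mixture" point from the comments can be checked with a small simulation. This is only a sketch under assumed values: the weights `w1` and `w2`, the dummy probability `p`, the sample size, and the helper `excess_kurtosis` are all hypothetical choices made for illustration, not anything taken from the question. It draws a normal x and a 0/1 dummy y, forms F(x,y)=w1*x+w2*y, and compares excess kurtosis (about 0 for a normal distribution).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, chosen only for illustration.
n = 200_000
w1, w2 = 1.0, 4.0   # assumed weights in F(x, y) = w1*x + w2*y
p = 0.5             # assumed probability that the dummy equals 1

x = rng.normal(0.0, 1.0, size=n)   # normally distributed covariate
y = rng.binomial(1, p, size=n)     # dichotomous (dummy) covariate
f = w1 * x + w2 * y                # the linear combination from the comments


def excess_kurtosis(v):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)


kurt_x = excess_kurtosis(x)                    # x alone: near 0
kurt_f = excess_kurtosis(f)                    # mixture of two normals: clearly below 0
kurt_f_given_y1 = excess_kurtosis(f[y == 1])   # within one dummy level: near 0 again

print(f"x:             {kurt_x:+.3f}")
print(f"F(x, y):       {kurt_f:+.3f}")
print(f"F(x, y) | y=1: {kurt_f_given_y1:+.3f}")
```

With these assumed values, F(x,y) is a two-component mixture whose components sit `w2` apart, so its marginal distribution is bimodal and visibly non-normal, while F restricted to a single level of the dummy is normal again. A smaller `w2` relative to the spread of `w1*x` would make the departure from normality much milder, which is consistent with the "it may be good enough" remark above.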

0 Answers