I've studied statistics now for almost two years and I'm starting to believe I have missed something very fundamental.

I'm doing discriminant analysis where, as I understand it, I can use dummy variables as independent variables. However, one assumption seems to be that the data are drawn from a multivariate normal distribution. Further, the marginal distributions of a multivariate normal distribution are normal (though normal marginals do not by themselves imply joint normality).

Now, if I use a dichotomous variable (yes/no, etc.) as an independent variable, how can the multivariate distribution be normal, since a dichotomous variable certainly is not?

  • What do you mean by using a dummy variable "as discriminant"? – amoeba Feb 18 '15 at 21:47
  • The dichotomous variable in your example is not assumed normal. The variables used to estimate the discriminant vector (a set of independent covariates) are assumed to be multivariate normal. If you are using a dummy to calculate the discriminant vector itself, then multivariate normality is violated and you can use another model (such as a multinomial logit) instead. – Zachary Blumenfeld Feb 18 '15 at 22:25
  • amoeba: I meant independent variables. Fixed my question. – Skrilovach Feb 18 '15 at 23:20
  • zach: I'm starting to think I have bigger problems than I initially thought... So if I have a function F(x,y)=w1*x+w2*y where x is normally distributed and y is a dummy variable, can F(x,y) be multivariate normal even though y is not? – Skrilovach Feb 18 '15 at 23:25
  • You can still use LDA if the data are not normal, but it is not guaranteed to be optimal. See here: [Linear Discriminant Analysis and non-normal distributed data](http://stats.stackexchange.com/questions/110908). – amoeba Feb 18 '15 at 23:39
  • F(x,y) would be a normal mixture, not normal, but it may be good enough. Personally I prefer multinomial logit/probit models, which accomplish the same thing as LDA but don't assume multivariate normal covariates (though they come with their own assumptions). – Zachary Blumenfeld Feb 19 '15 at 03:37
  • @amoeba is correct in saying LDA will not be optimal, but it could also be really close to optimal, and more efficient than probit/logit, depending on your data. You should use your best judgement. Additional resources: http://mrvar.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf and http://stats.stackexchange.com/questions/95247/logistic-regression-vs-lda-as-two-class-classifiers – Zachary Blumenfeld Feb 19 '15 at 03:37
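
To make the "normal mixture, not normal" point from the comments concrete, here is a minimal simulation sketch using only the Python standard library. The weights `w1 = 1` and `w2 = 4`, the Bernoulli probability `p = 0.5`, the sample size, and the helper names `simulate_f` and `kurtosis` are all illustrative assumptions, not anything taken from the question. It draws F = w1*x + w2*y with x standard normal and y a 0/1 dummy, then compares the kurtosis of F overall with the kurtosis within each level of y (a normal distribution has kurtosis 3).

```python
import random
import statistics


def simulate_f(n=20000, w1=1.0, w2=4.0, p=0.5, seed=0):
    """Draw n values of F = w1*x + w2*y, with x ~ N(0, 1) and y ~ Bernoulli(p).

    All default parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    values, dummies = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)           # continuous, normally distributed covariate
        y = 1 if rng.random() < p else 0  # dichotomous (dummy) covariate
        values.append(w1 * x + w2 * y)
        dummies.append(y)
    return values, dummies


def kurtosis(sample):
    """Fourth standardized moment of the sample; equals 3 for a normal distribution."""
    m = statistics.fmean(sample)
    var = statistics.pvariance(sample, mu=m)
    fourth = statistics.fmean([(v - m) ** 4 for v in sample])
    return fourth / var ** 2


f, y = simulate_f()
f_given_0 = [v for v, d in zip(f, y) if d == 0]
f_given_1 = [v for v, d in zip(f, y) if d == 1]

print("kurtosis of F overall:", round(kurtosis(f), 2))
print("kurtosis of F given y=0:", round(kurtosis(f_given_0), 2))
print("kurtosis of F given y=1:", round(kurtosis(f_given_1), 2))
```

Under these assumed parameters, F within each level of y should look normal (kurtosis near 3), while F overall is a bimodal two-component mixture with kurtosis well below 3. With a smaller `w2` the two components overlap more and the mixture looks closer to normal, which fits the "it may be good enough" remark above.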
