
I'm studying neuroscience and have a question about presenting some data. I would like to know whether tertile splits are acceptable, and whether it matters if the variable being split is normally distributed.

I'm using SPSS to run a 2x2 ANOVA, with group as one factor, and want to know whether I can use the "high" and "low" tertiles of a second variable as the levels of the other factor if that variable is not normally distributed. I did this and the results are as hypothesized, but I don't know whether I am justified (statistically/mathematically speaking) in using the splits like that.

I am not a statistician, so simple explanations are preferred.

jonsca
jpf66
  • In regression / ANOVA what matters is *only* whether your residuals are normally distributed, not your IVs. The distribution of your covariates is irrelevant. Note that dichotomizing your variables does not make them more normal. You may find this thread helpful: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262/). – gung - Reinstate Monica Oct 24 '12 at 02:09
  • This is very helpful. So all that matters is whether my DV in the ANOVA or regression passes Levene's test for equality of variances? – jpf66 Oct 24 '12 at 02:14
  • @user16204 Equality of variance is not the only assumption of relevance when doing regression, and explicit hypothesis tests of regression assumptions are not necessarily the best way to see whether the assumptions are reasonable. – Glen_b Oct 24 '12 at 09:28

1 Answer


(Perhaps I'll convert these to an official answer, if this is all you need.)

In regression / ANOVA what matters is only whether your residuals are normally distributed, not your IVs. The distribution of your covariates is irrelevant. Note that dichotomizing your variables does not make them more normal. You may find this thread helpful: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262/).
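To make the point about residuals concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (the exponential predictor, the coefficients, the helper names `skewness` and `fit_line`); it is not taken from your data or from SPSS. It fits a straight line to a strongly skewed predictor and then compares the skewness of the predictor with the skewness of the residuals.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: near 0 for symmetric data, clearly positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical predictor: exponential, so far from normal.
x = [rng.expovariate(1.0) for _ in range(n)]
# Hypothetical outcome: linear in x plus normally distributed error.
y = [2.0 + 1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_x = skewness(x)
skew_resid = skewness(residuals)
print(f"skewness of predictor: {skew_x:.2f}")
print(f"skewness of residuals: {skew_resid:.2f}")
```

In a run like this the predictor should come out heavily right-skewed while the residuals stay close to symmetric, which is the situation the model assumptions are about.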

The main assumptions you need to check are these: You want your variances to be at least roughly equal. With enough data you are almost certain to find that the group variances are not identical, but it doesn't much matter until the ratio of the highest group variance to the lowest reaches about four. You also want your residuals to be approximately normal, but if you have enough data, they can diverge a good bit and the central limit theorem will still cover you. The most important assumption is that your data are independent. There is a great answer covering regression / ANOVA assumptions here: what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression.
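As a rough sketch of how the variance check above could be done by hand, the snippet below computes the ratio of the largest to the smallest cell variance and the residuals (each score minus its cell mean) for a 2x2 layout. The cell labels, the scores, and the function names `variance_ratio` and `cell_residuals` are all made up for illustration, and the cutoff of four is only the rule of thumb mentioned above, not a formal test.

```python
import statistics


def variance_ratio(cells):
    """Largest cell variance divided by the smallest cell variance."""
    variances = [statistics.variance(scores) for scores in cells.values()]
    return max(variances) / min(variances)


def cell_residuals(cells):
    """Residuals for a full factorial model: each score minus its own cell mean."""
    residuals = []
    for scores in cells.values():
        cell_mean = statistics.fmean(scores)
        residuals.extend(score - cell_mean for score in scores)
    return residuals


# Hypothetical 2x2 data: (group, tertile) -> scores on the DV.
cells = {
    ("control", "low"): [1.0, 2.0, 3.0, 4.0, 5.0],
    ("control", "high"): [2.0, 3.0, 4.0, 5.0, 6.0],
    ("patient", "low"): [3.0, 4.0, 5.0, 6.0, 7.0],
    ("patient", "high"): [2.0, 4.0, 6.0, 8.0, 10.0],
}

ratio = variance_ratio(cells)
residuals = cell_residuals(cells)

print(f"max/min variance ratio: {ratio:.1f}")
if ratio >= 4:  # rule-of-thumb threshold, not a hard limit
    print("Variances differ enough that it may be worth a closer look.")
```

The `residuals` list is what you would then plot (for example as a histogram or a normal Q-Q plot) to judge approximate normality, rather than plotting the raw DV or the variable you split.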

On a slightly different note, I'm against dichotomizing variables in general for several reasons. I discuss these here: how-to-choose-between-anova-and-ancova-in-a-designed-experiment.

gung - Reinstate Monica