When am I allowed to make linear regression on a small sample?

Question

I'm trying to evaluate the linear correlation between two continuous variables (a percentage value calculated from EEG data and the area of an anatomical region of the brain). I have a sample of 18 right now. From what I understand, I need to make sure that the x and y variables vary together in a joint distribution that is normal, or at least that each of these variables is normally distributed on its own.

My question is what do I have to do to make sure that the assumptions needed to draw valid conclusions about the correlation coefficient are met?

I thought about evaluating the normality of each variable with the Shapiro–Wilk test and then simply calculating the Pearson r. Would that be valid? Would it be better to draw a normal probability plot?

Thanks!

Have you tried simply checking the correlation between the variables? Not meeting assumptions usually reduces power, but if you have statistically significant results, that might be a good start. — Behacad, Sep 13 '13 at 17:48
You may want to read these threads: [pearsons-or-spearmans-correlation-with-non-normal-data](http://stats.stackexchange.com/questions/3730/), & [is-normality-testing-essentially-useless](http://stats.stackexchange.com/questions/2492/). — gung - Reinstate Monica, Sep 13 '13 at 18:15
Could you show us your data? With so few points it is very hard to test for normality. To illustrate this, generate 18 numbers from a normal distribution: do the data look normally distributed? Probably not. — pontikos, Sep 13 '13 at 20:12
Welcome to the site, @pontikos. This is not an answer to the OP's question, it is a comment. Please only use the "Your Answer" field to provide answers. I recognize it's frustrating, but you will be able to comment anywhere when your reputation >50. Since you are new here, you may want to read our [about page](http://stats.stackexchange.com/about), which contains information for new users. — gung - Reinstate Monica, Sep 13 '13 at 20:14
Boris, it is not necessary for *either* variable to have a normal distribution and therefore a test of normality would be of little use. What matters most is that on a scatterplot the data appear to trend along a line and there are no extraordinary ("outlying") excursions from that line. A secondary issue is that the scatter of the data around the line should be approximately cigar-shaped or (American) football-shaped. All these can be checked simply by looking at the scatterplot. — whuber, Sep 13 '13 at 20:20
Thanks for the answers! I will take a look at the Doornik-Hansen test, the scatterplot and the normality plot. But will mostly consider the scatterplot. @pontikos I will come back with the data as soon as I have them! — Boris, Sep 13 '13 at 21:46

January · Accepted Answer · 2013-09-14T09:43:01.523

I think the common practice in such a case is to inspect the data visually. Check whether you have outrageous outliers and make a Q-Q plot (normal probability plot). Here is an example from Wikipedia:

[Figure: example of a normal Q-Q plot, from Wikipedia]

If your variable is normally distributed, the sample points will fall almost on a straight line. In any case, with only 18 observations, unless your variables are really skewed or log-normal, you will most likely not be able to see a clear picture.
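To make the Q-Q plot idea concrete, here is a minimal sketch that computes the points such a plot would show, using only the Python standard library. The data are simulated, the seed is arbitrary, and the plotting positions `(i - 0.5) / n` are just one common convention, so treat this as an illustration and not as the one correct recipe.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Plotting positions (i - 0.5) / n are an assumption; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


rng = random.Random(1)  # arbitrary seed, for reproducibility only
sample = [rng.gauss(100, 15) for _ in range(18)]  # simulated data, n = 18
points = normal_qq_points(sample)

m, s = mean(sample), stdev(sample)
print("quantile  observed  on-line")
for q, x in points:
    # If the data are normal, x should lie close to the line m + s * q.
    print(f"{q:8.2f}  {x:8.2f}  {m + s * q:7.2f}")
```

Plotting the first column against the second gives the Q-Q plot. Rerunning with different seeds shows how wobbly the line can look at n = 18 even when the data really are normal.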

Some people discourage the use of normality tests. You see, with enough samples, you will almost always be able to reject the null hypothesis, because even tiny deviations will show. With just a few samples, even huge deviations will not be statistically significant.
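A small simulation can illustrate how strongly the outcome of a normality test depends on sample size. The sketch below is not the Shapiro–Wilk test from the question: it uses the Jarque–Bera test instead, only because that test can be written in a few lines of standard-library Python. The mildly skewed gamma distribution, the seed, and the sample sizes are all assumptions chosen for illustration, and the chi-squared approximation behind the p-value is known to be rough for small n.

```python
import math
import random


def jarque_bera_pvalue(xs):
    """Approximate p-value of the Jarque-Bera normality test."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-squared variable with 2 df is exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(n, reps, rng, shape=20.0, alpha=0.05):
    """Share of simulated gamma samples of size n that the test rejects."""
    rejections = 0
    for _ in range(reps):
        # Gamma with shape 20 is only mildly skewed (skewness about 0.45).
        xs = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        if jarque_bera_pvalue(xs) < alpha:
            rejections += 1
    return rejections / reps


rng = random.Random(2013)  # arbitrary seed
rate_small = rejection_rate(18, 100, rng)
rate_large = rejection_rate(2000, 100, rng)
print(f"n = 18:   rejected in {rate_small:.0%} of samples")
print(f"n = 2000: rejected in {rate_large:.0%} of samples")
```

The same non-normal distribution is rarely flagged at n = 18 and almost always flagged at n = 2000, so the test result says more about the sample size than about whether the deviation matters.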

As for using correlation instead of regression: I think this is a case where the dependent and independent variables are very clearly defined, so regression is the correct approach.
