
I'm doing a regression analysis that involves 4 independent variables (IVs). I ran a Shapiro-Wilk test on each of the IVs, and the test gave a p-value less than 0.05 (which means the data are not normally distributed).

So my question is: how do I make my data follow a normal distribution? I ask because all parametric tests are based on the assumption that the data are normally distributed.

My data set consists of 26 records (n = 26). The dependent variable (DV) is the House Price Index, and the IVs are Gross Domestic Product, Population, Lending Rates and Gross National Income.

So I'm trying to build a regression model from these variables, but I'm running into this normality problem in the distributions of both the IVs and the DV.
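For reference, a linear model like the one described can be fitted by ordinary least squares without any assumption about how the IVs themselves are distributed. Where normality enters at all (exact small-sample t and F inference), it is an assumption about the error term, so it is checked on the residuals. The sketch below uses only the Python standard library; the data are invented placeholders, not the actual HPI, GDP, population, lending-rate and GNI series, and the helper names `solve`, `ols` and `residuals` are hypothetical.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations X'X b = X'y."""
    rows = [[1.0, *xi] for xi in X]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(X, y, beta):
    """Observed minus fitted values; any normality check belongs here, not on the IVs."""
    return [yi - beta[0] - sum(b * x for b, x in zip(beta[1:], xi)) for xi, yi in zip(X, y)]


# Invented design: n = 26 rows, four predictors.
# x4 = exp(-t/4) is strongly right-skewed, i.e. clearly non-normal.
X = [[float(t), float(t % 7), math.sin(t), math.exp(-t / 4)] for t in range(26)]
true_beta = [2.0, 0.5, -1.0, 3.0, 10.0]
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)) for row in X]

beta = ols(X, y)  # recovers true_beta up to rounding error despite the skewed x4
print([round(b, 6) for b in beta])
```

Solving the normal equations is fine for a toy example like this; for real work a library routine such as `numpy.linalg.lstsq` or statsmodels' `OLS` is numerically safer and also reports standard errors and residual diagnostics.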

Thank you.

Raam
  • It is *not* true that your data always have to be exactly normally distributed for parametric tests; see: http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless. If you really need to transform your data, Box-Cox is the simple and commonly used approach: http://en.wikipedia.org/wiki/Power_transform#Box.E2.80.93Cox_transformation But to give a precise answer, you need to tell us what your data are and what you want to do with them. – Tim Apr 28 '15 at 08:06
  • @Tim Thank you Tim. I have added more details to my question. – Raam Apr 28 '15 at 10:06
  • With such a small sample the *sample size* should be a greater concern to you than normality. – Tim Apr 28 '15 at 10:12
  • @Tim Agreed, Tim. I would like to increase the number of records, but the available (published) data are limited to this. – Raam Apr 28 '15 at 10:18
  • Are these cross-sectional data for several countries or time series for one country (or ...)? If the first, then it is likely, at a minimum, that you need to transform most predictors, but not because they are non-normal, rather because you would be unlikely to get anything but absurd fits otherwise. The choice of predictors seems fairly strange too, but that is a different problem. – Nick Cox Apr 28 '15 at 11:23
  • @NickCox This is for one country. It's to study the relation between macroeconomic indicators and housing prices. – Raam Apr 28 '15 at 12:09
  • Not my field, but it sounds more like a time series problem then. – Nick Cox Apr 28 '15 at 12:31
  • Please explain what *form* of "regression analysis" you are using that requires the IVs to have (approximately) normal distributions. – whuber Apr 28 '15 at 15:27
  • @whuber I'm using a linear regression model (at the moment). I may try out non-linear ones as well. – Raam Apr 29 '15 at 01:51
  • That makes no suppositions whatsoever about the distributions of the IVs. That makes your problem disappear. For more about this issue, please see the results of a [site search](http://stats.stackexchange.com/search?tab=votes&q=regression%20transformation%20distribution%20independent). – whuber Apr 29 '15 at 13:41
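If a transformation is still wanted for some other reason (for example, strongly skewed predictors leading to absurd fits), the Box-Cox family that Tim links is easy to sketch. The code below is a minimal standard-library illustration, assuming strictly positive data and choosing λ by a grid search over the profile log-likelihood; the function names `boxcox`, `boxcox_llf` and `best_lambda` and the grid bounds are illustrative assumptions. SciPy's `scipy.stats.boxcox` performs the same maximum-likelihood choice of λ with a numerical optimizer instead of a grid.

```python
import math
from statistics import fmean
from typing import List, Sequence


def boxcox(data: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if any(v <= 0 for v in data):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in data]
    return [(v ** lam - 1.0) / lam for v in data]


def boxcox_llf(data: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lam, assuming the transformed values are i.i.d. normal."""
    n = len(data)
    y = boxcox(data, lam)
    mean = fmean(y)
    var = sum((v - mean) ** 2 for v in y) / n  # maximum-likelihood variance (divide by n)
    # Jacobian term (lam - 1) * sum(log y) plus the normal log-likelihood up to a constant.
    return (lam - 1.0) * sum(math.log(v) for v in data) - 0.5 * n * math.log(var)


def best_lambda(data: Sequence[float], lo: float = -2.0, hi: float = 2.0, steps: int = 400) -> float:
    """Pick lam on an evenly spaced grid [lo, hi] that maximizes the profile log-likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: boxcox_llf(data, lam))


# Invented, right-skewed toy data: exp() of values spaced evenly and symmetrically around 0.
z = [(i - 12.5) / 6.25 for i in range(26)]
skewed = [math.exp(v) for v in z]
lam = best_lambda(skewed)
print(lam)  # for log-symmetric data like this, lambda should land at or near 0 (a log transform)
```

Note that this only makes a variable's *marginal* distribution look more normal; as whuber says, linear regression places no requirement on the distribution of the IVs, so passing a normality test is not by itself a reason to transform.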
