I am checking the assumptions (multivariate normality) of multiple linear regression in R. I have 100 independent variables and want to check their normality.

Do I need to check all 100 variables one by one? Or should I just assume normality and proceed with the regression? Or is there a way to check the normality of all 100 variables in a single piece of code in R?

Also, is there any way to check the linearity of all 100 IVs with the DV?

  • Careful, you do not need multivariate normality for multiple regression. I think you just need normality in each independent variable. If this is the case, I would check out the Shapiro-Wilk normality test and Box-Cox transformations to help. – TBSRounder Dec 11 '15 at 21:31
  • The predictors do not need to be normally distributed. The assumption is on the normality of the errors (for testing of coefficients), which you can estimate by examining the residuals [`qqnorm(resid(mod)); qqline(resid(mod))`, where `mod` holds the result of your `lm` call]. – user20650 Dec 11 '15 at 22:00
  • @MarkHeiler, you don't need normality of *any* of the independent variables. @user20650 is correct. If one *really* wants to do null-hypothesis testing on the residuals, you can use any of the standard methods (e.g. `?shapiro.test`), although most statisticians don't recommend this. – Ben Bolker Dec 11 '15 at 22:41
  • @RamprakashV: The reason we are voting to close/migrate to CrossValidated.com is that you have incorrect notions about statistical methods that are unrelated to coding. We do not choose to offer code to create statistical errors. MarkHeiler appears as confused on this matter as you appear. (And it won't matter if you quote some business text that propagates this nonsense. It is not that difficult to find textbooks that are written with this strategy. We still will not believe them.) – DWin Dec 12 '15 at 04:34
  • Although this has now been migrated to CV, the question about R code is conversely off-topic here and in any case the comments made on SO already helpfully and directly explain that the statistical premiss here is flawed. – Nick Cox Jan 05 '16 at 10:49
  • For a discussion of the assumptions of multiple regression, please see http://stats.stackexchange.com/questions/16381. [Searching our site](http://stats.stackexchange.com/search?tab=votes&q=normal%20independent%20variable%20regression%20assumption%20-logistic) will also turn up many related threads. – whuber Jan 05 '16 at 15:41
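
The residual check suggested in the comments can be sketched as follows. This is only an illustration: the data are simulated stand-ins (the names `dat`, `X`, `n` and `p` are assumptions, not from the question), and the predictors are deliberately drawn from a uniform distribution to show that their non-normality does not matter for the diagnostic.

```r
# Simulated stand-in data (assumption): 100 non-normal (uniform) predictors
set.seed(1)
n <- 500
p <- 100
X <- matrix(runif(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))

# Response built from the predictors plus normally distributed errors
dat <- data.frame(y = as.vector(X %*% rep(0.5, p)) + rnorm(n), X)

# Fit the regression on all 100 predictors at once
mod <- lm(y ~ ., data = dat)

# The normality assumption is about the errors, so look at the residuals
res <- resid(mod)
qqnorm(res)
qqline(res)
```

If the points in the Q-Q plot follow the reference line reasonably well, the normality of the errors is plausible. One plot covers the whole model, so nothing has to be repeated for each of the 100 predictors.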

1 Answer

Check out the package MVN. The description says:

Performs multivariate normality tests and graphical approaches and implements multivariate outlier detection and univariate normality of marginal distributions through plots and tests.

toneloy
  • This is well-meant, but not useful, since the original question is based on a statistical misconception ... – Ben Bolker Dec 11 '15 at 22:41
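
On the question's second part (linearity of the IVs with the DV), one common approach is again to work with the fitted model's residuals rather than with the raw variables. The sketch below is an assumption-laden illustration, not something proposed in the thread: the data are simulated, with two hypothetical predictors `x1` and `x2`, where `x1` is made to enter non-linearly on purpose. The same loop would run unchanged over 100 columns.

```r
# Simulated stand-in data (assumption): x1 has a quadratic effect, x2 a linear one
set.seed(2)
n <- 300
dat <- data.frame(x1 = runif(n, -2, 2), x2 = rnorm(n))
dat$y <- 1 + dat$x1^2 + 0.5 * dat$x2 + rnorm(n, sd = 0.3)

# Fit a model that (wrongly) treats every predictor as linear
mod <- lm(y ~ ., data = dat)

# Overall check: residuals vs fitted values; curvature hints at non-linearity
plot(mod, which = 1)

# Per-predictor check: residuals against each predictor with a loess smooth
preds <- setdiff(names(dat), "y")
for (v in preds) {
  scatter.smooth(dat[[v]], resid(mod), xlab = v, ylab = "residual")
  abline(h = 0, lty = 2)
}
```

A smooth that stays close to the dashed zero line suggests the linear term is adequate for that predictor; a clear bend, as expected here for `x1`, suggests a transformation or an extra term is worth considering.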