
In my OLS regression not all assumptions are perfectly met, but I read that due to a large sample size there is a certain robustness to assumptions (my sample is 2500 people).

E.g. the DV isn't perfectly normally distributed: its skewness and kurtosis are small but statistically significant when tested (e.g. using the 'gvlma' package in R).

I am now looking for literature to cite when I talk about my regression, e.g. "Inspection of fitted values vs mean plots revealed a minimal deviation from homoscedasticity, but one within the range of robustness of regression (xxx, 1990)."

Peter Flom
Torvon
  • One thing to note here, is that it doesn't matter if the DV is normally distributed, only if the residuals are (see here: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262/)), & even then, with a large sample size the central limit theorem will cover you. You are likely to be fine. – gung - Reinstate Monica Oct 19 '12 at 21:28
  • Go to Google Scholar and search with the terms: robustness of regression assumptions. Many of the references are in peer reviewed journals with full text available in JSTOR. – R. Schumacher Oct 20 '12 at 00:13
  • Gung, I absolutely believe you, but there are a lot of websites out there listing normal distribution of Y as assumption in linear regression. I hope the reviewers will know that this is not the case. Do you have a citation that I can use in combination with the "central limit theorem"? – Torvon Oct 20 '12 at 01:23
  • Torvon, any competent textbook will be clear about this. For example, Draper & Smith (*Applied Regression Analysis*, 2nd Ed.) develop the regression equations at the beginning of section 2.6, then discuss what can be done in a subsection "Without Distributional Assumptions," and only then discuss what can further be done (mainly with the F tests) in a subsection "With Distributional Assumptions." Ultimately, "robustness" is going to be relative to the conclusions you are trying to draw: some of them will be largely insensitive to homoscedasticity but others might be more sensitive. – whuber Oct 20 '12 at 18:15
  • Maybe check Coombs, William T., James Algina, and Debra Olson Oltman. "Univariate and multivariate omnibus hypothesis tests selected to control type I error rates when population variances are not necessarily equal." Review of Educational Research 66.2 (1996): 137-179 ; Tomarken, Andrew J., and Ronald C. Serlin. "Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures." Psychological Bulletin 99.1 (1986): 90. – Ben Bolker Jul 26 '18 at 23:17
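
As a rough illustration of the point made in the comments (the normality question concerns the residuals, not the DV), here is a minimal Python sketch using only the standard library. Everything in it is an assumption chosen for illustration: the skewed predictor, the coefficients 1 and 2, the seed, and the helper names `skewness` and `ols_simple`. None of it comes from the asker's data.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean of cubed z-scores (using the population SD)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def ols_simple(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)  # arbitrary seed, for reproducibility only
n = 2500                # same sample size as in the question

# Hypothetical data: a right-skewed predictor and normal errors.
x = [rng.expovariate(1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = ols_simple(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

dv_skew = skewness(y)
resid_skew = skewness(residuals)
print(f"skewness of DV:        {dv_skew:.2f}")
print(f"skewness of residuals: {resid_skew:.2f}")
```

In this setup the DV inherits the skew of the predictor while the residuals stay close to symmetric, so a normality test on the DV alone says little about whether the model's error assumption is reasonable.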

1 Answer


Answered in comments:

One thing to note here, is that it doesn't matter if the DV is normally distributed, only if the residuals are (see here: What if residuals are normally distributed, but y is not?), & even then, with a large sample size the central limit theorem will cover you. You are likely to be fine. – gung

Go to Google Scholar and search with the terms: robustness of regression assumptions. Many of the references are in peer reviewed journals with full text available in JSTOR. – R. Schumacher

Gung, I absolutely believe you, but there are a lot of websites out there listing normal distribution of Y as assumption in linear regression. I hope the reviewers will know that this is not the case. Do you have a citation that I can use in combination with the "central limit theorem"? – Torvon

Torvon, any competent textbook will be clear about this. For example, Draper & Smith (Applied Regression Analysis, 2nd Ed.) develop the regression equations at the beginning of section 2.6, then discuss what can be done in a subsection "Without Distributional Assumptions," and only then discuss what can further be done (mainly with the F tests) in a subsection "With Distributional Assumptions." Ultimately, "robustness" is going to be relative to the conclusions you are trying to draw: some of them will be largely insensitive to homoscedasticity but others might be more sensitive. – whuber
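
To get a feel for what "the central limit theorem will cover you" means at n = 2500, one can check by simulation how often the usual 95% interval for the slope contains the true value when the errors are clearly skewed. The sketch below is only that, a sketch under stated assumptions: the error distribution (a centred exponential), the uniform predictor, the true slope of 0.5, the number of replications, and the function name `slope_ci_covers` are all made up for illustration, and the normal quantile 1.96 stands in for the t quantile, which is nearly identical at this sample size.

```python
import math
import random


def slope_ci_covers(rng, n, true_slope):
    """Simulate one data set with skewed errors; report whether the
    nominal 95% interval for the OLS slope contains the true slope."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Errors: exponential shifted to mean zero (skewness 2, not normal).
    y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Classical standard error of the slope from the residual variance.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)

    # 1.96 approximates the t quantile; an assumption that is fine for large n.
    return abs(slope - true_slope) <= 1.96 * se


rng = random.Random(2012)  # arbitrary seed
reps = 300                 # number of simulated data sets (hypothetical choice)
n = 2500                   # sample size from the question

coverage = sum(slope_ci_covers(rng, n, 0.5) for _ in range(reps)) / reps
print(f"empirical coverage of nominal 95% interval: {coverage:.3f}")
```

If the empirical coverage lands near 0.95, inference on the slope is holding up under that kind of non-normality. This only probes skewed errors; as the last comment notes, other conclusions (and violations such as heteroscedasticity) may be more sensitive and would need their own check.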

kjetil b halvorsen