
I am reading "What are common statistical sins?", and especially @jebyrnes's answer:

Failing to test the assumption that error is normally distributed and has constant variance between treatments. These assumptions aren't always tested, thus least-squares model fitting is probably often used when it is actually inappropriate.

and @DikranMarsupial's subsequent comment:

If the data are heteroscedastic you can end up with very inaccurate out-of-sample predictions because the regression model will try too hard to minimise the error on samples in areas with high variance and not hard enough on samples from areas of low variance. This means you can end up with a very badly biased model. It also means that the error bars on the predictions will be wrong.

I am worried about my regression: I used "rlm" with weights. My questions are:

  1. Could you give me some pointers on how to assess goodness of fit and run residual diagnostics for `rlm` in R? Are there any good tutorials or examples?

  2. How should residual diagnostics be done when regression weights are used?

  3. My end goal is to get the predictions $\hat y$, and I am less concerned about the bias or efficiency of the betas.

Do my predictions $\hat y$ get messed up if there is heteroskedasticity, non-normality, etc.?

That is, what is the relationship between efficiency/unbiasedness and good out-of-sample prediction?

kjetil b halvorsen
Luna
  • You mention predictions prominently. Are you developing this model primarily to use in the future to predict unknown response values, or do you want to better understand the relation between your covariates and the response variable? – gung - Reinstate Monica Jul 05 '12 at 17:27
  • Thank you gung for helping me. The answer to your question is "both": the initial stage is of course about understanding the data; however, the end goal is to predict on a wide data set. Thank you! – Luna Jul 05 '12 at 17:29
  • It always seems reasonable that you would want to know both; however, the optimal way to go about the process differs, so you need to make choices about what you want. The reason I asked is that most people in science are after understanding, but that goal is inconsistent with your statement "I am less concerned about the bias or efficiency of the betas". For info on the distinction between them see [these](http://stats.stackexchange.com/questions/18896/) [two](http://stats.stackexchange.com/questions/1194/) previous CV questions. – gung - Reinstate Monica Jul 05 '12 at 17:57
  • Thank you Gung. If I had to choose only one, I would say "prediction"; that's my ultimate goal. Thank you! – Luna Jul 05 '12 at 19:56
  • Have a look at the `fit.models` function in the [fit.models](https://cran.r-project.org/web/packages/fit.models/fit.models.pdf) package. This allows comparison of an `rlm` fit with an `lm` fit. – Tony Ladson Feb 08 '16 at 02:38
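
As a rough illustration of the kind of `rlm` versus `lm` comparison suggested in the comments, and of residual checks when weights are used, here is a minimal sketch. It calls `MASS::rlm` and `lm` directly rather than going through `fit.models`. The data (the built-in `stackloss` set) and the prior weights `w` are stand-ins, not the asker's data, and the rescaling by `sqrt(w)` assumes the weights are inverse-variance weights, which is the default `wt.method = "inv.var"` in `rlm`.

```r
library(MASS)  # provides rlm()

# Illustrative data only: built-in stackloss data with made-up prior
# weights standing in for the regression weights mentioned in the question.
data(stackloss)
set.seed(1)
w <- runif(nrow(stackloss), 0.5, 1.5)  # hypothetical prior weights

# Fit the same weighted model by least squares and by robust M-estimation.
fit_ls  <- lm(stack.loss ~ ., data = stackloss, weights = w)
fit_rob <- rlm(stack.loss ~ ., data = stackloss, weights = w)

# Side-by-side coefficients: large differences hint at influential points.
print(cbind(LS = coef(fit_ls), Robust = coef(fit_rob)))

# Rescale raw residuals by sqrt(prior weight). If the assumed variance
# model (variance proportional to 1/w) holds, these should show roughly
# constant spread across fitted values.
r_w <- residuals(fit_rob) * sqrt(w)

# Residuals vs fitted values: look for funnels or curvature.
plot(fitted(fit_rob), r_w,
     xlab = "Fitted values", ylab = "Weighted residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the weighted residuals: look for heavy tails.
qqnorm(r_w)
qqline(r_w)

# Final IRLS robustness weights: values well below 1 mark observations
# that the robust fit has downweighted.
plot(fit_rob$w, xlab = "Observation", ylab = "Robustness weight")
```

Since the stated goal is prediction, it may be more informative to compare the two fits on held-out data (for example with cross-validation) than to rely on in-sample diagnostics alone.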
