When is a good time to check for outliers? Do I need to check all variables separately before running the regression, or can I bring all variables into the model first and then try to find the outliers by residual plot?
- Neither;) See [here](https://stats.stackexchange.com/questions/46229/fast-linear-regression-robust-to-outliers/46234#46234). The first sentence in the answer there pertains also to the approach you propose ('bring all variables into the model first and then try to find the outliers by residual plot') – user603 Jun 13 '17 at 12:59
- When you have found them what do you intend to do? – mdewey Jun 13 '17 at 13:42
- @user603 actually I am interested in the OP's purpose not yours. – mdewey Jun 14 '17 at 15:03
1 Answer
I'd be interested to see what others in the community say, but I would recommend doing both. Each check can surface points that are worth further investigation by you or your team. An outlier in the model (a large residual) indicates that the model fits that particular case poorly, for whatever reason, and should be investigated. An outlier in an individual variable may indicate a data-entry error such as a mistyped value, and should also be investigated. The two are often related (e.g., forgetting to recode -9 as missing produces outliers both in the data and in the model), but they don't have to be, so again, I'd check both.
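
To make the two checks concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an illustrative assumption rather than a prescribed method: the function names `iqr_flags`, `residual_flags`, and `make_clean_data` are hypothetical, the toy data are made up, and the cutoffs (Tukey's 1.5 × IQR fences, and an absolute studentized residual above 3) are common conventions, not rules.

```python
import numpy as np


def iqr_flags(x, k=1.5):
    """Per-variable check: flag values beyond k * IQR from the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def residual_flags(x, y, cutoff=3.0):
    """Model check: fit y ~ 1 + x by OLS and flag cases whose
    internally studentized residual exceeds `cutoff` in absolute value."""
    A = np.column_stack([np.ones(len(y)), x])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS coefficients
    resid = y - A @ beta
    Q, _ = np.linalg.qr(A)                         # reduced QR of the design matrix
    leverage = np.sum(Q**2, axis=1)                # diagonal of the hat matrix
    dof = len(y) - A.shape[1]
    s = np.sqrt(resid @ resid / dof)               # residual standard error
    studentized = resid / (s * np.sqrt(1.0 - leverage))
    return np.abs(studentized) > cutoff


def make_clean_data(n=30):
    """Toy data (assumed for illustration): y = 1 + 2x plus small noise."""
    x = 20.0 + np.arange(n)
    noise = np.where(np.arange(n) % 2 == 0, 0.1, -0.1)
    return x, 1.0 + 2.0 * x + noise


# Scenario A: one case fits the model poorly, but its values look ordinary.
xa, ya = make_clean_data()
ya[15] += 10.0
print("A, flagged in x:        ", np.flatnonzero(iqr_flags(xa)).tolist())
print("A, flagged in y:        ", np.flatnonzero(iqr_flags(ya)).tolist())
print("A, flagged by residuals:", np.flatnonzero(residual_flags(xa, ya)).tolist())

# Scenario B: a missing-value code (-9) was left in the predictor.
xb, yb = make_clean_data()
xb[5] = -9.0
print("B, flagged in x:        ", np.flatnonzero(iqr_flags(xb)).tolist())
print("B, flagged by residuals:", np.flatnonzero(residual_flags(xb, yb)).tolist())
```

In this toy data, the shifted response in scenario A is caught only by the residual check, while the leftover -9 code in scenario B shows up in both checks, which mirrors the point that the two kinds of outlier are often, but not always, related. The flags only mark cases to investigate; they are not a reason to delete anything automatically.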

mflo-ByeSE