In Kahneman and Deaton (2010)$^\dagger$, the authors write the following:

> This regression explains 37% of the variance, with a root mean square error (RMSE) of 0.67852. To eliminate outliers and implausible income reports, we dropped observations in which the absolute value of the difference between log income and its prediction exceeded 2.5 times the RMSE.

Is this common practice? What is the intuition behind doing so? It seems somewhat strange to define an outlier based upon a model which may not be well-specified in the first place. Shouldn't the determination of outliers be based on some theoretical grounds for what constitutes a plausible value, rather than how well your model predicts the real values?
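
For concreteness, here is a minimal sketch of how I read the rule in the quoted passage. It is not the authors' code: the data are synthetic, the function name `trim_by_residual` and the single predictor are my own placeholders, and I am assuming the RMSE is computed with a degrees-of-freedom correction, which the paper does not state.

```python
import numpy as np

def trim_by_residual(X, y, k=2.5):
    """Fit OLS of y on X (plus an intercept) and flag rows to keep.

    A row is kept when |y - y_hat| <= k * RMSE. The RMSE here divides by
    n - p (an assumption; the paper does not say which divisor was used).
    """
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # OLS coefficients
    resid = y - X1 @ beta                           # observed minus predicted
    dof = len(y) - X1.shape[1]                      # residual degrees of freedom
    rmse = np.sqrt(resid @ resid / dof)
    keep = np.abs(resid) <= k * rmse
    return keep, rmse

# Synthetic stand-in for "log income regressed on covariates".
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 0.5 * x + rng.normal(scale=0.7, size=500)
y[:5] += 10.0  # plant five implausible reports

keep, rmse = trim_by_residual(x, y)
print("RMSE:", rmse)
print("dropped:", int((~keep).sum()), "of", len(y))
```

Note that the threshold depends on the fitted model twice: through the predictions and through the RMSE itself, which is inflated by the very observations being screened. That is part of what makes me uneasy about the procedure.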
$\dagger$: Kahneman, D., and Deaton, A. (2010). High income improves evaluation of life but not emotional well-being. Proceedings of the National Academy of Sciences, 107(38), 16489-16493. DOI: 10.1073/pnas.1011492107