3

As far as I know, there are robust regression methods for outliers in the response $Y$ and heavy-tailed errors $\epsilon$. The setting for the design matrix (predictor) $X$ is usually either a fixed design or a light-tailed random design, e.g. sub-exponential, like the normal distribution. What if the design matrix $X$ is random and its distribution is heavy-tailed? Are there any advanced methods apart from truncating $X$ or working with $\log X$?

Hepdrey
  • 33
  • 5

3 Answers

2

One simple method would be to use quantile indicators of $X$ rather than $X$ itself as predictors, e.g. adding a column which equals $1$ if $x_k$ is larger than the 90th percentile (0.9 quantile) of $X$. This avoids truncation and does not arbitrarily change the distribution of $X$.
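The idea above can be sketched in a few lines of NumPy; the heavy-tailed Cauchy draw for $X$ is just an illustrative assumption, not part of the answer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(200)        # heavy-tailed predictor (illustrative)

# Indicator column: 1 if x_k exceeds the 90th percentile of X.
q90 = np.quantile(x, 0.9)
indicator = (x > q90).astype(float)

# Design matrix using the bounded indicator instead of the raw,
# heavy-tailed x: intercept plus the 0/1 column.
X = np.column_stack([np.ones_like(x), indicator])
```

The indicator column is bounded in $\{0, 1\}$ no matter how heavy the tails of $X$ are, which is the point of the construction.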

Arne Jonas Warnke
  • 3,085
  • 1
  • 22
  • 40
1

One method/idea is resistant regression. The idea is to order the squared residuals, reject some percentage of the largest ones, and then minimize the sum of the squared residuals that are kept. The resulting objective is highly nonlinear and nonconvex, and one approach to the optimization is genetic algorithms.

One implementation is lqs in the R package MASS; see the companion book by Venables & Ripley. For a code example, see How to get summary statistics from "resistant regression" - lqs - in R?
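To make the trimmed-squares idea concrete, here is a minimal Python sketch using simple "concentration" steps (refit on the observations with the smallest squared residuals). This is an illustration of the principle only, not the algorithm MASS's lqs actually uses; the data-generating setup is made up for the example:

```python
import numpy as np

def trimmed_ls(X, y, keep=0.8, n_iter=20, seed=0):
    """Least-trimmed-squares sketch via concentration steps:
    repeatedly fit OLS on the h observations with the smallest
    squared residuals from the previous fit."""
    n = len(y)
    h = int(keep * n)                           # residuals to keep
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=h, replace=False)  # random starting subset
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r2 = (y - X @ beta) ** 2
        idx = np.argsort(r2)[:h]                # keep the h smallest
    return beta

# Example: a line contaminated with gross outliers in y.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)
y[:10] += 50                                    # 10 gross outliers
X = np.column_stack([np.ones_like(x), x])
beta = trimmed_ls(X, y)                         # close to (0, 2)
```

After the first concentration step the outliers have the largest squared residuals, get trimmed, and the fit converges to the clean data.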

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
1

In Python, scikit-learn also provides RANSACRegressor and TheilSenRegressor. Both of them are robust to outliers in the design.
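A minimal usage sketch (assuming scikit-learn is installed; the Cauchy design and the contamination pattern are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor, TheilSenRegressor

rng = np.random.default_rng(0)
x = rng.standard_cauchy(200)                 # heavy-tailed design
y = 2.0 * x + rng.normal(scale=0.1, size=200)
y[:20] += 30                                 # gross outliers in the response
X = x.reshape(-1, 1)

# RANSAC fits on random subsets and keeps the consensus inlier set;
# Theil-Sen aggregates slopes over subsamples of the data.
ransac = RANSACRegressor(random_state=0).fit(X, y)
theil = TheilSenRegressor(random_state=0).fit(X, y)

ransac_slope = ransac.estimator_.coef_[0]    # slope of the inlier fit
theil_slope = theil.coef_[0]
```

Both estimators should recover a slope near the true value of 2 despite the contamination.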

TMat
  • 716
  • 1
  • 10