My professor told us that the OLS estimator can be influenced by outliers because
$$\hat{\beta}_{OLS}=\text{argmin}_{\beta}\left\lVert y - X \beta\right\rVert_2^2 $$ implies the first-order condition $$2X^T(y-X\beta)=\sum_{i=1}^{n}2x_i (y_i -x_i^T\beta)=0,$$ so each residual $y_i - x_i^T\beta$ enters the gradient linearly, and a single outlying $y_i$ can have an arbitrarily large impact, which makes the influence potentially unbounded.
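To make sure I understand the claim, here is a quick numerical sketch of my own (not from the lecture): it evaluates the per-observation gradient term $2x_i(y_i - x_i^T\beta)$ at a fixed $\beta$ and shows it growing without bound as one $y_i$ is pushed further away.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([1.0, 2.0])                            # beta at which we evaluate the gradient
y = X @ beta + rng.normal(scale=0.5, size=n)

# Gradient contribution of observation 0: 2 * x_0 * (y_0 - x_0' beta).
# It grows linearly as y_0 is shifted away, so its influence is unbounded.
for shift in [0.0, 10.0, 100.0, 1000.0]:
    r0 = (y[0] + shift) - X[0] @ beta
    print(f"shift={shift:7.1f}  ||2*x_0*r_0|| = {np.linalg.norm(2 * X[0] * r0):.2f}")
```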
She said the least absolute deviations (LAD) estimator is more robust because, for the optimization problem
$$\hat{\beta}_{LAD}=\text{argmin}_{\beta}\bigg\lbrace\sum_{i=1}^{n}\bigg|y_i - x_i^T\beta \bigg|\bigg\rbrace$$
the "derivative" (even though she acknowledged it is nondifferentiable) would be
$$\sum_{i=1}^{n}x\cdot \text{sign}(y_i -x_i^T\beta)$$
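Continuing my own sketch from above (again, my illustration, not hers), the analogous per-observation term for LAD is $x_i\,\text{sign}(y_i - x_i^T\beta)$, whose norm is capped at $\lVert x_i\rVert$ no matter how extreme $y_i$ becomes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Per-observation terms at the same beta:
#   OLS: 2 * x_i * (y_i - x_i' beta)  -> scales with the residual
#   LAD: x_i * sign(y_i - x_i' beta)  -> norm capped at ||x_i||
for shift in [0.0, 10.0, 100.0, 1000.0]:
    r0 = (y[0] + shift) - X[0] @ beta
    ols_term = 2 * X[0] * r0
    lad_term = X[0] * np.sign(r0)
    print(f"shift={shift:7.1f}  OLS: {np.linalg.norm(ols_term):9.2f}  "
          f"LAD: {np.linalg.norm(lad_term):.2f}")
```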
How did she obtain this formula given that the absolute value is non-differentiable? What is the justification for this?
And why does this make the LAD estimator more robust to outliers?