My professor told us that the OLS estimator can be influenced by outliers because
$$\hat{\beta}_{OLS}=\text{argmin}_{\beta}\left\lVert y - X \beta\right\rVert_2^2 $$ implies the first-order condition $$2X^T(y-X\beta)=\sum_{i=1}^{n}2x_i (y_i -x_i^T\beta)=0,$$ so each residual $y_i - x_i^T\beta$ enters the gradient linearly, and a single outlying $y_i$ can have an arbitrarily large impact, which makes the influence potentially unbounded.
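To make sure I understand the claim, here is a quick numerical sketch of my own (not from the lecture): it evaluates the per-observation gradient term $2x_i(y_i - x_i^T\beta)$ at a fixed $\beta$ and shows it growing without bound as one $y_i$ is pushed further away.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([1.0, 2.0])                            # beta at which we evaluate the gradient
y = X @ beta + rng.normal(scale=0.5, size=n)

# Gradient contribution of observation 0: 2 * x_0 * (y_0 - x_0' beta).
# It grows linearly as y_0 is shifted away, so its influence is unbounded.
for shift in [0.0, 10.0, 100.0, 1000.0]:
    r0 = (y[0] + shift) - X[0] @ beta
    print(f"shift={shift:7.1f}  ||2*x_0*r_0|| = {np.linalg.norm(2 * X[0] * r0):.2f}")
```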
She said the least absolute deviations (LAD) estimator is more robust because, for the optimization problem
$$\hat{\beta}_{LAD}=\text{argmin}_{\beta}\bigg\lbrace\sum_{i=1}^{n}\bigg|y_i - x_i^T\beta \bigg|\bigg\rbrace$$
the "derivative" (even though she acknowledged it is nondifferentiable) would be
$$\sum_{i=1}^{n}x\cdot \text{sign}(y_i -x_i^T\beta)$$
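Continuing my own sketch from above (again, my illustration, not hers), the analogous per-observation term for LAD is $x_i\,\text{sign}(y_i - x_i^T\beta)$, whose norm is capped at $\lVert x_i\rVert$ no matter how extreme $y_i$ becomes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Per-observation terms at the same beta:
#   OLS: 2 * x_i * (y_i - x_i' beta)  -> scales with the residual
#   LAD: x_i * sign(y_i - x_i' beta)  -> norm capped at ||x_i||
for shift in [0.0, 10.0, 100.0, 1000.0]:
    r0 = (y[0] + shift) - X[0] @ beta
    ols_term = 2 * X[0] * r0
    lad_term = X[0] * np.sign(r0)
    print(f"shift={shift:7.1f}  OLS: {np.linalg.norm(ols_term):9.2f}  "
          f"LAD: {np.linalg.norm(lad_term):.2f}")
```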
How did she obtain this formula given that the absolute value is non-differentiable? What is the justification for this?
And why does this make the LAD estimator more robust to outliers?