
I've read in some posts that the absolute value function is not differentiable. However, it is non-differentiable only at 0. Can't I just check the value at each training point to compute the derivative? Only if the error happens to be exactly 0 would this create a problem, in which case I could ignore that point. The simplex method is the preferred approach, presumably because it is more efficient, but I want to understand whether efficiency is the only reason not to use gradient descent, or whether there is some other issue as well.

racerX
    You didn't ask about that, but you can find the exact solution with linear programming methods. Some code at https://stats.stackexchange.com/questions/260504/how-to-solve-least-absolute-deviation-by-simplex-method/269966#269966 – kjetil b halvorsen Nov 11 '17 at 21:52

1 Answer


This difficulty can be addressed using subgradients. A subgradient of a convex function $f$ at some point $x$ is any value $v$ that satisfies $f(y)-f(x)\ge v\cdot (y-x)$ for all $y$ in the domain. Note that if $f$ is differentiable at $x$ then the subgradient is just the gradient.
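As a concrete check (not part of the original answer), consider $f(x)=|x|$ at $x=0$: any $v$ with $|v|\le 1$ satisfies the defining inequality, since for every $y$

$$f(y)-f(0)=|y|\;\ge\;|v|\,|y|\;\ge\;v\,y=v\cdot(y-0),$$

while no $v$ with $|v|>1$ works (take $y$ with the same sign as $v$).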

The subgradient method is very similar in spirit to gradient descent: update the current solution by stepping along the negative of a subgradient. That is, take $x^{(k+1)}=x^{(k)}-\alpha_k v^{(k)}$, where $v^{(k)}$ is a subgradient of $f$ at $x^{(k)}$ and $\alpha_k$ is the step size at iteration $k$.

The subgradient of $f(x)=|x|$ is $v(x)=\left\lbrace \begin{array}{cc} 1, & x>0 \\ -1, & x<0 \\ [-1,1], &x=0 \end{array}\right.$
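Not part of the original answer, but here is a minimal runnable sketch of this update for the least-absolute-deviation problem from the question, minimizing $\sum_i |y_i - x_i^\top \beta|$ over $\beta$. The diminishing step size, the choice of subgradient $0$ at a zero residual, and the toy data are all illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

def lad_subgradient_descent(X, y, n_iter=2000, step0=1.0):
    """Minimize sum_i |y_i - x_i @ beta| with the subgradient method.

    Uses a diminishing step size alpha_k = step0 / sqrt(k + 1). At a
    residual of exactly 0 the subgradient value 0 is used (any value
    in [-1, 1] would be a valid choice).
    """
    n, p = X.shape
    beta = np.zeros(p)
    best_beta, best_loss = beta.copy(), np.inf
    for k in range(n_iter):
        r = y - X @ beta                  # residuals
        # Subgradient of sum_i |r_i| w.r.t. beta is -X^T sign(r);
        # np.sign returns 0 when r_i == 0, which lies in [-1, 1].
        g = -X.T @ np.sign(r)
        beta = beta - step0 / np.sqrt(k + 1) * g
        loss = np.abs(y - X @ beta).sum()
        if loss < best_loss:              # keep the best iterate seen
            best_loss, best_beta = loss, beta.copy()
    return best_beta

# Toy usage: recover a coefficient vector from Laplace-noise data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.laplace(scale=0.1, size=200)
print(lad_subgradient_descent(X, y))
```

Because individual subgradient steps need not decrease the objective, the sketch keeps the best iterate seen so far, which is a standard convention for the subgradient method.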

Kyle
  • Thanks! So basically it can be solved using the subgradient instead of the gradient. However, I see you mention the subgradient has two values at $x = 0$ in your example; what value would the algorithm pick in this case? – racerX Nov 14 '17 at 06:47
  • The subgradient of $|x|$ at zero is not two values, it's the interval from -1 to 1 (infinite values). The algorithm need only pick one (e.g. 0). – Kyle Nov 14 '17 at 16:32