What is the asymptotic time complexity of Lasso regression as either the number of rows or columns grows?

user2763361

2 Answers

While @JacobMick provides a broader overview and a link to a review paper, let me give a "shortcut answer" (which may be considered a special case of his answer).

Let the number of candidate variables (features, columns) be $K$ and the sample size (number of observations, rows) be $n$. Consider LASSO implemented using the LARS algorithm (Efron et al., 2004). The computational complexity of LASSO is then $\mathcal{O}(K^3 + K^2 n)$ (ibid.); a rough empirical timing check follows the list below.

  • For $K < n$, $K^3 < K^2 n$ and the computational complexity of LASSO is $\mathcal{O}(K^2 n)$, which is the same as that of a regression with $K$ variables (Efron et al., 2004, pp. 443–444; also cited in Schmidt, 2005, section 2.4; for the computational complexity of a regression, see this post).
  • For $K \geqslant n$, $K^3 \geqslant K^2 n$ and the computational complexity of LASSO is $\mathcal{O}(K^3)$ (Efron et al., 2004).
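
As a rough sanity check of these rates, here is a minimal timing sketch, assuming scikit-learn is available (its `LassoLars` estimator implements the LARS-based LASSO; the helper `fit_time`, the problem sizes, and the penalty level are illustrative choices, and constant factors plus early termination of the LARS path will blur the exact exponents):

```python
# Rough empirical check of the O(K^3 + K^2 n) scaling of LARS-based
# LASSO; assumes scikit-learn is installed. Sizes are illustrative.
import time
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)

def fit_time(n, K, reps=3):
    """Median wall-clock time of a LassoLars fit on random data."""
    X = rng.standard_normal((n, K))
    y = X @ rng.standard_normal(K) + rng.standard_normal(n)
    times = []
    for _ in range(reps):
        model = LassoLars(alpha=0.01)
        start = time.perf_counter()
        model.fit(X, y)
        times.append(time.perf_counter() - start)
    return np.median(times)

# Grow n with K fixed (K < n): expect roughly linear growth in n.
for n in (500, 1000, 2000, 4000):
    print(f"n={n:5d}, K=100: {fit_time(n, 100):.3f} s")

# Grow K with n fixed (K >= n): expect faster-than-linear growth in K.
for K in (200, 400, 800):
    print(f"n=100, K={K:4d}: {fit_time(100, K):.3f} s")
```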

References:

  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least Angle Regression. The Annals of Statistics, 32(2), 407–499.
  • Schmidt, M. (2005). Least Squares Optimization with L1-Norm Regularization. CS542B Project Report, University of British Columbia.

Richard Hardy
  • Richard, can you comment on iteration complexity for the GLM approach here https://stats.stackexchange.com/questions/280304/how-does-lasso-scale-with-the-design-matrix-size ? – rnoodle Aug 09 '17 at 10:57
  • @rnoodle, I cannot without digging deep into that (which I do not have time for at the moment), but +1 for your question. – Richard Hardy Aug 09 '17 at 10:59
  • I had a look, but it is not clear - would be good to get a second pair of eyes on it. So there is iteration complexity and full convergence complexity, and I think the literature is sometimes vague about the definitions. Basically, I have an algorithm that uses a lasso solver in a very critical position, such that my algorithm's complexity depends strongly on the solver. Would be good to nail this. Cheers! I'll put a bounty on it for your input – rnoodle Aug 09 '17 at 11:03
  • @rnoodle, I strongly doubt I will be able to help you there any time soon, but a bounty may certainly attract other people who know better. Good luck! – Richard Hardy Aug 09 '17 at 11:58

Recall that lasso is a linear model with $\ell_1$ regularization.

Finding the parameters can be formulated as an unconstrained optimization problem, where the parameters are given by

$\arg \min_\beta \|y - X\beta\|_2^2 + \alpha\|\beta\|_1$.

In the constrained formulation, the parameters are given by

$\arg \min_\beta \|y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \leq \alpha$.

This is a quadratic programming problem (a quadratic objective with a convex constraint) and can therefore be solved in polynomial time. Note that the $\alpha$ in the constrained form corresponds to, but is not numerically equal to, the penalty weight $\alpha$ in the unconstrained form.
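
To make the two formulations concrete, here is a minimal sketch of both, assuming the cvxpy modeling library is available (cvxpy, the simulated data, and the regularization levels are illustrative assumptions, not anything from this thread):

```python
# Minimal sketch of the penalized and constrained lasso formulations
# using cvxpy (assumed available); data and levels are arbitrary.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 10
X = rng.standard_normal((n, K))
y = X @ rng.standard_normal(K) + 0.1 * rng.standard_normal(n)

# Penalized form: min ||y - X b||^2 + alpha * ||b||_1
alpha = 1.0
beta_pen = cp.Variable(K)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta_pen)
                       + alpha * cp.norm1(beta_pen))).solve()

# Constrained form: min ||y - X b||^2  s.t.  ||b||_1 <= t.
# Setting t to the l1 norm of the penalized solution recovers it
# (up to solver tolerance).
t = np.linalg.norm(beta_pen.value, 1)
beta_con = cp.Variable(K)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta_con)),
           [cp.norm1(beta_con) <= t]).solve()

print(np.allclose(beta_pen.value, beta_con.value, atol=1e-4))
```

For every penalty weight $\alpha$ there is a constraint level (here taken as the $\ell_1$ norm of the penalized solution) giving the same minimizer, which is why the two formulations are used interchangeably.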

Almost all convex optimization routines, even for flexible nonlinear models like neural networks, rely on computing the derivative of the objective w.r.t. the parameters. You cannot do that with $\alpha\|\beta\|_1$, however: the $\ell_1$ term is not differentiable wherever a coordinate of $\beta$ equals zero. As such, you have to rely on different techniques; there are many methods for finding the parameters. Here's a review paper on the subject, Least Squares Optimization with L1-Norm Regularization. The time complexity of iterative convex optimization is tricky to analyze, as it depends on the convergence criterion. Generally, iterative problems converge in fewer epochs as the number of observations increases.
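
As one concrete example of such a technique (chosen for illustration; the answer itself does not single out a method), proximal gradient descent, also known as ISTA, handles the non-differentiable $\ell_1$ term with a soft-thresholding step. A minimal NumPy sketch, with an illustrative iteration count:

```python
# Minimal ISTA (proximal gradient) sketch for the penalized lasso
# objective ||y - X b||^2 + alpha * ||b||_1.
import numpy as np

def soft_threshold(z, thresh):
    """Proximal operator of the l1 norm: shrink each entry toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def ista(X, y, alpha, n_iter=500):
    beta = np.zeros(X.shape[1])
    # Step size 1/L, with L the Lipschitz constant of the smooth
    # part's gradient: 2 * (largest singular value of X)^2.
    L = 2 * np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ beta - y)  # gradient of the smooth part
        beta = soft_threshold(beta - grad / L, alpha / L)
    return beta

# Usage: beta_hat = ista(X, y, alpha=1.0)
```

Each iteration costs $\mathcal{O}(nK)$ for the matrix–vector products; this per-step cost is the quantity the comment below refers to.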

Jessica Collins
  • Several things: saying that a problem is "polynomial" is not particularly helpful, unless perhaps you are looking at some sort of combinatorics problem (which is usually exponential). Second, calculating the derivatives is pretty much always **not** the limiting step. Third, usually when discussing time complexity of an iterative algorithm, one typically looks at the cost *per step*, and thus does **not** depend on convergence criteria. Finally, it's not typically the case that more observations = fewer iterations. – Cliff AB Jan 14 '16 at 18:16