
The question "What to conclude from this lasso plot (glmnet)" demonstrates solution paths for the lasso estimator that are not monotonic. That is, some of the coefficients grow in absolute value before they shrink.

I've applied these models to several different kinds of data sets and never seen this behavior "in the wild," and until today I had assumed that the paths were always monotonic.

Is there a clear set of conditions under which the solution paths are guaranteed to be monotone? Does it affect the interpretation of the results if the paths change direction?
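
For concreteness, here is a minimal sketch of how such a path plot can be produced with glmnet. The simulated data, the AR(1)-style correlation structure, and all parameter values are illustrative assumptions of mine, not taken from the linked question; with correlated predictors like these, some coefficient paths can grow before they shrink.

```r
# Illustrative simulation (my own assumptions, not the linked question's data):
# correlated predictors can produce non-monotone lasso coefficient paths.
library(MASS)    # mvrnorm, for drawing correlated predictors
library(glmnet)

set.seed(1)
n <- 100; p <- 5
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))         # AR(1)-style correlation matrix
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
beta <- c(3, -2, 1, 0, 0)
y <- as.vector(X %*% beta + rnorm(n))

fit <- glmnet(X, y)            # lasso path (alpha = 1 is the default)
plot(fit, xvar = "lambda")     # coefficient paths against log(lambda)
```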

shadowtalker
  • Monotone in what sense? It seems not very meaningful to me if you want to treat it as a graph of some function. – Henry.L Jun 03 '17 at 02:34
  • @Henry.L The question can be rephrased as: when is it true that for $\lambda_1 \ge \lambda_2$ we have $|(\hat\beta_{\lambda_2})_j| \ge |(\hat\beta_{\lambda_1})_j|$ for all $j$, where $\hat\beta_\lambda = \arg\min_\beta \frac{1}{2n}\|y-X\beta\|_2^2 + \lambda \|\beta\|_1$? That is, the lasso uniformly shrinks componentwise. Could you please clarify what you doubt is meaningful? – user795305 Jun 04 '17 at 04:49
  • Note: understanding the way in which the lasso shrinks coefficients is the topic of both this question and https://stats.stackexchange.com/questions/145299/can-beta-2-increase-when-lambda-increases-in-lasso – user795305 Jun 09 '17 at 01:59
  • I don't know how I missed this before: the question is answered for the lasso in the OP's response to his own question in the thread linked above. – user795305 Aug 09 '17 at 19:06

1 Answer


I can give you a sufficient condition for the path to be monotonic: an orthonormal design matrix $X$.

Suppose the design matrix is orthonormal; that is, with $p$ variables in $X$, we have $\frac{X'X}{n} = I_p$. With an orthonormal design, the OLS regression coefficients are simply $\hat{\beta}^{ols} = \frac{X'y}{n}$.

The Karush-Kuhn-Tucker stationarity condition for the lasso, $\frac{1}{n}X'\left(y - X\hat{\beta}^{lasso}\right) = \lambda s$, thus simplifies to:

$$ \frac{X'y}{n} = \hat{\beta}^{lasso} + \lambda s \implies \hat{\beta}^{ols} = \hat{\beta}^{lasso} + \lambda s $$

where $s$ is a subgradient of $\|\beta\|_1$ at $\hat{\beta}^{lasso}$. Hence, for each $j \in \{1, \dots, p\}$ we have $\hat{\beta}_j^{ols} = \hat{\beta}_j^{lasso} + \lambda s_j$, and we obtain a closed-form solution for the lasso estimates:

$$ \hat{\beta}_j^{lasso} = \operatorname{sign}\left(\hat{\beta}_j^{ols}\right)\left(|\hat{\beta}_j^{ols}| - \lambda \right)_{+} $$

This soft-thresholding rule is monotonic in $\lambda$: as $\lambda$ grows, each $|\hat{\beta}_j^{lasso}|$ can only shrink toward zero, as the small simulation below illustrates. While orthonormality is not a necessary condition, it shows that any non-monotonicity must come from correlation among the covariates in $X$.
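
As a sanity check, here is a minimal sketch that builds an orthonormal design and evaluates the closed-form path above; the simulated data and the grid of $\lambda$ values are illustrative assumptions of mine. Every coefficient shrinks monotonically toward zero as $\lambda$ increases.

```r
# Sketch: under an orthonormal design (X'X/n = I_p), the lasso path is the
# soft-thresholding of the OLS coefficients and is monotone in lambda.
set.seed(1)
n <- 100; p <- 5

# Orthonormalize a random matrix, then rescale so that X'X/n = I_p
X <- qr.Q(qr(matrix(rnorm(n * p), n, p))) * sqrt(n)
beta <- c(3, -2, 1, 0, 0)
y <- X %*% beta + rnorm(n)

b_ols <- crossprod(X, y) / n   # OLS under orthonormality: X'y / n

# Closed-form lasso solution: componentwise soft-thresholding
soft <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

lambdas <- seq(0, 3, by = 0.5)
path <- sapply(lambdas, function(l) soft(b_ols, l))
colnames(path) <- paste0("lambda=", lambdas)
print(round(path, 3))          # each coefficient only moves toward zero
```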

Carlos Cinelli