
If $\beta^*=\mathrm{arg\,min}_{\beta} \|y-X\beta\|^2_2+\lambda\|\beta\|_1$, can $\|\beta^*\|_2$ increase when $\lambda$ increases?

I think this is possible. Although $\|\beta^*\|_1$ does not increase when $\lambda$ increases (my proof), $\|\beta^*\|_2$ can increase. The figure below shows a possibility. When $\lambda$ increases, if $\beta^*$ travels (linearly) from $P$ to $Q$, then $\|\beta^*\|_2$ increases while $\|\beta^*\|_1$ decreases. But I don't know how to construct a concrete example (i.e., to construct $X$ and $y$) so that the profile of $\beta^*$ demonstrates this behavior. Any ideas? Thank you.

[Figure: the $\ell_1$ ball with $\beta^*$ moving from $P$ to $Q$: $\|\beta^*\|_2$ increases while $\|\beta^*\|_1$ decreases]

ziyuang

2 Answers


The answer is yes, and your own figure already gives a graphical proof.

Look up the definition of equivalence of vector norms. You will find that $$ \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2, $$ where $n$ is the dimension of the vector $x$. Hence, there is some wiggle room for the $\ell_2$ norm, compared to the $\ell_1$ norm.

In fact, the problem you want to solve can be stated as:

Find $d$ such that $$ \|x + d\|_2 > \|x\|_2 $$ while at the same time $$ \|x + d\|_1 < \|x\|_1. $$

Square the first inequality and expand to see that $$ 2\sum_i x_id_i > -\sum_i d_i^2. $$ For the second inequality, assume $x_i\geq0$ and $x_i+d_i\geq0$ for all $i$ (as in your figure); then $\|x+d\|_1=\sum_i(x_i+d_i)$ and $\|x\|_1=\sum_i x_i$, so the inequality reduces to $$ \sum_i d_i < 0. $$ Any $d$ that fulfills these two constraints will increase the $\ell_2$ norm while decreasing the $\ell_1$ norm.

In your example, $d\approx[-0.4, 0.3]^T$, $x:=P\approx[0.5, 0.6]^T$, and $$ \sum_i d_i\approx-0.1<0, $$ and $$ 2\sum_i P_id_i\approx-0.04 > -0.25 \approx -\sum_i d_i^2. $$
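As a quick numerical sanity check in R (a sketch only; the coordinates of $P$ and $d$ are read approximately off the figure, so the exact values are assumptions):

x <- c(0.5, 0.6)    # the point P (approximate, read off the figure)
d <- c(-0.4, 0.3)   # the step from P towards Q (approximate)
sum(abs(x + d)) < sum(abs(x))          # TRUE: the l1 norm decreases (1.0 < 1.1)
sqrt(sum((x + d)^2)) > sqrt(sum(x^2))  # TRUE: the l2 norm increases (~0.91 > ~0.78)
sum(d) < 0                             # TRUE: -0.1 < 0
2 * sum(x * d) > -sum(d^2)             # TRUE: -0.04 > -0.25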

Tommy L

Thanks for @TommyL's answer, but it does not directly address the construction of $X$ and $y$. I somehow "solved" this myself. First, when $\lambda$ increases, $\|\beta^*\|_2$ cannot increase if each $|\beta^*_i|$ decreases monotonically. This happens when $X$ is orthonormal, in which case we have the soft-thresholding solution

$$ \beta^*_i=\mathrm{sign}(\beta_i^{\mathrm{LS}})\left(|\beta_i^{\mathrm{LS}}|-\lambda\right)_+ $$

Geometrically, in this situation $\beta^*$ moves perpendicularly to the contour of the $\ell_1$ norm, so $\|\beta^*\|_2$ cannot increase.
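To illustrate the orthonormal case (a minimal sketch with an arbitrary, made-up least-squares vector, not data from the question), the soft-thresholding path indeed gives a non-increasing $\ell_2$ norm:

soft_threshold <- function(beta_ls, lambda) {
  # Soft-thresholding: the lasso solution when X is orthonormal
  sign(beta_ls) * pmax(abs(beta_ls) - lambda, 0)
}
beta_ls <- c(2, -1.5, 0.5)   # illustrative least-squares coefficients (assumed)
lambdas <- seq(0, 2.5, by = 0.05)
l2_path <- sapply(lambdas, function(l) sqrt(sum(soft_threshold(beta_ls, l)^2)))
all(diff(l2_path) <= 1e-12)  # TRUE: ||beta*||_2 never increases with lambda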

Actually, Hastie et al., in the paper "Forward stagewise regression and the monotone lasso", give a necessary and sufficient condition for the monotonicity of the coefficient profile paths:

[Image: the monotonicity condition from Hastie et al.]

In Section 6 of the paper they construct an artificial data set based on piecewise-linear basis functions that violates the above condition, demonstrating the non-monotonicity. But with some luck we can also create a random data set showing similar behavior in a simpler way. Here is my R code:

library(glmnet)
set.seed(0)
N <- 10    # number of observations
p <- 15    # number of predictors (p > N)

# Highly correlated columns: each column is x1 plus a little noise
x1 <- rnorm(N)
X <- mat.or.vec(N, p)
X[, 1] <- x1
for (i in 2:p) {X[, i] <- x1 + rnorm(N, sd = 0.2)}

# True coefficients with both large positive and negative entries
beta <- rnorm(p, sd = 10)
y <- X %*% beta + rnorm(N, sd = 0.01)

# Fit the lasso path (alpha = 1), without an intercept
model <- glmnet(X, y, family = "gaussian", alpha = 1, intercept = FALSE)

I deliberately made $X$'s columns highly correlated (far from the orthonormal case) and gave the true $\beta$ both large positive and negative entries. Here is $\beta^*$'s profile (not surprisingly, only 5 variables are activated):

[Figure: coefficient profiles of $\beta^*$ along the lasso path]

and the relation between $\lambda$ and $\|\beta^*\|_2$:

[Figure: $\|\beta^*\|_2$ as a function of $\lambda$]

So we can see that for some interval of $\lambda$, $\|\beta^*\|_2$ increases as $\lambda$ increases.
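The plotting commands are not shown above; a sketch of how the two figures can be produced from the fitted glmnet object is:

plot(model, xvar = "lambda")                       # coefficient profiles vs log(lambda)
l2_norm <- sqrt(colSums(as.matrix(model$beta)^2))  # ||beta*||_2 at each lambda on the path
plot(model$lambda, l2_norm, type = "l", xlab = "lambda", ylab = "l2 norm of beta*")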

ziyuang