
Let us consider the following lasso estimator: $$ \hat{\beta}_{L} = \arg\min_{\beta} \, \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\beta||_{2}^{2} + \frac{\lambda_{n}}{n}\sum_{j=1}^{p}|\beta_{j}|. $$ For any $\lambda_{n}$ the problem above is equivalent to $$ \hat{\beta}_{L} = \arg\min_{\beta} \, \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\beta||_{2}^{2}, $$ subject to $$ \sum_{j=1}^{p}|\beta_{j}| \leq t_{n} $$ for some $t_{n}$.

Next, assume that the sequence $\{\lambda_{n}\}$ is $o(n)$. What can be said about $\{t_{n}\}$?

Edit: To start, we can assume for simplicity that $\textbf{X} = \textbf{I}$.
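
Under this simplification the penalized problem has a closed-form solution (coordinatewise soft-thresholding), which makes the correspondence between $\lambda_{n}$ and $t_{n}$ explicit: $t_n$ is simply the $\ell_1$ norm of the penalized solution. A minimal numerical sketch (assuming $n = p$ so that $\textbf{X} = \textbf{I}$ is square; the coefficients and noise level are made up):

```python
import numpy as np

# With X = I the lasso objective (1/n)*sum (y_j - b_j)^2 + (lambda_n/n)*sum |b_j|
# decouples coordinatewise; the minimiser is soft-thresholding of y at lambda_n/2.
def lasso_identity(y, lam_n):
    return np.sign(y) * np.maximum(np.abs(y) - lam_n / 2.0, 0.0)

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])   # hypothetical true coefficients
y = beta_true + 0.1 * rng.standard_normal(beta_true.size)

lam_n = 0.4
beta_lasso = lasso_identity(y, lam_n)
# The constraint level t_n at which the constrained form reproduces this fit
# is the L1 norm of the penalised solution itself:
t_n = np.abs(beta_lasso).sum()
print(beta_lasso, t_n)
```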

ABK

1 Answer


The difference between the SSE of the LASSO solution $\tilde\beta$ and the SSE of the OLS solution $\hat{\beta}$ can be expressed as

$$(X (\tilde\beta-\hat{\beta})) \cdot (X (\tilde\beta-\hat{\beta})) = (\tilde\beta-\hat{\beta})^T X^TX (\tilde\beta-\hat{\beta}) $$

You can see this graphically as an ellipsoid surface (as in the image below, which I copied from this question).
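
This identity holds because the OLS residual is orthogonal to the column space of $X$, so the cross term vanishes. A quick numerical check (a sketch with made-up data):

```python
import numpy as np

# Verify that SSE(beta_tilde) - SSE(beta_hat) equals the quadratic form
# (beta_tilde - beta_hat)^T X^T X (beta_tilde - beta_hat).
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS solution
beta_tilde = beta_hat + np.array([0.1, -0.2, 0.05])  # any other coefficient vector

sse = lambda b: np.sum((y - X @ b) ** 2)
d = beta_tilde - beta_hat
print(sse(beta_tilde) - sse(beta_hat))  # matches the quadratic form:
print(d @ X.T @ X @ d)
```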

  • Limit behaviour of $ \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\tilde\beta||_{2}^{2} $

    This ellipsoid will depend on the particular sample (through which $\hat\beta$ and $X^TX$ vary), but as $n \to \infty$ the variation in this surface becomes smaller.

  • Limit behaviour of $ \frac{\lambda_{n}}{n}\sum_{j=1}^{p}|\tilde\beta_{j}|$

    If $\{\lambda_{n}\}$ is $o(n)$, then $\lbrace\frac{\lambda_{n}}{n}\rbrace$ is $o(1)$, i.e. it approaches zero.

So the first term in the cost function will approach some quadratic function of $\tilde\beta$ and the second term will approach zero. The lasso solution that minimises the sum of these terms will therefore approach the true $\beta$ (the reasoning here is intuitive, but I am sure there is a reference for it).

The consequence is that $t_n - \sum_{j=1}^{p}|\beta_{j}|$ will approach zero (where the $\beta_j$ refer to the true coefficients).
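
This can be illustrated with a small simulation (a sketch using scikit-learn's Lasso; the data-generating process and the choice $\lambda_n = \sqrt{n}$, which is $o(n)$, are made-up assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# With lambda_n = sqrt(n), the L1 norm of the lasso fit, i.e. t_n,
# approaches the L1 norm of the true coefficients as n grows.
rng = np.random.default_rng(2)
beta_true = np.array([3.0, -1.5, 0.0, 2.0])   # hypothetical true coefficients
for n in [100, 1_000, 10_000, 100_000]:
    X = rng.standard_normal((n, beta_true.size))
    y = X @ beta_true + rng.standard_normal(n)
    lam_n = np.sqrt(n)
    # scikit-learn's Lasso minimises (1/(2n))||y - Xb||^2 + alpha*||b||_1,
    # so alpha = lambda_n / (2n) matches the objective in the question.
    fit = Lasso(alpha=lam_n / (2 * n)).fit(X, y)
    t_n = np.abs(fit.coef_).sum()
    print(n, round(t_n, 3), np.abs(beta_true).sum())
```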

[Figure: intuitive view of the lasso path]

Sextus Empiricus
  • Dear @Sextus Empiricus, I didn't understand the conclusion about $t_{n}$. Could you please clarify it? – ABK Jan 12 '21 at 13:31
  • @ABK I had a misunderstanding about your meaning of $\{\lambda_n\} \in o(n)$ and thought of $\lambda_n/n$ approaching a constant instead of zero. I have edited the answer now. – Sextus Empiricus Jan 12 '21 at 14:27
  • ok, again it is a bit confusing. If I understood correctly, the conclusion is that both $\{ \frac{\lambda_{n}}{n}\}$ and $\{t_{n} \}$ have the same rate of convergence – ABK Jan 12 '21 at 15:49
  • @ABK I am not sure about the *rate* of convergence of $t_n$, but if $n \to \infty$ and $\frac{\lambda_n}{n} \to 0$ then $t_n \to \sum_{j=1}^{p}|\beta_{j}|$. – Sextus Empiricus Jan 12 '21 at 16:21