
Let us consider the following lasso estimator: $$ \hat{\beta}_{L} = \arg\min_{\beta} \, \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\beta||_{2}^{2} + \frac{\lambda_{n}}{n}\sum_{j=1}^{p}|\beta_{j}|. $$ For any $\lambda_{n}$ the problem above is equivalent to $$ \hat{\beta}_{L} = \arg\min_{\beta} \, \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\beta||_{2}^{2}, $$ subject to $$ \sum_{j=1}^{p}|\beta_{j}| \leq t_{n} $$ for some $t_{n}$.

Next, assume that the sequence $\{\lambda_{n}\}$ is $o(n)$. What can be said about $\{t_{n}\}$?

Edit: To start, we can assume for simplicity that $\textbf{X} = \textbf{I}$.
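
Under this simplification the penalized problem has a closed-form solution (coordinatewise soft-thresholding), which makes the correspondence between $\lambda_{n}$ and $t_{n}$ explicit: $t_n$ is simply the $\ell_1$ norm of the penalized solution. A minimal numerical sketch (assuming $n = p$ so that $\textbf{X} = \textbf{I}$ is square; the coefficients and noise level are made up):

```python
import numpy as np

# With X = I the lasso objective (1/n)*sum (y_j - b_j)^2 + (lambda_n/n)*sum |b_j|
# decouples coordinatewise; the minimiser is soft-thresholding of y at lambda_n/2.
def lasso_identity(y, lam_n):
    return np.sign(y) * np.maximum(np.abs(y) - lam_n / 2.0, 0.0)

rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])   # hypothetical true coefficients
y = beta_true + 0.1 * rng.standard_normal(beta_true.size)

lam_n = 0.4
beta_lasso = lasso_identity(y, lam_n)
# The constraint level t_n at which the constrained form reproduces this fit
# is the L1 norm of the penalised solution itself:
t_n = np.abs(beta_lasso).sum()
print(beta_lasso, t_n)
```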

ABK

1 Answer


The difference between the SSE of the LASSO solution $\tilde\beta$ and the SSE of the OLS solution $\hat{\beta}$ can be expressed as

$$(X (\tilde\beta-\hat{\beta})) \cdot (X (\tilde\beta-\hat{\beta})) = (\tilde\beta-\hat{\beta})^T X^TX (\tilde\beta-\hat{\beta}) $$

You can see this graphically as an ellipsoid surface (as in the image below, which I copied from this question).
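
This identity holds because the OLS residual is orthogonal to the column space of $X$, so the cross term vanishes. A quick numerical check (a sketch with made-up data):

```python
import numpy as np

# Verify that SSE(beta_tilde) - SSE(beta_hat) equals the quadratic form
# (beta_tilde - beta_hat)^T X^T X (beta_tilde - beta_hat).
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS solution
beta_tilde = beta_hat + np.array([0.1, -0.2, 0.05])  # any other coefficient vector

sse = lambda b: np.sum((y - X @ b) ** 2)
d = beta_tilde - beta_hat
print(sse(beta_tilde) - sse(beta_hat))  # matches the quadratic form:
print(d @ X.T @ X @ d)
```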

  • Limit behaviour of $ \frac{1}{n}\sum_{i=1}^{n}||y_{i} - \textbf{x}_{i}\tilde\beta||_{2}^{2} $

    This ellipsoid will depend on the particular sample (through which $\hat\beta$ and $X^TX$ vary), but as $n \to \infty$ the variation in this surface becomes smaller.

  • Limit behaviour of $ \frac{\lambda_{n}}{n}\sum_{j=1}^{p}|\tilde\beta_{j}|$

    If $\{\lambda_{n}\}$ is $o(n)$, then $\lbrace\frac{\lambda_{n}}{n}\rbrace$ is $o(1)$, i.e. it approaches zero.

So the first term in the cost function will approach some quadratic function of $\tilde\beta$ and the second term will approach zero. The lasso solution that minimises the sum of these terms will therefore approach the true $\beta$ (the reasoning here is intuitive, but I am sure there is a reference for it).

The consequence is that $t_n - \sum_{j=1}^{p}|\beta_{j}|$ will approach zero (where the $\beta_j$ refer to the true coefficients).
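
This can be illustrated with a small simulation (a sketch using scikit-learn's Lasso; the data-generating process and the choice $\lambda_n = \sqrt{n}$, which is $o(n)$, are made-up assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# With lambda_n = sqrt(n), the L1 norm of the lasso fit, i.e. t_n,
# approaches the L1 norm of the true coefficients as n grows.
rng = np.random.default_rng(2)
beta_true = np.array([3.0, -1.5, 0.0, 2.0])   # hypothetical true coefficients
for n in [100, 1_000, 10_000, 100_000]:
    X = rng.standard_normal((n, beta_true.size))
    y = X @ beta_true + rng.standard_normal(n)
    lam_n = np.sqrt(n)
    # scikit-learn's Lasso minimises (1/(2n))||y - Xb||^2 + alpha*||b||_1,
    # so alpha = lambda_n / (2n) matches the objective in the question.
    fit = Lasso(alpha=lam_n / (2 * n)).fit(X, y)
    t_n = np.abs(fit.coef_).sum()
    print(n, round(t_n, 3), np.abs(beta_true).sum())
```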

[Figure: intuitive view of the lasso path]

Sextus Empiricus
  • Dear @Sextus Empiricus, I didn't understand the conclusion about $t_{n}$. Could you please clarify it? – ABK Jan 12 '21 at 13:31
  • @ABK I had a misunderstanding about your meaning of $\{\lambda_n\} \in o(n)$ and thought of $\lambda_n/n$ approaching a constant instead of zero. I have edited the answer now. – Sextus Empiricus Jan 12 '21 at 14:27
  • ok, again it is a bit confusing. If I understood correctly, the conclusion is that both $\{ \frac{\lambda_{n}}{n}\}$ and $\{t_{n} \}$ have the same rate of convergence – ABK Jan 12 '21 at 15:49
  • @ABK I am not sure about the *rate* of convergence of $t_n$, but if $n \to \infty$ and $\frac{\lambda_n}{n} \to 0$ then $t_n \to \sum_{j=1}^{p}|\beta_{j}|$. – Sextus Empiricus Jan 12 '21 at 16:21