
Let us consider the following lasso estimator: $$ \hat{\beta}_{L} = \arg\min_{\beta} \, \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \textbf{x}_{i}^{T}\beta\right)^{2} + \frac{\lambda_{n}}{n}\sum_{j=1}^{p}|\beta_{j}|, $$ and assume that $p$, the dimension of the parameter, is fixed and smaller than $n$, that the design matrix $\textbf{X}$ is nonsingular, and that $$ \frac{1}{n}\sum_{i=1}^{n}\textbf{x}_{i}\textbf{x}_{i}^{T} \to C, $$ where $C$ is a positive definite matrix, and $$ \frac{1}{n} \max_{1\leq i \leq n}\textbf{x}_{i}^{T} \textbf{x}_{i} \to 0 $$ (the conditions on the design matrix are taken from the well-known paper by Knight & Fu, "Asymptotics for lasso-type estimators", Annals of Statistics, 2000).

Next, assume that for each $n$ we choose $\lambda_{n}$ by leave-one-out cross-validation with squared-error ($\ell_{2}$) loss.

What can we say about the sequence $\lambda_{n}$? In particular, does $\lambda_{n}/\sqrt{n}$ converge to some $c \geq 0$, which would be enough for $\sqrt{n}$-consistency?
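For reference, the result in the paper that motivates the $\sqrt{n}$ scale (Theorem 2 of Knight & Fu 2000, quoted here from memory, so please check the original): if $\lambda_{n}/\sqrt{n} \to \lambda_{0} \geq 0$, then $$ \sqrt{n}\left(\hat{\beta}_{L} - \beta\right) \overset{d}{\to} \arg\min_{u} V(u), \qquad V(u) = -2u^{T}W + u^{T}Cu + \lambda_{0}\sum_{j=1}^{p}\left[u_{j}\,\mathrm{sgn}(\beta_{j})\,\mathbb{1}(\beta_{j}\neq 0) + |u_{j}|\,\mathbb{1}(\beta_{j}= 0)\right], $$ where $W \sim N(0, \sigma^{2}C)$.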

Please feel free to choose the design matrix $\textbf{X}$ as simple as possible, e.g. a diagonal matrix corresponding to repeated balanced measurements.
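To make this concrete, here is a minimal simulation sketch (my own construction, not from the referenced paper) that tracks the LOOCV-selected $\lambda_{n}$ as $n$ grows, using an assumed i.i.d. standard normal design instead of the diagonal one. Note that scikit-learn's `LassoCV` minimizes $\frac{1}{2n}\|y - X\beta\|_{2}^{2} + \alpha\|\beta\|_{1}$, so its $\alpha$ corresponds to $\lambda_{n}/(2n)$ in the notation above.

```python
# Minimal sketch (my own assumptions): track the LOOCV-selected lasso
# penalty lambda_n as n grows. Design is i.i.d. standard normal, so
# (1/n) X'X -> I (positive definite); p is fixed and smaller than n,
# and one coefficient is truly zero.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
p = 3
beta = np.array([1.0, 0.5, 0.0])

for n in [50, 100, 200, 400]:
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)
    # scikit-learn minimizes (1/(2n))||y - Xb||^2 + alpha * ||b||_1,
    # so the question's lambda_n equals 2 * n * alpha.
    # LeaveOneOut() gives exact LOOCV but costs n lasso-path fits.
    fit = LassoCV(cv=LeaveOneOut(), fit_intercept=False).fit(X, y)
    lam = 2 * n * fit.alpha_
    print(f"n={n:4d}  lambda_n={lam:8.3f}  lambda_n/sqrt(n)={lam / np.sqrt(n):6.3f}")
```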

asked by ABK, edited by Richard Hardy
  • You can't say anything about the sequence until you indicate exactly how $\mathbf X$ varies with $n.$ – whuber Feb 19 '20 at 14:36
  • Dear @whuber, I have edited the question. – ABK Feb 19 '20 at 15:08
  • Thank you--that is much clearer. But in light of those particular asymptotics, wouldn't you expect $\lambda_n$ to *decrease* with $n$? – whuber Feb 19 '20 at 16:24
  • Dear @whuber, actually, I am interested in whether the convergence is faster than $O(\sqrt{n})$, which would be enough for $\sqrt{n}$-consistency. – ABK Feb 19 '20 at 16:44
  • I believe you mean $O(1/\sqrt{n}),$ then. $O(\sqrt{n})$ would *diverge.* – whuber Feb 19 '20 at 17:31
  • Dear @whuber, I am confused now. For consistency it is enough that $\frac{\lambda_{n}}{\sqrt{n}} \to c \geq 0$, according to the paper I am referencing. – ABK Feb 19 '20 at 18:04
  • I don't see how that is possible, because diverging $\lambda_n$ would drive $\hat\beta$ to zero. – whuber Feb 19 '20 at 18:38
  • Dear @whuber, I apologise. I forgot to divide $\lambda_{n}$ by $n$ in the criterion function. Thank you! – ABK Feb 19 '20 at 18:46
  • +1. But you could dispense with the $1/n$ factors altogether if you like, because they will not change the estimates. – whuber Feb 19 '20 at 18:55
  • Dear @whuber, yes, of course; it is done in order to keep the criterion function bounded. – ABK Feb 19 '20 at 20:47
  • @ABK, I see the bounty has expired and you got no answer. I deleted my answer because it is a bit more difficult than a simple shrinking. The factor by which the shrinking occurs is a constant shift, which is independent of the scale of $\beta_{OLS}$ (in my answer I assumed a scaling instead of a shift). Leave-one-out cross-validation will arrive at a non-zero $\lambda$ when there is a sign change in a coefficient for some of the folds (which may occur with some probability if the true $\beta$ is zero). I hope I will find time to work this out in an answer. – Sextus Empiricus Feb 13 '21 at 09:33

0 Answers