  1. What is the oracle property of an estimator?
  2. What modelling goals is the oracle property relevant for (predictive, explanatory, ...)?

Both theoretically rigorous and (especially) intuitive explanations are welcome.

Richard Hardy
  • It would be nice to have a solid one-stop-shop answer to the question. Some related material: Zou, ["The Adaptive LASSO and its oracle properties"](http://www.math.yorku.ca/~hkj/Teaching/6621Winter2015/Coverage/adalasso.pdf), p. 1 (p. 1418). – Richard Hardy Aug 10 '16 at 09:11

2 Answers


An oracle knows the truth: it knows the true subset of nonzero coefficients (the true support) and acts on it. An estimator has the oracle property if its asymptotic distribution is the same as that of the MLE fitted on the true support only. That is, the estimator performs as if it knew the true support, without paying a price, at least in terms of the asymptotic distribution.

By the asymptotic optimality properties of the MLE (see, for instance, Theorem 9.14 in Keener's Theoretical Statistics), we know that, under technical conditions that hold when, for instance, the errors are Gaussian, $$\sqrt{n} \left( \hat\beta_S - \beta^*_S \right) \xrightarrow{d} \mathcal{N} (0, I^{-1}(\beta^*_S)),$$ where $\hat\beta_S$ is the MLE computed on the true support $S$ and $\beta^*_S$ is the true coefficient vector on $S$. The variance of the asymptotic distribution is the inverse of the Fisher information, so $\hat\beta_S$ is asymptotically efficient. Since the MLE that knows the true support achieves this, asymptotic efficiency is also required as part of the oracle property.
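To make the limit above concrete, here is a minimal simulation sketch. It assumes a Gaussian linear model, in which the MLE on the true support is ordinary least squares on the columns in $S$. The coefficient values, the i.i.d. standard normal design, and the function name `oracle_ols_draws` are all hypothetical choices for illustration. Under these assumptions the inverse Fisher information is $\sigma^2$ times the identity, so with $\sigma = 1$ the empirical covariance of $\sqrt{n}(\hat\beta_S - \beta^*_S)$ should be close to the $2 \times 2$ identity matrix.

```python
import numpy as np

def oracle_ols_draws(n=400, reps=2000, sigma=1.0, seed=0):
    """Simulate sqrt(n) * (beta_hat_S - beta_S) for OLS fitted on the true support."""
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0])  # hypothetical true coefficients
    support = np.flatnonzero(beta)               # true support S = {0, 1}
    draws = np.empty((reps, support.size))
    for r in range(reps):
        X = rng.standard_normal((n, beta.size))  # i.i.d. N(0, 1) design (assumption)
        y = X @ beta + sigma * rng.standard_normal(n)
        # Oracle fit: regress y on the columns in S only.
        b_S, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        draws[r] = np.sqrt(n) * (b_S - beta[support])
    return draws

draws = oracle_ols_draws()
emp_cov = np.cov(draws, rowvar=False)  # compare with sigma^2 * identity
print(np.round(emp_cov, 2))
```

An estimator with the oracle property would have to reproduce this same limiting covariance on $S$ without being told which columns belong to $S$.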

However, we do pay a steep nonasymptotic price: see, for instance,

Hannes Leeb, Benedikt M. Pötscher, Sparse estimators and the oracle property, or the return of Hodges’ estimator, Journal of Econometrics, Volume 142, Issue 1, 2008, Pages 201-211,

which shows that the maximal risk of any "oracle estimator" (in the sense of Fan and Li, 2001), that is, the supremum of its risk over the parameter space, diverges to infinity.
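The Hodges estimator named in that title gives a small illustration of the phenomenon. The sketch below is a toy normal-mean model, not the regression setting of the paper: the sample mean is set to zero whenever it falls below $n^{-1/4}$ in absolute value. The threshold, the parameter grid, and the function names are assumptions made for this example. The risk is squared error scaled by $n$, so the plain sample mean has scaled risk 1 at every parameter value.

```python
import numpy as np

def hodges_scaled_risk(theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of n * E[(estimate - theta)^2] for Hodges' estimator."""
    rng = np.random.default_rng(seed)
    # The sample mean of n draws from N(theta, 1) is distributed N(theta, 1/n).
    xbar = theta + rng.standard_normal(reps) / np.sqrt(n)
    # Hodges' rule: keep the mean if it is large, otherwise report exactly zero.
    est = np.where(np.abs(xbar) >= n ** -0.25, xbar, 0.0)
    return n * np.mean((est - theta) ** 2)

def worst_case_risk(n, grid_size=81):
    """Largest scaled risk over a grid of theta values around the threshold."""
    thetas = np.linspace(0.0, 2.0 * n ** -0.25, grid_size)
    return max(hodges_scaled_risk(t, n) for t in thetas)

risks = {n: worst_case_risk(n) for n in (100, 1000, 10000)}
for n, r in risks.items():
    print(n, round(r, 1))
```

At $\theta = 0$ the estimator beats the sample mean, which is the "oracle" behaviour, but near the threshold its scaled risk grows with $n$, so the worst case gets worse as the sample size increases.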

user795305
  • So the oracle property for the lasso states the following: the asymptotic distribution of the estimator is the same as the asymptotic distribution of the LASSO logistic regression on only the true support. – Annalise Azzopardi Sep 21 '19 at 15:15

The definition of the oracle property depends heavily on the context. A very short but precise answer for linear regression (specifically, high-dimensional linear regression) is this:

An oracle estimator must be consistent in both parameter estimation and variable selection.

Note that an estimator that is consistent in variable selection is not necessarily consistent in parameter estimation. See the adaptive lasso paper for the mathematical definitions, or simply see these slides.
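As a rough illustration of the two requirements, here is a minimal sketch of an adaptive lasso in the spirit of the paper mentioned above: an $\ell_1$ penalty with weights $1/|\hat\beta_j^{\text{init}}|^{\gamma}$ taken from an initial OLS fit, solved by plain coordinate descent. The function names, the simulated data, and the values `lam=0.05` and `gamma=1.0` are hypothetical choices for this example, not tuned according to the theory. A single simulated data set can only hint at the asymptotic behaviour.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def adaptive_lasso(X, y, lam, gamma=1.0, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| by coordinate descent."""
    n, p = X.shape
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)  # initial OLS estimate
    weights = 1.0 / np.abs(beta_init) ** gamma         # adaptive penalty weights
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                                   # equals y - X @ beta at beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]                 # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid -= X[:, j] * beta[j]                 # add the updated coordinate back
    return beta

rng = np.random.default_rng(1)
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])   # hypothetical truth
X = rng.standard_normal((500, beta_true.size))
y = X @ beta_true + rng.standard_normal(500)

beta_hat = adaptive_lasso(X, y, lam=0.05)
selected = np.flatnonzero(beta_hat)                    # estimated support
print(selected, np.round(beta_hat, 2))
```

Variable selection is judged by comparing `selected` with the true support, and parameter estimation by comparing `beta_hat` with `beta_true` on that support; the oracle property asks for both to work as the sample size grows.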

TPArrow
  • In the adaLASSO paper (linked in my comment) they say the convergence rate has to be optimal, too (in addition to consistent estimation). That is an important and somewhat difficult concept. Could you elaborate on that? – Richard Hardy Aug 10 '16 at 11:30
  • The convergence rate is a context-related assumption. For the lasso it is $\sqrt{n}$, where $n$ is the number of observations. However, consistency is an asymptotic result for the lasso. – TPArrow Aug 10 '16 at 13:09
  • So would you suggest removing the requirement that the rate be optimal from the definition of the oracle property? – Richard Hardy Aug 10 '16 at 13:14
  • In the general definitions, I see no obligation to mention the speed. But in theory, we need to know/determine the optimal speed, obviously. – TPArrow Aug 10 '16 at 13:17
  • Thanks. I am picking on this because we talk about a definition here, so I am trying to be precise. – Richard Hardy Aug 10 '16 at 13:31
  • @RichardHardy, I see no more discussion on this topic, meaning there is a consensus on it. If the reply answers your question, please vote it up/down or close the question. – TPArrow Aug 23 '16 at 08:17
  • Thank you for the suggestion. 13 days is certainly not much for a question. It has only been viewed 44 times. Lack of discussion combined with lack of views and lack of votes on the answer does not imply a consensus. It only shows the current answer is not particularly satisfactory. I am still waiting for alternative answers. I believe a better answer can be delivered; actually, I could probably do it myself if I had more time, since I am already somewhat familiar with the literature. (I am not a keen downvoter, so I will not do that either.) – Richard Hardy Aug 23 '16 at 15:07
  • I have read a bit more about oracle estimators, and it seems that the definition indeed depends quite a lot on the context. Thus +1 for now. – Richard Hardy Mar 30 '17 at 13:33
  • @RichardHardy What is the link between algorithm stability and consistency? See here: https://stats.stackexchange.com/questions/365938/what-causes-lasso-to-be-unstable-for-feature-selection/366419#366419 – Xavier Bourret Sicotte Sep 12 '18 at 12:41