15

The Gauss-Markov theorem tells us that the OLS estimator is the best linear unbiased estimator for the linear regression model.

But suppose I don't care about linearity and unbiasedness. Is there then some other (possibly nonlinear/biased) estimator for the linear regression model which is the most efficient under the Gauss-Markov assumptions, or under some other general set of assumptions?

There is of course one standard result: OLS itself is the best unbiased estimator if, in addition to the Gauss-Markov assumptions, we also assume that the errors are normally distributed. For some other particular distribution of errors I could compute the corresponding maximum-likelihood estimator.

But I was wondering whether there is some estimator which is better than OLS under some relatively general set of circumstances?

onestop
  • 16,816
  • 2
  • 53
  • 83

3 Answers

19

Unbiased estimates are typical in introductory statistics courses because they are: 1) classic, 2) easy to analyze mathematically. The Cramér-Rao lower bound is one of the main tools for 2). Once you move away from unbiased estimates, there is room for improvement. The bias-variance tradeoff is an important concept in statistics for understanding how biased estimates can be better than unbiased ones.

Unfortunately, biased estimators are typically harder to analyze. In regression, much of the research in the past 40 years has been about biased estimation. This began with ridge regression (Hoerl and Kennard, 1970). See Frank and Friedman (1996) and Burr and Fry (2005) for some review and insights.
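
For concreteness, here is a small simulation sketch in Python (an arbitrary correlated design, made-up coefficients, and a ridge penalty fixed by hand rather than chosen by cross-validation) comparing the coefficient mean squared error of OLS and ridge regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20        # sample size and number of predictors (arbitrary choices)
rho = 0.9            # strong correlation makes X'X ill-conditioned
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)-style design covariance
beta = np.ones(p)    # "true" coefficients for the simulation
lam = 10.0           # ridge penalty, fixed by hand purely for illustration
reps = 500

mse_ols = mse_ridge = 0.0
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    y = X @ beta + rng.normal(scale=3.0, size=n)
    XtX, Xty = X.T @ X, X.T @ y
    b_ols = np.linalg.solve(XtX, Xty)                      # unbiased, but high variance
    b_ridge = np.linalg.solve(XtX + lam * np.eye(p), Xty)  # biased toward zero, lower variance
    mse_ols += np.sum((b_ols - beta) ** 2) / reps
    mse_ridge += np.sum((b_ridge - beta) ** 2) / reps

print(f"OLS   coefficient MSE: {mse_ols:.2f}")
print(f"Ridge coefficient MSE: {mse_ridge:.2f}")   # typically much smaller in this setup
```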

The bias-variance tradeoff becomes more important in high dimensions, where the number of variables is large. Charles Stein surprised everyone when he proved that in the Normal means problem the sample mean is no longer admissible if $p \geq 3$ (see Stein, 1956). The James-Stein estimator (James and Stein, 1961) was the first example of an estimator that dominates the sample mean. However, it is also inadmissible.
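
A quick Monte Carlo sketch of Stein's result in the normal means problem (dimension, true mean, and number of replications chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p, reps = 10, 2000               # dimension (>= 3) and Monte Carlo replications, both arbitrary
theta = rng.normal(size=p)       # a fixed "true" mean vector for the simulation

risk_mle = risk_js = 0.0
for _ in range(reps):
    x = theta + rng.normal(size=p)                 # one observation X ~ N(theta, I_p)
    shrink = 1.0 - (p - 2) / np.sum(x ** 2)        # James-Stein shrinkage factor
    risk_mle += np.sum((x - theta) ** 2) / reps
    risk_js += np.sum((shrink * x - theta) ** 2) / reps

print(f"Risk of the raw observations (MLE): {risk_mle:.2f}")  # close to p = 10
print(f"Risk of James-Stein:                {risk_js:.2f}")   # smaller whenever p >= 3
```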

An important part of the bias-variance problem is determining how bias should be traded off against variance. There is no single “best” estimator. Sparsity has been an important part of research in the past decade. See Hesterberg et al. (2008) for a partial review.

Most of the estimators referenced above are non-linear in $Y$. Even ridge regression is non-linear once the data is used to determine the ridge parameter.

vqv
  • 2,404
  • 1
  • 23
  • 16
  • 1
    @chl seconded. Great overview. – mpiktas Mar 01 '11 at 11:34
  • 1
    One of my favourite admissible estimators: a single arbitrarily chosen point of the parameter space which is not an impossible value :) – probabilityislogic May 20 '11 at 14:03
  • @probabilityislogic Those often are admissible, but what you wrote is [not 100% true if we (somewhat) artificially restrict the parameter space](https://stats.stackexchange.com/q/529917/247274) (e.g. we only allow for heads-heads or tails-tails coins). That a possible constant is not always admissible surprised me. – Dave Jun 29 '21 at 21:30
9

I don't know if you are OK with Bayes estimates. If so, then depending on the loss function you can obtain different Bayes estimates. A theorem by Blackwell states that Bayes estimates are (almost) never unbiased. A decision-theoretic argument states that every admissible rule (i.e. for every other rule against which it is compared, there is a value of the parameter for which the risk of the present rule is (strictly) less than that of the rule against which it's being compared) is a (generalized) Bayes rule.
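
As a standard conjugate-normal illustration of both points (the model and prior here are chosen purely for illustration): if $y_1, \dots, y_n \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known and $\theta \sim N(\mu_0, \tau^2)$, then under squared-error loss the Bayes estimate is the posterior mean
$$\hat{\theta}_{\text{Bayes}} = \frac{\tau^2}{\tau^2 + \sigma^2/n}\,\bar{y} + \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\,\mu_0,$$
which shrinks $\bar{y}$ toward the prior mean $\mu_0$ and is therefore biased for any fixed $\theta \neq \mu_0$. Under absolute-error loss the Bayes estimate would instead be the posterior median (here the same, since the posterior is normal).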

James-Stein estimators are another class of estimators (they can be derived via empirical Bayes arguments) which are better than OLS in many cases.

OLS can be inadmissible in many situations, and the James-Stein estimator is an example of an estimator that dominates it (this phenomenon is also known as Stein's paradox).
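
For reference, in the canonical normal-means form $X \sim N_p(\theta, \sigma^2 I_p)$ with $\sigma^2$ known (a regression with orthonormal design reduces to this form), the James-Stein estimator is
$$\hat{\theta}_{\text{JS}} = \left(1 - \frac{(p-2)\,\sigma^2}{\lVert X \rVert^2}\right) X,$$
which has strictly smaller total squared-error risk than $\hat{\theta} = X$ for every value of $\theta$ whenever $p \geq 3$.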

suncoolsu
  • 6,202
  • 30
  • 46
  • Thanks for the pointers. Will need to hit the library to make sense of it all. – Jyotirmoy Bhattacharya Sep 23 '10 at 14:53
  • 1
    @suncoolsu, that's not the typical definition of admissibility. The one you've given is (much) stronger. An admissible estimator is one that is *not* uniformly dominated, i.e., for every other rule against which it is compared, there is a value of the parameter for which the risk of the present rule is (strictly) less than that of the rule against which it's being compared. Conversely, an *inadmissible* estimator is one that is (weakly) dominated by *some* other estimator for *every* value of the parameter and is strictly dominated for *at least* one value by that same estimator. – cardinal Feb 27 '11 at 15:24
  • @cardinal Yup. You are right. I will correct it. – suncoolsu Feb 28 '11 at 00:11
  • @cardinal. Using math is much easier than simplifying it in plain English. But that is just me. Thanks for the correction @cardinal – suncoolsu Feb 28 '11 at 00:17
  • @suncoolsu what do you mean by “never unbiased”? Do you mean with respect to the prior? – vqv Feb 28 '11 at 00:46
  • @vqv There is a theorem by Blackwell which says that "Bayes estimates are almost never unbiased". Please notice I am talking about asymptotic unbiasedness here. – suncoolsu Feb 28 '11 at 03:31
  • 1
    @suncoolsu asymptotic unbiasedness is very different from the usual sense of "unbiased". Any reasonable estimate should be asymptotically unbiased. One more note: shouldn't the statement about admissible estimators be the other way around? ie every admissible estimator is generalized Bayes. – vqv Feb 28 '11 at 04:15
  • @suncoolsu, this is definitely one of those instances where a few symbols say something much more clearly than "plain English". I fear my description is too clumsy, but it's hard to get it precise in few words. Another attempt would be: "An admissible estimator dominates every other estimator at at least one point in the parameter space. The point of domination will typically vary according to the estimator being compared against." – cardinal Feb 28 '11 at 04:15
  • @vqv In re: "Any reasonable estimate should be asymptotically unbiased." I don't necessarily disagree, but this is a strong statement, no? I haven't got an example offhand but it seems plausible that there exist estimators with nice finite sample properties that happen to be inconsistent. But perhaps not. And I think you're correct that every admissible estimator corresponds to a Bayes rule. – JMS Feb 28 '11 at 16:22
  • @vqv - You are correct as well. Sorry for misinterpreting your question. – suncoolsu Feb 28 '11 at 16:23
  • @JMS. I think @vqv is correct, this is another way of thinking about consistency. – suncoolsu Feb 28 '11 at 16:25
  • @suncoolsu I wasn't so clear; I should have stuck with asymptotically biased (consistent under L1). Still, I'm not so sure that there do not exist reasonable estimators that are asymptotically biased but have good finite sample properties. But like I said, I can't think of an example offhand. – JMS Feb 28 '11 at 16:31
  • @suncoolsu, consistency and asymptotic unbiasedness are different things. To get some notion of equivalence requires more structure on the problem, like uniform integrability. There are some estimators that are consistent yet asymptotically biased and, of course, estimators that are asymptotically unbiased but inconsistent. – cardinal Feb 28 '11 at 17:38
  • @JMS Asymptotic unbiasedness does not imply consistency. Asymptotic unbiasedness: $|E \hat{\theta} - \theta| \to 0$; L1 consistency: $E|\hat{\theta} - \theta| \to 0$. L1 consistency is much stronger. Anyway, this is probably far off the topic of the original question. – vqv Feb 28 '11 at 17:49
  • Of course, where is my head. Good catch. – JMS Feb 28 '11 at 20:23
  • @everyone. Sorry. I meant - "a step towards consistency". Because consistency is a sum of squared bias and variance in $L_2$ specifically. – suncoolsu Feb 28 '11 at 20:33
  • @suncoolsu, you must be using a definition of consistency that is different. The usual definition is a statement about *convergence in probability*, not in any $L^p$ space. Of course, if you have the latter, you get the former for free. – cardinal Feb 28 '11 at 21:08
  • @cardinal. I can't resist my temptation to use math anymore. $E[(\hat \theta - \theta)^2] = E[(\hat \theta - E[\hat \theta] + E[\hat \theta] - \theta)^2] = E[(\hat \theta - E[\hat \theta])^2] + (E[\hat \theta] - \theta)^2 = Var[\hat \theta] + Bias^2$. So basically @cardinal and me are talking about the same thing but from a different perspective. If Variance **and** Bias go to 0 in **probability**, consistency holds, otherwise not. Or am I still incorrect? That's what I have learnt in my courses here. – suncoolsu Mar 01 '11 at 02:37
  • @suncoolsu, Let $\hat{\beta}_n$ be an estimator for a parameter, $\beta$, derived from a sample of size $n$. Then $\hat{\beta}_n$ is *consistent* for $\beta$ if $\hat{\beta}_n \to \beta$ ***in probability***. This is a weaker requirement than requiring that $\mathbb{E} |\hat{\beta}_n - \beta|^p \to 0$ for some $p > 0$. – cardinal Mar 01 '11 at 02:44
  • @suncoolsu, I see you edited your comment while I was writing mine. Convergence in *any* $L^p$ space guarantees convergence in probability. This can be seen by a straightforward application of Markov's inequality. But, convergence in probability is weaker than this requirement. Hence, consistency of a parameter is (substantially) weaker than requiring the parameter to converge to its true value in, say, $L_2$, the latter being more or less what it looks like you're stating. – cardinal Mar 01 '11 at 02:48
  • @cardinal. Yeah I was talking about MSE and not the convergence. – suncoolsu Mar 01 '11 at 03:02
  • @suncoolsu, Some people refer to convergence of a parameter in $L_2$ as *mean-square consistency*. But, in general, a consistent estimator need not have asymptotically negligible bias *nor* variance. In fact, it need not even have finite variance. If you're really sadistic, you can undoubtedly cook up an example where the variance *isn't even defined*. – cardinal Mar 01 '11 at 03:13
  • @cardinal. Thanks for the inputs :-) Helps improve my understanding! – suncoolsu Mar 01 '11 at 03:17
  • @suncoolsu, mine, too. That's one of the reasons I enjoy this site. Cheers. – cardinal Mar 01 '11 at 03:23
5

There is a nice review paper by Kay and Eldar on biased estimation for the purpose of finding estimators with minimum mean square error.

Robby McKilliam
  • 2,010
  • 14
  • 14