When we begin to learn statistics, we learn about a seemingly important class of estimators that satisfy the properties of sufficiency and completeness. However, when I read recent articles in statistics, I can hardly find any papers that address complete sufficient statistics. Why do we not care about the completeness and sufficiency of an estimator as much anymore?
- Statistically speaking, how do you quantify the "as much anymore" part in your question? To what degree of certainty do you know that your observation is sufficiently objective? – luis.espinal Feb 27 '20 at 17:27
2 Answers
We still care. However, a large part of statistics is now based on a data-driven approach in which these concepts are not essential, or at least compete with many other important concepts.
With computational power and lots of data, a large body of statistics is devoted to providing models that solve specific problems (such as forecasting or classification) and that can be tested using the given data and cross-validation strategies. So, in these applications, the most important characteristics of models are a good fit to the data and a claimed ability to forecast out of sample.
Furthermore, some years ago we were very interested in unbiased estimators. We still are. Back then, however, only in rare situations would one consider using an estimator that is not unbiased. In situations where we are interested in out-of-sample forecasts, we may accept an estimator that is clearly biased (such as ridge regression, the LASSO, or the elastic net) if it reduces the out-of-sample forecast error. With these estimators we effectively "pay" with bias to reduce the variance of the error and the risk of overfitting.
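To make this concrete, here is a minimal simulation sketch (assuming numpy and scikit-learn are available; the data-generating process, sample sizes, and penalty value are arbitrary choices for illustration, not taken from any particular study):

```python
# Compare out-of-sample error of OLS and ridge regression by cross-validation
# on simulated data with many predictors, only a few of which matter.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 60, 40                        # few observations relative to predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                       # only five truly relevant predictors
y = X @ beta + rng.normal(scale=2.0, size=n)

for name, model in [("OLS", LinearRegression()), ("ridge", Ridge(alpha=10.0))]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name:5s} 5-fold CV mean squared error: {mse:.2f}")
```

In settings like this the biased, penalized fit typically has a smaller cross-validated error than the unbiased OLS fit.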
This new focus of the literature has also brought new concepts, such as sparsistency. In statistical learning theory we study many bounds in order to understand a model's ability to generalize (this is crucial). See, for instance, the beautiful book "Learning From Data" by Abu-Mostafa et al.
Related fields such as econometrics have also been feeling the impact of these changes. Since this field is strongly based on statistical inference, and it is considered fundamental to work with unbiased estimators associated with models that come from theory, the changes are slower. However, several attempts have been made, and machine learning (statistical learning) is becoming essential, for instance to deal with high-dimensional datasets.
Why is that?
Because economists, in several situations, are interested in the coefficients and not in the predicted variable. For instance, imagine a study that tries to explain the level of corruption using a regression model such as: $$\text{corruptionLevel} = \beta_0 + \beta_1 \text{yearsInPrison} + \beta_2 \text{numberConvicted} + \cdots$$
Note that the coefficients $\beta_1$ and $\beta_2$ provide information to guide public policy. Depending on the values of the coefficients, different public policies will be carried out. So, the argument goes, they cannot be biased.
If the idea is that we should trust the coefficients of the econometric regression model, and we are working with high-dimensional data, maybe we should accept paying with some bias in return for lower variance: "The bias-variance tradeoff holds not only for forecasts (which in the case of a linear model are simply linear combinations of the estimated coefficients) but also for individual coefficients. One can estimate individual coefficients more accurately (in terms of expected squared error) by introducing bias so as to cut variance. So in that sense biased estimators can be desirable. Remember: we aim at finding the true value. Unbiasedness does not help if variance is large and our estimates lie far away from the true value on average across repeated samples." – @Richard_Hardy
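A small repeated-sampling sketch illustrates this for a single coefficient (the correlation structure, sample size, and penalty below are arbitrary assumptions): the ridge estimate of $\beta_1$ is biased, yet across repeated samples its expected squared error can be smaller than that of the unbiased OLS estimate.

```python
# Repeated-sampling comparison of OLS and ridge for one individual coefficient:
# ridge is biased, but with correlated regressors it can have lower mean
# squared error than the unbiased OLS estimate.
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 10, 5.0
true_beta = np.ones(p)
Sigma = 0.9 * np.ones((p, p)) + 0.1 * np.eye(p)   # strongly correlated regressors

ols_est, ridge_est = [], []
for _ in range(2000):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    y = X @ true_beta + rng.normal(size=n)
    ols_est.append(np.linalg.solve(X.T @ X, X.T @ y)[0])                      # OLS beta_1
    ridge_est.append(np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)[0])  # ridge beta_1

for name, est in [("OLS", np.array(ols_est)), ("ridge", np.array(ridge_est))]:
    print(f"{name:5s} bias={est.mean() - true_beta[0]:+.3f}  "
          f"variance={est.var():.3f}  MSE={np.mean((est - true_beta[0])**2):.3f}")
```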
This idea has motivated researchers to look for solutions that sound good to economists as well. The recent literature approaches this problem by choosing focus variables that are not penalized; these focus variables are the ones that matter for guiding public policy. In order to avoid omitted-variable bias, one also regresses the focus variables on all the other independent variables using a shrinkage procedure (such as the Lasso), and the variables with nonzero coefficients in that auxiliary regression are included in the final model as well. The authors show that the asymptotics of this procedure behave well. See here a paper by one of the leaders of the field, and see, for instance, this overview by leaders of the field.
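A rough sketch of how such a procedure can look (hypothetical data and variable names; this conveys only the general idea, not the authors' implementation): the focus variable is never penalized, the Lasso is used only to pick controls in two auxiliary regressions, and the final step is ordinary least squares.

```python
# Sketch of a double-selection-style procedure: select controls with the Lasso
# both for the outcome and for the focus variable, then run OLS of the outcome
# on the (unpenalized) focus variable plus the union of selected controls.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(2)
n, p = 200, 50
controls = rng.normal(size=(n, p))
focus = controls[:, 0] + rng.normal(size=n)              # e.g. yearsInPrison
y = 0.5 * focus + controls[:, 0] + controls[:, 1] + rng.normal(size=n)

sel_y = np.flatnonzero(LassoCV(cv=5).fit(controls, y).coef_)      # step 1
sel_d = np.flatnonzero(LassoCV(cv=5).fit(controls, focus).coef_)  # step 2
keep = np.union1d(sel_y, sel_d)

Z = np.column_stack([focus, controls[:, keep]])                   # step 3: plain OLS
print("estimated focus coefficient:", LinearRegression().fit(Z, y).coef_[0])
```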

- I think this captures the broad situation quite well. However, many fields including econometrics make much use of maximum likelihood and tolerate small biases. (This is a bit more gung-ho for machine learning than many readers would be, but let opinions be opinions.) – Nick Cox Feb 25 '20 at 09:59
- Yes, you are right. The "classical" focus of econometrics is on statistical inference. Why is that? Because economists want to test their theories rather than build theories from data or make out-of-sample forecasts. This is a problem because, when they write the paper, they say that economic theory was used to choose the variables included in the model. In practice, however, they run a bizarre procedure of removing "irrelevant" variables based, for instance, on t-tests, which is not valid statistical inference. – DanielTheRocketMan Feb 25 '20 at 12:34
- Furthermore, a difficulty that remains in the field is that (many) economists are not versed in the computational models needed to deal with machine learning/statistical learning theory. So the changes are slower, and people still use very small databases… Anyway, some people, led by researchers from MIT and Stanford University, are trying to improve the field. Examples are Hal Varian, who is the chief economist at Google, Susan Athey (a long-term consultant for Microsoft), and her husband Guido Imbens. – DanielTheRocketMan Feb 25 '20 at 12:34
- @DanielTheRocketMan, you say *it is fundamental to work with unbiased estimators* in econometrics. Could you offer some explanation on how / why / what is the technical reason? – Richard Hardy Feb 25 '20 at 14:21
- @RichardHardy Sure. The point is that, in several situations, economists are interested in the coefficients and not in the forecasts. For instance, imagine a work that tries to explain corruptionLevel using a regression model: corruptionLevel = beta_0 + beta_1 yearsInPrison + beta_2 numberConvicted + .... Note that the coefficients provide information to guide public policy. They cannot be biased. For instance, in this regression, is it more important to focus on increasing the conviction rate or on increasing the years in prison for people who are convicted? – DanielTheRocketMan Feb 25 '20 at 14:54
- This argument has not convinced me so far. The bias-variance tradeoff holds not only for forecasts (which in the case of a linear model are simply linear combinations of the estimated coefficients) but also for individual coefficients. One can estimate individual coefficients more accurately (in terms of expected squared error) by introducing bias so as to cut variance. So in that sense biased estimators can be desirable. Remember: we aim at finding the true value. Unbiasedness does not help if variance is large and our estimates lie far away from the true value on average across repeated samples. – Richard Hardy Feb 25 '20 at 15:01
- I agree with @RichardHardy here. Much of statistics boils down to a trade-off between bias and variance and a small bias is often both inevitable and acceptable. – Nick Cox Feb 25 '20 at 15:07
- You, @RichardHardy, have a great argument, and this is exactly the argument used to improve estimation in economics when people use high-dimensional data. But things are done somewhat differently. For instance, the shrinkage procedure is not applied to all variables in the regression model. The relevant ones (the ones used to make public policy) are not penalized, and the other ones follow the usual Lasso procedure for selecting variables. This is very recent literature! – DanielTheRocketMan Feb 25 '20 at 15:08
- I am aware of this approach. Assuming some of the shrunken variables are actually relevant and correlated with the focus variables, the focus variables' coefficients will be biased even if they are not shrunken, so there you go. Also, Bayesian econometrics is being successfully used in heavily parameterized macroeconomic models with small samples; frequentist estimators of these models are less biased but have such high variance that they are useless in practice (and therefore not used). But if you agree with my argument, then you may wish to reconsider the bit where you write *fundamental* in your answer. – Richard Hardy Feb 25 '20 at 15:49
- You are right. The omitted-variable bias exists to some extent. I do not recall the details (since I do not work with this literature), but I remember that they have a procedure to try to avoid this, and they show mathematically that asymptotically things work well. But I agree: in the end, they are accepting some bias. – DanielTheRocketMan Feb 25 '20 at 16:05
- Your paragraph 2 sounds somewhat like climate modelling - except maybe replace "ability to predict" with "claimed ability to predict"? :-) :-( – Russell McMahon Feb 27 '20 at 07:09
We do care but usually either the issue is taken care of, or we're not making a specific distributional assumption with which we could apply those considerations.
Many of the usual estimators for commonly used parametric models are either fully efficient under the usual distributional assumptions for that model or asymptotically efficient under those model assumptions. Unless we're dealing with fairly small sample sizes, there's nothing to do.
Consider generalized linear models as an obvious example.
We often don't have a fully explicit parametric distributional model. We might use a robust procedure, or we might be looking at some convenient estimator along with a bootstrap for dealing with bias and estimating standard error.
Without an explicit distribution to even start looking at sufficiency or completeness for, there's nothing to do.
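As a concrete illustration of the bootstrap route mentioned above (the statistic, a trimmed mean, and the skewed sample below are arbitrary choices), no explicit parametric model is needed to get bias and standard-error estimates:

```python
# Nonparametric bootstrap for a convenient estimator (a 10% trimmed mean):
# estimate its bias and standard error without a parametric distributional model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=80)   # skewed data, no model assumed

def trimmed_mean(sample):
    return stats.trim_mean(sample, proportiontocut=0.10)

theta_hat = trimmed_mean(x)
boot = np.array([trimmed_mean(rng.choice(x, size=x.size, replace=True))
                 for _ in range(5000)])

print(f"estimate:       {theta_hat:.3f}")
print(f"bootstrap bias: {boot.mean() - theta_hat:+.3f}")
print(f"bootstrap SE:   {boot.std(ddof=1):.3f}")
```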
(Consider that there may be little point in finding an efficient estimator for a model you're sure will be wrong... what might make more sense would be finding one that does reasonably well in some kind of neighborhood of an approximate model. A good part of the theory for robustness takes a particular sense of the word "neighborhood" when considering a question like this.)
In the comments below, Nick Cox points out that "imperfections -- deviations from the ideal -- are often perfectly tolerable"; this is certainly the case. Box wrote, "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." To me this is a pretty central issue, but I'd add "and in what particular ways" after "how wrong".
It's important to understand the behavior of the tools we use away from the situation they're best at; when do they perform quite well, when do they perform badly (and hopefully what else might do at least as well in a similar range of circumstances).
We need to keep in mind that statistical tools like tests, estimates and intervals all have several senses in which we expect them to 'perform' (e.g. significance level and power, bias and variance, interval width and coverage); for example, there's often a tendency to focus very hard on significance level on tests without paying attention to power.
These issues are less clean-cut than looking at completeness or sufficiency, and we don't have a nice array of "neat" theorems to use. In many cases we may need to use coarser but simpler tools - like simulation - to get much of a sense of what may happen. [In some situations it helps to understand something of the tools of robustness to have clues about what things it might make sense to simulate. It's good to have a sense of what it takes to make something go completely off the rails. I've seen people report that a test has "good robustness to skewness" while simulating nothing more extreme than an exponential distribution, for example, and only examining type I error rate.]
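For instance, a small simulation sketch along these lines (the sample size, replication count, and distributions below are arbitrary choices) checks the type I error rate of the one-sample t-test under increasingly skewed data, rather than stopping at a mild exponential:

```python
# Simulate the type I error rate of the one-sample t-test (nominal 5%) when the
# data come from increasingly skewed distributions; the null mean equals the true
# mean of each distribution, so every rejection is a type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps, alpha = 30, 10000, 0.05

def rejection_rate(draw, true_mean):
    rejections = sum(stats.ttest_1samp(draw(n), popmean=true_mean).pvalue < alpha
                     for _ in range(reps))
    return rejections / reps

scenarios = {
    "normal":             (lambda n: rng.normal(size=n),             0.0),
    "exponential":        (lambda n: rng.exponential(size=n),        1.0),
    "lognormal, sigma=2": (lambda n: rng.lognormal(sigma=2, size=n), np.exp(2.0)),
}
for name, (draw, mu) in scenarios.items():
    print(f"{name:18s} type I error rate ~ {rejection_rate(draw, mu):.3f}")
```

Repeating this with a shifted null mean would give the corresponding power, which speaks to the earlier point about not judging a test on significance level alone.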

- (+1) It's a one-person campaign, just about, but I think that much of the literature, and much discussion, would be easier and more effective if we could talk and write about _ideal conditions_ for procedures rather than _assumptions_. In logic and pure mathematics, truth of conclusions depends on truth of assumptions, but in marriage and statistics, imperfections -- deviations from the ideal -- are often perfectly tolerable. – Nick Cox Feb 25 '20 at 14:10
- (Sure, in logic and pure mathematics, you can get a correct conclusion by accident.) – Nick Cox Feb 25 '20 at 14:15
- @NickCox: I think focusing on tolerance of imperfections in statistical models and marriage might actually be a two-person campaign --- you, and my wife. – Ben Feb 25 '20 at 23:23
- A very good point that was made for the first time, as far as I'm aware, by John W. Tukey in his 1962 paper "The Future of Data Analysis", https://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711 - particularly Sec. I.5, Dangers of optimization. – Christian Hennig Feb 26 '20 at 00:04