While estimation per se is aimed at coming up with values of the unknown parameters (e.g., the coefficients in logistic regression, or in the separating hyperplane in support vector machines), statistical inference attempts to attach a measure of uncertainty and/or a probability statement to the values of the parameters (standard errors and confidence intervals). If the model that the statistician assumes is approximately correct, then, provided that the new incoming data continue to conform to that model, the uncertainty statements may have some truth in them and provide a measure of how often you will be making mistakes when using the model to make your decisions.
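To make the distinction concrete, here is a minimal sketch (Python, assuming `statsmodels` is available; the data are made up for illustration): the fitted coefficients are the estimation part, while the standard errors and confidence intervals attached to them are the inference part.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: two predictors and a binary outcome from a known logistic model
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([0.5, -1.0, 2.0])               # intercept, x1, x2
p = 1 / (1 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = rng.binomial(1, p)

# Estimation: point values for the unknown coefficients
res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(res.params)        # estimated coefficients

# Inference: uncertainty attached to those values
print(res.bse)           # standard errors
print(res.conf_int())    # 95% confidence intervals
```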
The sources of the probability statements are twofold. Sometimes, one can assume an underlying probability distribution of whatever you are measuring, and with some mathematical witchcraft (multivariate integration of a Gaussian distribution, etc.) obtain the probability distribution of the result (the sample mean of Gaussian data is itself Gaussian). Conjugate priors in Bayesian statistics fall into that witchcraft category. Other times, one has to rely on asymptotic (large sample) results, which state that in a large enough sample things are bound to behave in a certain way (the Central Limit Theorem: the sample mean of data that are i.i.d. with mean $\mu$ and variance $\sigma^2$ is approximately Gaussian with mean $\mu$ and variance $\sigma^2/n$, regardless of the shape of the distribution of the original data).
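The asymptotic route can be checked empirically. A small simulation sketch (plain NumPy; the exponential distribution and the sample sizes are arbitrary choices of mine) shows the sample means of decidedly non-Gaussian data lining up with the $N(\mu, \sigma^2/n)$ approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0            # mean and sd of an Exponential(1) distribution
n, reps = 200, 100_000          # sample size and number of replications

# Draw many samples from a skewed distribution and record their means
means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

# CLT: the means should be roughly Gaussian with mean mu and sd sigma/sqrt(n)
print(means.mean(), mu)                        # close to 1.0
print(means.std(ddof=1), sigma / np.sqrt(n))   # close to 1/sqrt(200) ~ 0.0707
```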
The closest that machine learning gets to that is cross-validation, when the sample is split into a training part and a validation part, with the latter effectively saying, "if the new data look like the old data, but are entirely unrelated to the data that were used in setting up my model, then a realistic measure of the error rate is such and such". It is derived fully empirically by running the same model on the data, rather than by trying to infer the properties of the model through statistical assumptions and mathematical results like the CLT above. Arguably, this is more honest, but it uses less information and hence requires larger sample sizes. It also implicitly assumes that the process does not change, and that there is no structure in the data (like cluster or time-series correlations) that could creep in and break the very important assumption of independence between the training and the validation data.
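A bare-bones holdout sketch of that logic (scikit-learn, synthetic data, arbitrary split fraction): the validation error rate is a purely empirical estimate, trustworthy only insofar as the held-out rows really are independent of the training rows and come from the same process.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for "the old data"
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Split into training and validation parts; independence between them is assumed, not checked
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Empirical error rate on the validation part: no distributional assumptions, just held-out data
error_rate = np.mean(model.predict(X_val) != y_val)
print(error_rate)
```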
While the phrase "inferring the posterior" may make sense (I am not a Bayesian, so I can't really tell what the accepted terminology is), I don't think there is much in the way of assumptions involved in that inferential step. All of the Bayesian assumptions are (1) in the prior and (2) in the assumed model, and once they are set up, the posterior follows automatically (at least in theory, via Bayes' theorem; the practical steps may be a helluva lot more complicated, and Sipps Gambling... excuse me, Gibbs sampling may be a relatively easy component of getting to that posterior). If "inferring the posterior" refers to (1) + (2), then it is a flavor of statistical inference to me. If (1) and (2) are stated separately, and "inferring the posterior" is something else, then I don't quite see what that something else might be on top of Bayes' theorem.
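For a toy illustration of "the posterior follows automatically": with a conjugate Beta prior (assumption (1)) and a Binomial likelihood (assumption (2)), Bayes' theorem gives the posterior in closed form, and no further modelling choices enter at that step. (A minimal sketch with made-up numbers; with a non-conjugate model that same step would instead need MCMC machinery such as Gibbs sampling.)

```python
from scipy.stats import beta

# (1) Prior: Beta(a, b) on the unknown success probability theta
a, b = 2, 2

# (2) Model: k successes observed in n Bernoulli trials (made-up data)
n, k = 50, 34

# Bayes' theorem with a conjugate prior yields the posterior in closed form: Beta(a + k, b + n - k)
posterior = beta(a + k, b + n - k)
print(posterior.mean())            # posterior mean of theta
print(posterior.interval(0.95))    # central 95% credible interval
```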