22

Machine learning research papers often treat learning and inference as two separate tasks, but it is not quite clear to me what the distinction is. In this book, for example, they use Bayesian statistics for both kinds of tasks but do not motivate the distinction. I have several vague ideas about what it could be, but I would like to see a solid definition and perhaps also rebuttals or extensions of my ideas:

  • The difference between inferring the values of latent variables for a certain data point, and learning a suitable model for the data.
  • The difference between extracting variances (inference) and learning the invariances so as to be able to extract variances (by learning the dynamics of the input space/process/world).
  • The neuroscientific analogy might be short-term potentiation/depression (memory traces) vs long-term potentiation/depression.
Lenar Hoyt
  • Not sure whether this helps, but in statistics one distinction is whether you want to think about learning as inference (mostly Bayesian) or as estimation (mostly frequentist). For the former, learning about everything - latent variables, parameters, predictions, models - is an inference (which returns a distribution). For the latter, some learning problems may be inference problems and others estimation problems (an estimation returns an estimate and a sampling-theoretically motivated uncertainty range for it). – conjugateprior Apr 03 '16 at 17:16
  • "Learning" is just an evocative metaphor for the process of training a machine learning algorithm. I don't think there's much insight to be gained here. – Sycorax Apr 03 '16 at 17:40
  • I found interesting answers on Quora, but perhaps someone can elaborate a bit more: https://www.quora.com/What-is-the-difference-between-inference-and-learning – Lenar Hoyt Apr 03 '16 at 18:42
  • Possible duplicate of [The Two Cultures: statistics vs. machine learning?](http://stats.stackexchange.com/questions/6/the-two-cultures-statistics-vs-machine-learning) – Winks Apr 03 '16 at 19:05
  • @Winks Did you read the linked question *at all*? None of the answers makes the distinction I am asking for explicit. – Lenar Hoyt Apr 03 '16 at 19:20
  • @conjugateprior In machine learning, no one would say that "learning about everything - latent variables, parameters, predictions, models - is an inference". Learning and inference are considered totally separate even though they both can produce distributions. – Neil G Apr 03 '16 at 19:42
  • Learning is the process of gaining the ability to do correct inference... – user541686 Apr 03 '16 at 20:23
  • @NeilG ok, but I was explicitly sketching a statistics distinction, not anything involving ML. That said, 'probabilistic inference for as much as possible, except when we have to approximate' seems to be the guiding framework for, e.g., Murphy 2012 and Bishop 2007, which are somewhat influential ML texts. – conjugateprior Apr 03 '16 at 22:03
  • @conjugateprior: You're right. It looks like Bishop's textbook uses, at least in some places, the term "inference" to mean learning and "decision" to mean inference. The energy-based model framework (LeCun, Bengio, et al.) is gaining a lot of traction in the "deep learning" community, and those are the terms I prefer. Whichever terms you use, I do think you should keep a distinction between these two fundamentally different concepts. – Neil G Apr 03 '16 at 22:20
  • @NeilG No disagreement there: they are different concepts. On a previous comment: it was the 'no one would say' line above that caught my eye, since I've heard both Murphy and Bishop say it :-) – conjugateprior Apr 03 '16 at 22:47
  • @conjugateprior Yeah, thanks for the correction. I'll update my answer. – Neil G Apr 04 '16 at 04:04
  • @Sycorax, you should consider expanding your comment into an answer. – knrumsey Mar 18 '19 at 20:35
  • With respect to machine learning, inference is "how do we obtain answers to relevant questions about the world". Learning is finding the model parameters that best fit the data at hand. [Source from Stefano Ermon's CS-228 course](https://ermongroup.github.io/cs228-notes/preliminaries/introduction/#inference ) [Certain types of inferences, CS-228](https://ermongroup.github.io/cs228-notes/#inference) – barvin04 Feb 16 '21 at 14:22

4 Answers

16

I agree with Neil G's answer, but perhaps this alternative phrasing also helps:

Consider the setting of a simple Gaussian mixture model. Here we can think of the model parameters as the set of Gaussian components of the mixture (each component's mean and variance, and each one's weight in the mixture).

Given a set of model parameters, inference is the problem of identifying which component was likely to have generated a single given example, usually in the form of a "responsibility" for each component. Here, the latent variable is just the identifier of the component that generated the given vector, and we are inferring which component that was likely to have been. (In this case inference is simple; in more complex models it becomes quite complicated.)

Learning is the process of, given a set of samples from the model, identifying the model parameters (or a distribution over model parameters) that best fit the given data: choosing the Gaussians' means, variances, and weights.

The Expectation-Maximization learning algorithm can be thought of as performing inference for the training set, then learning the best parameters given that inference, then repeating. Inference is often used in the learning process in this way, but it is also of independent interest, e.g. to choose which component generated a given data point in a Gaussian mixture model, to decide on the most likely hidden state in a hidden Markov model, or to impute missing values in a more general graphical model. A sketch of both steps follows.
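To make the split concrete, here is a minimal sketch in Python/NumPy for a one-dimensional Gaussian mixture (the function names `responsibilities` and `em_step` are mine, chosen for illustration, not from any particular library):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, means, stds, weights):
    """Inference: with the parameters held fixed, compute the posterior
    probability that each component generated the single point x."""
    dens = weights * norm.pdf(x, means, stds)
    return dens / dens.sum()

def em_step(data, means, stds, weights):
    """Learning: one EM iteration. The E-step runs inference on every
    training point; the M-step re-estimates the parameters from those
    inferred responsibilities."""
    r = np.stack([responsibilities(x, means, stds, weights) for x in data])
    n_k = r.sum(axis=0)                                  # soft count per component
    means = (r * data[:, None]).sum(axis=0) / n_k
    stds = np.sqrt((r * (data[:, None] - means) ** 2).sum(axis=0) / n_k)
    weights = n_k / len(data)
    return means, stds, weights
```

Iterating `em_step` to convergence is learning; calling `responsibilities` on a new point afterwards, with the learned parameters frozen, is pure inference.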

Danica
  • And a small caveat that one can choose to break things down into learning and inference this way, but one can *also* choose to do the whole lot as inference: https://stats.stackexchange.com/questions/180582/when-is-a-naive-bayes-model-not-bayesian/180643#180643 – conjugateprior Apr 03 '16 at 22:15
11

Inference is choosing a configuration based on a single input. Learning is choosing parameters based on some training examples.

In the energy-based model framework (a way of looking at nearly all machine learning architectures), inference chooses a configuration to minimize an energy function while holding the parameters fixed; learning chooses the parameters to minimize the loss function.
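A toy sketch of that division (my own illustration under the energy-based view, not code from LeCun et al.): take a linear energy E(x, y; w) = -y * (w · x) over binary labels y ∈ {-1, +1}. Inference minimizes the energy over y with w fixed; learning adjusts w so that the training labels become the energy minimizers:

```python
import numpy as np

def energy(w, x, y):
    """Energy of configuration y for input x under parameters w."""
    return -y * np.dot(w, x)

def infer(w, x):
    """Inference: choose the configuration minimizing the energy,
    holding the parameters w fixed."""
    return min((-1, 1), key=lambda y: energy(w, x, y))

def learn(w, data, lr=0.1):
    """Learning: choose parameters that reduce a loss, here a
    perceptron-style loss that penalizes wrong energy minimizers."""
    for x, y in data:
        if infer(w, x) != y:
            w = w + lr * y * x  # step down the energy of the true label
    return w
```

In a deep network the same separation holds: inference is the minimization (or forward pass) that produces a configuration for one input, and learning is the optimization over the weights.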

As conjugateprior points out, other people use different terminology for the same things. For example, Bishop uses "inference" and "decision" to mean learning and inference, respectively. In causal inference, "inference" means learning. But whichever terms you decide on, these two concepts are distinct.

The neurological analogy is that a pattern of firing neurons is a configuration, while the set of link strengths constitutes the parameters.

Neil G
  • @mcb I still don't know what you mean by "variances". "Invariances" isn't even a word in the dictionary. Yes, there are many learning algorithms that rely on an inferred configuration like EM described in Dougal's answer. – Neil G Apr 03 '16 at 19:36
  • @mcb I don't understand your questions either; perhaps it would help to specify an example model and be specific about what distribution / variances / invariants(?) you're talking about. – Danica Apr 03 '16 at 20:03
  • Thanks for your answers. Perhaps I have misunderstood something. – Lenar Hoyt Apr 03 '16 at 21:20
  • @NeilG I believe this terminology is mostly used in ML vision work where classification decisions should be 'invariant' to object translation, rotation, rescaling etc. Can't find a good short reference, but there's this: https://en.wikipedia.org/wiki/Prior_knowledge_for_pattern_recognition – conjugateprior Apr 03 '16 at 22:20
  • @conjugateprior I had a feeling that's what he was getting at, but I wanted to see if he would make his question clear. – Neil G Apr 03 '16 at 22:22
  • @nbro a configuration is whatever is inferred. – Neil G Jan 19 '19 at 16:32
  • @nbro of course you can. A latent variable is inferred based on one training example. This is not strange. This is the standard terminology in machine learning. – Neil G Jan 19 '19 at 16:40
  • @nbro Yeah, in a naive Bayes classifier, the class is inferred for every observation. Also, you can read about this terminology in LeCun 2006. – Neil G Jan 19 '19 at 17:07
4

This looks like classic cross-discipline lingo confusion. The OP seems to be using neuroscience-like terminology, where the two terms in question may have different connotations. But since Cross Validated generally deals with statistics and machine learning, I'll try to answer the question based on the common usage of these terms in those fields.

In classical statistics, inference is simply the act of taking what you know about a sample and making a mathematical statement about the population of which it is (hopefully) representative. From the canonical textbook of Casella & Berger (2002): "The subject of probability theory is the foundation upon which all of statistics is built ... through these models, statisticians are able to draw inferences about populations, inferences based on examination of only a part of the whole". So in statistics, inference specifically concerns p-values, test statistics, sampling distributions, and the like.
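As a minimal illustration of inference in that classical sense (my own example, with simulated data rather than anything from Casella & Berger):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=1.0, size=50)  # one sample from some population

# Use the sample to make a statement about the population mean:
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The p-value is a statement about the population (is its mean zero?), derived from the sampling distribution of the test statistic; no "learning" in the machine learning sense has taken place.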

As for learning, I think this table from Wasserman's All of Statistics (2003) might be helpful:

[Image: terminology table from Wasserman's All of Statistics (2003) mapping statistics terms to computer science terms, including rows such as estimation ↔ learning, classification ↔ supervised learning, data ↔ training sample, and directed acyclic graph ↔ Bayes net.]

Zoë Clark
  • This disagrees with a lot of other textbooks including Bishop's book mentioned in the comments. Classification is a kind of supervised learning when the target variables are categories. The word "estimation" alone is vague: usually we mean "density estimation" or "parameter estimation" or "sequential estimation" or "maximum likelihood estimation". – Neil G Apr 03 '16 at 22:11
  • Also, a Bayes net is not just a directed acyclic graph! It is a kind of DAG whose nodes represent propositions and whose edges represent probabilistic dependencies; it specifies conditional independence relationships. – Neil G Apr 03 '16 at 22:15
  • @NeilG Quite so. The closest statistics translation would probably be "structural equation model" – conjugateprior Apr 03 '16 at 22:50
  • And in a dismaying amount of statistics there should be two lines about data: CS: training data, Statistics: data. CS: test data, Statistics: wut? – conjugateprior Apr 03 '16 at 22:52
  • Stat 101: wut = another (hopefully random) sample from your population ... – Zoë Clark Apr 04 '16 at 01:36
  • I was only reflecting on my (informal and unsystematic) survey of Stat 101 syllabuses wherein prediction topics, e.g. predictive intervals, forecasting, overfitting, and even in-sample measures of its expected magnitude like AIC, are completely lacking. – conjugateprior Apr 04 '16 at 13:13
-1

It is strange that no one else has mentioned this, but you can have inference only in cases where you have a probability distribution. To quote Wikipedia, which quotes the Oxford Dictionary of Statistics:

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution (Oxford Dictionary of Statistics)

https://en.wikipedia.org/wiki/Statistical_inference

In the case of traditional neural networks, k-NN, or vanilla SVMs, you have no probability density to estimate, nor any assumptions about a density; thus, there is no statistical inference, only training/learning. However, for most (all?) statistical procedures, you can do both inference AND learning, since these procedures make assumptions about the distribution of the population in question.

SWIM S.
  • This is wrong. Anyway, you can interpret neural networks as producing a distribution if you want. See, e.g. Amari 1998. – Neil G Jan 19 '19 at 16:44
  • It is not wrong; if you think it is, be specific. You CAN interpret them that way, but originally there is no such interpretation. – SWIM S. Jan 19 '19 at 16:52
  • It is wrong because people use the term inference with models like autoencoders. – Neil G Jan 19 '19 at 16:58
  • So, is it wrong because some group of people uses the term incorrectly? Or because they have some probabilistic interpretation for their NNs (I'm not deeply familiar with autoencoders)? I logically justified why one term is different from the other. So, given the definition above, I see those who use the term inference with NNs, k-NNs, or SVMs (unless with a probabilistic interpretation) as pretty much abusing the terminology. – SWIM S. Jan 19 '19 at 17:05