I'm taking a grad course on machine learning in the ECE department of my university. On the first lecture my professor seemed to make it a point to stress the fact that the course would be taking a probabilistic approach to machine learning. I didn't think much of it at the time, but now that I think back on it, what does this really mean? What other approaches are there to machine learning that I can contrast this against?
-
Expert systems and rule-based systems used to be an alternative. Not anymore – Aksakal Feb 07 '17 at 02:31
-
Or maybe an optimization perspective, which emphasizes probability and assumptions less? – Haitao Du Feb 07 '17 at 02:33
-
I've come to understand "probabilistic approach" to be more mathematical statistics intensive than code, say "here's the math behind these black box algorithms". – Jon Feb 07 '17 at 02:56
-
Well, programming language shouldn't matter; but I'm assuming you're working through some math problems. Usually "probabilistic" is attached to the course title for non-Statistics courses to get the point across. What you're covering in that course is material that is spread across many courses in a Statistics program. – Jon Feb 07 '17 at 03:00
-
What does ECE stand for? – Richard Hardy Feb 07 '17 at 06:32
-
Electrical and computer engineering – Austin Feb 07 '17 at 06:33
-
See also http://stats.stackexchange.com/questions/243746/what-is-probabilistic-inference/243759#243759 – Tim Feb 07 '17 at 08:28
2 Answers
The question may be too broad to answer. It is hard to guess another person's perspective. But I think the question is interesting, and I would like to try to answer.
The term "machine learning" can have many definitions. I believe the popular ones are:
- Convex optimization (there are tons of papers on NIPS for this topic)
- "Statistics minus any checking of models and assumptions" by Brian D. Ripley
From the optimization perspective, the ultimate goal is to minimize the "empirical loss" on training data and generalize well to the test set, without placing much emphasis on a "statistical model" of the data. Big black-box discriminative models are perfect examples, such as gradient boosting, random forests, and neural networks. These types of work became popular because the way we collect and process data has changed: in some settings we can act as if we have effectively infinite data and will never over-fit (for example, the number of images on the Internet), so any computational model we can afford will under-fit such complicated data. The goal then becomes finding effective ways to build bigger models faster (for example, using GPUs for deep learning).
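To make the optimization view concrete, here is a minimal sketch (not from any specific course): a linear predictor fit purely by driving down the average squared loss on training data with gradient descent. The synthetic data, step size, and iteration count are all made up for illustration; no probabilistic model of the data is invoked anywhere.

```python
import numpy as np

# Empirical risk minimization: pick parameters w that minimize the
# mean squared loss on the training set, via plain gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the empirical loss
    w -= lr * grad

train_loss = np.mean((X @ w - y) ** 2)
```

Nothing here asserts how the data were generated; the loss alone drives the fit, which is the spirit of the optimization perspective.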
On the other hand, from the statistical (probabilistic) point of view, we emphasize generative models more, for example mixtures of Gaussians, Bayesian networks, etc. Murphy's book *Machine Learning: A Probabilistic Perspective* may give you a better idea of this branch.
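For contrast with the optimization view, here is a minimal sketch of the generative approach (my own illustration, not from Murphy's book): posit that the data were *generated* by a two-component Gaussian mixture, then estimate its parameters with expectation-maximization. The synthetic data and initial guesses are made up.

```python
import numpy as np

# Generative modeling: assume x ~ mixture of two Gaussians, fit with EM.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

pi = np.array([0.5, 0.5])     # mixture weights
mu = np.array([-1.0, 1.0])    # component means (initial guess)
var = np.array([1.0, 1.0])    # component variances

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    dens = pi * normal_pdf(x[:, None], mu, var)      # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

mu_sorted = np.sort(mu)
```

Here the probability model is the object of interest: once fitted, it can assign likelihoods to new points and generate synthetic data, not just predict labels.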

-
The first portion of your answer seems to imply that statisticians do not care about optimization, or minimizing loss. Is that the point you are making? Are RF and NN not statistical models as well that rely on probabilistic assumptions? – Jon Feb 07 '17 at 17:18
-
NNs and RF have been used for more than as black box machine learning tools. They've been developed using statistical theory for topics such as survival analysis. It can't be expected for me to provide you with a thorough answer on here but maybe this reference will help. https://people.orie.cornell.edu/davidr/or474/nn_sas.pdf – Jon Feb 07 '17 at 19:01
-
That said, I feel this answer is inaccurate. I actually stand by my comment that "probabilistic" is added to the title for non-statisticians. For example, you'll see plenty of CS and ECE machine learning courses with "probabilistic approach" in the title; however, it will probably be rare (if it happens at all) to see an ML course in a Statistics department with "probabilistic approach" attached to the title. – Jon Feb 07 '17 at 20:00
-
1. That's implementation, not theory. The algorithm comes before the implementation. This is not a chicken vs egg debate. 2. SVMs are statistical models as well. I'll let you Google that on your own. – Jon Feb 07 '17 at 20:20
The term "probabilistic approach" means that the inference and reasoning taught in your class will be rooted in the mature field of probability theory. That term is often (but not always) synonymous with "Bayesian" approaches, so if you have had any exposure to Bayesian inference you should have no problems picking up on the probabilistic approach.
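As a tiny, self-contained illustration of what "rooted in probability theory" means in practice (my own example, not from the course), here is Bayesian inference on a coin's bias using Beta-binomial conjugacy: a prior distribution is updated by data into a posterior distribution, and all conclusions are read off the posterior.

```python
# Bayesian inference in miniature: Beta prior on a coin's heads
# probability, updated by observed flips. By conjugacy the posterior
# is also a Beta distribution, so the update is just arithmetic.
alpha, beta = 1.0, 1.0        # Beta(1, 1) = uniform prior on the bias
heads, tails = 7, 3           # observed data (made up for illustration)

alpha_post = alpha + heads    # posterior: Beta(alpha + heads, beta + tails)
beta_post = beta + tails
posterior_mean = alpha_post / (alpha_post + beta_post)
```

The point of contrast: uncertainty about the bias is itself represented as a probability distribution and propagated through inference, rather than reporting a single optimized number.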
I don't have enough experience to say what other approaches to machine learning exist, but I can point you towards a couple of great refs for the probabilistic paradigm, one of which is a classic and the other will soon be, I think:
- Jaynes, E.T. (2003) Probability Theory: The Logic of Science. Cambridge University Press, New York.
- Murphy, K. (2012) Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge.

-
That's a weird coincidence, I just purchased and started reading both of those books. I guess I am sort of on the right track. – Austin Feb 07 '17 at 04:52
-
Also see [this relevant answer](http://stats.stackexchange.com/questions/243746/what-is-probabilistic-inference/243759#243759), which points out that just using straight up optimization to solve a machine learning problem is _not_ a probabilistic approach, _sensu stricto_ – allen Feb 08 '17 at 21:23