
Possible Duplicate:
Bayesian and frequentist reasoning in plain English

A very similar question was posed on stats.SE: Bayesian and frequentist reasoning in plain English, which provoked some interesting debate. However, that debate is focussed largely on how a statistician would behave, and doesn't really answer the question from a Machine Learning perspective.

The question is in 3 parts:

  • In simple terms, what is the difference between the Probabilistic (Bayesian) and the Optimisation (Frequentist) approach to machine learning?

  • What are the key advantages/disadvantages of each method?

  • As a practitioner, are there any guidelines to help me decide which method I should choose for a particular problem?

Some examples of competing methods in different areas of Machine Learning include (a small code sketch contrasting one of these pairs follows the list):

  • Classification. SVM vs Gaussian Process Classification
  • Regression. Kernel Ridge Regression vs Gaussian Process Regression
  • Dimensionality reduction. PCA vs Probabilistic PCA
  • Topic modelling. Non-negative Matrix Factorisation vs Latent Dirichlet Allocation
  • Multiple Kernel Learning. SimpleMKL vs VBpMKL (Variational Bayes probabilistic MKL)
  • Multi-Task Learning. Regularised Multi-Task Learning vs Sparse Bayesian Multi-task Learning.
  • Compressed sensing. $\ell_1$ minimisation vs Bayesian Compressed Sensing using Laplace Priors
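To make the probabilistic/optimisation contrast concrete, here is a minimal sketch of the regression pair above, assuming scikit-learn (the question itself names no library): Kernel Ridge Regression minimises a regularised loss and returns point predictions only, while Gaussian Process Regression maintains a posterior over functions and also returns predictive uncertainty.

```python
# Minimal sketch (assumes scikit-learn): Kernel Ridge Regression vs
# Gaussian Process Regression on the same 1-D data. KRR solves a
# regularised optimisation problem and yields point predictions only;
# GPR keeps a posterior over functions and also yields uncertainty.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.randn(20)

# Optimisation view: minimise squared loss plus a ridge penalty.
krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(X, y)

# Probabilistic view: GP prior + Gaussian likelihood gives a
# closed-form posterior; WhiteKernel models the observation noise.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

X_test = np.linspace(0, 5, 5)[:, None]
point = krr.predict(X_test)                       # point estimates only
mean, std = gpr.predict(X_test, return_std=True)  # mean and uncertainty
print(point)
print(mean, std)
```

For a matched kernel and hyperparameters, the KRR solution coincides with the GP posterior mean, so the practical differences in this pairing are mainly the predictive variance and how the hyperparameters are chosen (cross-validation for KRR vs marginal likelihood for GPR).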
  • I think this question is too broad. It makes sense to ask the question regarding the difference between particular methods, but I'm not sure the Bayesian/Frequentist dimension is well-motivated as the best way to approach that sort of question. Is this too many questions in one? – Tamzin Blake Feb 09 '12 at 19:13
  • 5
    I am undecisive whether this question is brilliant or complete nonsense, i.e. I am not sure whether there is a ML perspective on this discussion Can you name an example/algorithmn/field of application for each reasoning in ML ? – mlwida Feb 08 '12 at 11:30
  • 3
    @steffen, some examples might be: SVM vs Gaussian Process Classification; Kernel Ridge Regression vs Gaussian Process Regression; Dimensionality reduction via PCA vs Probabilistic PCA; Topic modelling via Non-negative Matrix Factorisation vs Latent Dirichlet Allocation; Multiple Kernel Learning using SimpleMKL vs VBpMKL (Variational Bayes probabilistic MKL); Multi Task Learning via regulariation methods vs Sparse Bayesian Multi-task Learning; Compressed sensing via $\ell_1$ minimisation vs Bayesian Compressed Sensing using Laplace Priors – tdc Feb 08 '12 at 12:59
  • I changed the title slightly, as Bayesian/Frequentist possibly aren't strictly correct in this context. There's also the question of where Information Theoretic methods (such as k-means clustering) fit into this -- I wouldn't say that they fit directly into either camp. – tdc Feb 09 '12 at 09:18
  • @ThomBlake Maybe so, but I think the same arguments may well surface when looking across the different methods. I still feel like the 3 parts to the question above can be answered as they stand. – tdc Feb 10 '12 at 11:44
  • I don't think the dichotomy "probabilistic" vs. "optimisation" is really fruitful. Methods from optimisation theory are used in both Bayesian (posit a distribution over parameters and integrate them out) and frequentist (estimate a fixed set of parameters) approaches. As general advice: if you have large amounts of data and your model "is reasonable", maintaining uncertainty over parameters might not be crucial. With little data, or if you want to maintain the uncertainty for other reasons, Bayesian methods would be preferable. –  Feb 14 '12 at 11:46
