For many regression/classification algorithms, there is a Bayesian version of it, like Bayesian linear regression, Bayesian logistic regression, and Bayesian neural networks. I do not fully understand the math in them, but what are the advantages compared with the original algorithm? Are they of great practical use?
-
Having admitted you do not yet understand the math, this might be helpful for you http://stats.stackexchange.com/questions/41794/bayesian-updating-for-a-discrete-rating-value/43048#43048 and for practical use you might wish to look at this http://stats.stackexchange.com/questions/43471/examples-of-bayesian-and-frequentist-approach-giving-different-answers/43498#43498 – phaneron Nov 13 '12 at 18:28
4 Answers
Doing Bayesian regression is not an algorithm but a different approach to statistical inference. The major advantage is that, with this Bayesian treatment, you recover the whole range of inferential solutions, rather than a point estimate and a confidence interval as in classical regression. (I can only recommend reading a statistics textbook to understand the difference between an algorithm and statistical inference.)
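As a rough illustration, here is a minimal NumPy sketch of conjugate Bayesian linear regression, assuming a Gaussian prior on the coefficients and a known noise variance (both simplifying assumptions): instead of a single point estimate, you obtain a full Gaussian posterior over $\beta$.

```python
import numpy as np

# Minimal sketch: conjugate Bayesian linear regression with a Gaussian
# prior N(0, tau^2 I) on the coefficients and an assumed known noise
# variance sigma^2. The posterior over beta is itself Gaussian, so we
# recover a whole distribution rather than a single point estimate.

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-3, 3, size=n)
X = np.column_stack([np.ones(n), x])          # design matrix [1, x]
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

sigma2, tau2 = 1.0, 10.0                      # assumed noise and prior variances
A = X.T @ X / sigma2 + np.eye(2) / tau2
post_cov = np.linalg.inv(A)                   # posterior covariance of beta
post_mean = post_cov @ X.T @ y / sigma2       # posterior mean of beta

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # classical point estimate
print("OLS point estimate:", beta_ols)
print("posterior mean:    ", post_mean)
print("posterior std devs:", np.sqrt(np.diag(post_cov)))
```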

-
Could you kindly elaborate on "the whole range of inferential solutions"? That would help the OP (and me!) to better understand how you see the contrast with classical methods. – Assad Ebrahim Jul 31 '13 at 16:04
-
@Xi'an The OP's question seems to concern estimation whereas your answer seems to concern inference. – AdamO Feb 18 '16 at 22:17
-
@AdamO: given that the OP has not been seen on X validated since Nov. 12, 2012, (s)he does not seem very concerned by the question! – Xi'an Feb 19 '16 at 09:06
The Difference
Let's do a small thought experiment with regard to regression. Take this simple regression model:
$y = \beta_0 + \beta_1 x$
We can solve for the best possible weights, and linear algebra tells us that the least-squares solution is:
$\beta^* = [\beta_0, \beta_1]^T = (X'X)^{-1} X'y$
Now imagine that we run the regression once with a small dataset and once with a large dataset. It should be safe to argue that we are much more certain that our estimate $\beta^*$ is sensible when we have 1000 data points as opposed to only 10. But because $\beta^*$ is a single point estimate and not a distribution, it cannot by itself quantify that certainty.
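A quick sketch of this point, using made-up data: the normal-equation solution returns exactly the same kind of object, a single vector, whether it was fit on 10 points or on 1000.

```python
import numpy as np

# Sketch of the thought experiment: the normal-equation solution
# beta* = (X'X)^{-1} X'y returns a single vector whether we have
# 10 points or 1000 points, so by itself it says nothing about
# how certain we should be.

rng = np.random.default_rng(1)

def fit_ols(n):
    x = rng.uniform(-3, 3, size=n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)
    return np.linalg.solve(X.T @ X, X.T @ y)   # numerically safer than an explicit inverse

print("beta* from   10 points:", fit_ols(10))
print("beta* from 1000 points:", fit_ols(1000))
```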
This is where and why Bayesians interpret $\beta$ differently: they treat $\beta$ as a quantity about which we can be more or less certain, depending on the dataset. If this feels very confusing, you may appreciate this blog post, where the difference is discussed in more detail. [Disclaimer: the blog post was written by me.]
The Benefit
Now assume that the $\beta^*$ we've learned is a distribution instead of a mere number. You'll notice that our prediction now becomes stochastic too:
$\hat{y}_i = \beta_0 + \beta_1 x_i$
Our prediction $\hat{y}_i$ is now a distribution too. This means that we have confidence bounds on our prediction. If you care about the uncertainty of your predictions, this is a very, very nice thing.
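A rough sketch of what this buys you, again with made-up data and an assumed Gaussian prior and noise variance: sampling $\beta$ from its posterior and pushing those samples through the model gives a distribution over $\hat{y}_i$, from which bounds can be read off.

```python
import numpy as np

# Sketch of the benefit: with a Gaussian posterior over beta, the
# prediction at a new point is itself a distribution, so we can report
# bounds instead of a bare number. Prior and noise settings below are
# illustrative assumptions.

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(-3, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

sigma2, tau2 = 1.0, 10.0
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
post_mean = post_cov @ X.T @ y / sigma2

# Draw beta samples from the posterior and propagate them to y_hat.
betas = rng.multivariate_normal(post_mean, post_cov, size=5000)
x_new = np.array([1.0, 2.5])          # [intercept term, x value]
y_hat = betas @ x_new                 # distribution over the mean prediction
lo, hi = np.percentile(y_hat, [2.5, 97.5])
print(f"95% interval for y_hat at x = 2.5: [{lo:.2f}, {hi:.2f}]")
```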

-
I don't think this is a good answer regarding the Bayesian approach: with classical linear regression and a frequentist approach you also get a confidence interval, which is analogous to the credible interval in the Bayesian approach. The main advantage, as also commented below, is that in the Bayesian approach you can incorporate prior or expert information, which does not happen in classical linear regression, for instance. – Nicolás Esteban Cofré Ramírez Nov 30 '20 at 11:30
-
Another benefit of the Bayesian approach is that you can look at the plots for the posterior distributions **directly** instead of looking at the distribution's statistics such as mean, confidence interval and kurtosis. – RikH May 16 '21 at 18:15
In general, the advantage of Bayesian estimation is that you can incorporate a prior, that is, assumed knowledge about the current state of "beliefs", and describe how the evidence updates those beliefs.
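As a toy illustration of "beliefs updated by evidence" (all numbers made up), here is a conjugate normal-normal model for an unknown mean: the posterior mean is a precision-weighted compromise between the prior belief and the data.

```python
import numpy as np

# Toy sketch: conjugate normal-normal model for an unknown mean mu.
# The prior, data, and variances are hypothetical.

prior_mean, prior_var = 5.0, 4.0          # assumed prior belief about mu
data = np.array([7.1, 6.8, 7.4, 7.0])     # hypothetical observations
noise_var = 1.0                           # assumed known observation variance

n = len(data)
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)

# The posterior mean sits between the prior mean (5.0) and the
# sample mean (about 7.1), weighted by their precisions.
print("posterior mean:", post_mean, "posterior variance:", post_var)
```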

-
Could you please elaborate a little on this? For example, when is it better to use Bayesian regression than ordinary regression? – Outcast Aug 30 '18 at 08:44
-
Imagine you have a data set of coin tosses with only heads. If you model the probability of heads using a frequentist approach, you will get p = 1 because you have only seen heads. But if you use a Bayesian approach and start with a prior that also gives some positive probability to other values (a distribution for your initial guess of p), you will update that distribution with the observed data and end up with a mix of the prior and the evidence from the new data. This won't be p = 1 if you used a proper distribution as a prior for p. – Nicolás Esteban Cofré Ramírez Nov 30 '20 at 11:37
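A quick numeric sketch of this coin example, assuming SciPy is available: a uniform Beta(1, 1) prior updated with 10 heads out of 10 tosses gives a Beta(11, 1) posterior, whose mean is 11/12 ≈ 0.917 rather than 1.

```python
from scipy import stats

# Coin-toss sketch: 10 tosses, all heads.
heads, tosses = 10, 10
p_mle = heads / tosses                     # frequentist MLE: exactly 1

a, b = 1, 1                                # uniform Beta(1, 1) prior on p
post = stats.beta(a + heads, b + tosses - heads)

print("MLE:                  ", p_mle)
print("posterior mean:       ", post.mean())        # 11/12, not 1
print("95% credible interval:", post.interval(0.95))
```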
Maximum likelihood estimation (MLE) of the parameters of a non-Bayesian regression model, or simply of a linear regression model, overfits the data, meaning the unknown value for a certain value of the independent variable becomes too precise when calculated. Bayesian linear regression relaxes this by acknowledging that there is uncertainty involved, incorporating a predictive distribution.

-
-1. First, overfitting is not the same as *the unknown value <...> becomes too precise when calculated*. Second, reporting some measure of uncertainty (or even an entire predictive distribution) in addition to a point estimate does not deal with overfitting. – Richard Hardy Oct 07 '21 at 20:13