I have completed the Machine Learning course and the Deep Learning Specialization by Andrew Ng on Coursera, and I am now taking the TensorFlow 2 for Deep Learning Specialization by Imperial College London. When I got to the third course, however, it turned to Probabilistic Deep Learning, and I have no idea what Bayesian statistics is. As a prerequisite, it recommends this course: https://www.coursera.org/learn/bayesian-methods-in-machine-learning. I have watched the first few videos, but the way it teaches is not for me. Are there any other sources for learning this topic? I know basic probability theory: univariate and multivariate distributions, variance, MGFs, expectations, etc.
We have a number of questions about Bayesian statistics available via search. Here's a promising one: https://stats.stackexchange.com/questions/7351/bayesian-statistics-tutorial, and there are more here: https://stats.stackexchange.com/search?q=%5Breferences%5D+bayesian+answers%3A1+score%3A3 – Sycorax Apr 09 '21 at 02:24
2 Answers
It depends on how advanced you are. I would recommend William Bolstad's "Introduction to Bayesian Statistics" and his "Understanding Computational Bayesian Statistics."
In the Neyman–Pearson school of statistics, with its minimum variance unbiased estimators, uniformly most powerful tests, and so forth, all information comes from the data and the data alone. For example, imagine you calculate a confidence interval for calories and get the interval [-10, 90]. From domain knowledge, you know that negative calories cannot exist.
That is domain knowledge; it is not in the data. Bayesian methods require you to put outside information into the methodology. Neyman–Pearson methods don't allow outside information.
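To make that point concrete, here is a minimal sketch with made-up numbers (not data from the answer): a small, noisy sample of calorie measurements whose frequentist 95% t-interval dips below zero, even though negative calories are physically impossible.

```python
# Hypothetical calorie measurements: small sample, large spread.
import numpy as np
from scipy import stats

calories = np.array([5, 60, 2, 75, 1, 57])   # made-up data for illustration
mean = calories.mean()
sem = stats.sem(calories)                     # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(calories) - 1, loc=mean, scale=sem)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")        # the lower bound comes out negative
```

Nothing in the t-interval machinery knows that calories are nonnegative; that constraint can only enter through outside information.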
Bayesian methodologies differ from Frequentist methods in that the data are not considered random; the parameters are random. Randomness does not imply chance; it implies uncertainty.
There is a good discussion of Frequentist confidence intervals versus Bayesian credible intervals here.
Bayesian methods start with a distribution of the knowledge you have about the problem seen outside the collection of the data. That distribution is multiplied by the likelihood function and normalized. That gives you a new distribution.
The distributions quantify your uncertainty about parameters.
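In symbols (this is just Bayes' rule, stated here for concreteness): writing $\pi(\theta)$ for the prior and $f(x \mid \theta)$ for the likelihood, the posterior is

$$\pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}.$$

The denominator is the normalizing constant that makes the posterior integrate to one.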
If you need a theoretical grounding, pick up E. T. Jaynes's "Probability Theory: The Logic of Science." If you need a grounding in decision theory, grab Christian Robert's "The Bayesian Choice." Both are rigorous; neither is for beginners.
There are three primary axiomatizations of Bayesian probability theory. Jaynes provides one of them; the other two are by Leonard Jimmie Savage and Bruno de Finetti. The axioms do not match Kolmogorov's, and because of that, Bayesian and Frequentist methods do not produce the same results. Jaynes' construction follows from Richard Cox's axioms.
Cox's axioms are built on logic, Savage's on preference and utility theory, and de Finetti's on gambling. They contrast with Kolmogorov's axioms, which are derived from measure theory on sets and are the basis of the probability theory you already know.
Cox's perspective would be "what is the probability a statement from logic is true?"
De Finetti's perspective might be "how much money would I gamble on some assertion and at what odds?" Further, "how would those odds change as I garnered new information?"
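As a standard illustration of de Finetti's operational view: if the fairest odds you would accept against an event are $a : b$, your implied probability is $\frac{b}{a+b}$; odds of 3 : 1 against, for instance, correspond to $P = \frac{1}{4}$.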
Savage's perspective is a bit more challenging because it is grounded in utility theory; it begins with the preference relation $x \succ y$ as its first axiom. He might ask "what is your personal estimate of $\theta$?"
Give yourself some time to adjust to this way of thinking. If you follow the link above, you will see that the calculations are quite different from the statistics you are used to, and that they produce different intervals for the same phenomenon.
Any system that begins with logic, gambling, or your preferences won't look like an impersonal mathematics built on sets, optimal estimators, and tests.
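To see the "different intervals" point concretely, here is a minimal sketch with hypothetical data (not the example from the linked discussion): a frequentist 95% Wald interval versus a Bayesian 95% credible interval under a uniform Beta(1, 1) prior, for a binomial proportion.

```python
import numpy as np
from scipy import stats

successes, n = 3, 10                 # hypothetical: 3 successes in 10 trials
p_hat = successes / n

# Frequentist Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)
hw = z * np.sqrt(p_hat * (1 - p_hat) / n)
print("Wald CI:          ", (p_hat - hw, p_hat + hw))

# Bayesian credible interval: with a Beta(1, 1) prior, the posterior
# is Beta(1 + successes, 1 + failures); take its central 95% region.
posterior = stats.beta(1 + successes, 1 + n - successes)
print("Credible interval:", (posterior.ppf(0.025), posterior.ppf(0.975)))
```

The two intervals disagree for the same data: the credible interval is pulled toward the center of the uniform prior, while the Wald interval depends only on the sample.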

My main purpose is to learn machine learning, and I don't know how much statistics I need for probabilistic machine learning. I have a background in mathematics, and I know that studying statistics rigorously takes real effort. Do I really need all the material you listed above for machine learning? If not, is there a source that covers the direct application of Bayesian statistics to machine learning? – Charlie Apr 11 '21 at 01:53
@Victor how will you form your prior, and, for machine learning, how will you determine your utility function? In Bayesian statistics, unlike Frequentist statistics and machine learning, you cannot just plug one in. Bayesian methods don't have a decision tree like Frequentist methods; each problem is unique. You also need to be sure that your posterior integrates to unity, so many of the easy tools collapse once you get to three or more dimensions. Bayesian methods bring a different set of problems to the table. I can't know what you already know. – Dave Harris Apr 11 '21 at 23:31
Machine Learning: A Bayesian and Optimization Perspective (2nd Edition) by Sergios Theodoridis seems to squarely address your interests:
- it builds from topics you know about probability theory to topics in machine learning
- it covers a wide range of machine learning topics
- it motivates machine learning topics from the perspective of Bayesian reasoning

I already have a good understanding of machine learning/deep learning; what I lack is knowledge of Bayesian statistics and how it relates to machine learning. So far, what I have encountered in machine learning has seemed unrelated to Bayesian statistics. I've read the table of contents, and this book seems to teach machine learning from the basics, which I already have a good grasp of. Are there any other recommendations? – Charlie Apr 11 '21 at 02:01