
I am looking for a textbook (preferably a single one) covering the material from Elements of Statistical Learning with good scaffolding (ESL jumps around too much for me and serves more as a reference), with detailed derivations and proofs of the algorithms, and with detailed, step-by-step examples of how these algorithms are executed from scratch, particularly because I will be studying for exams for which I will not have a computer. Texts with solutions available are strongly preferred.

Edit: Some of you may be baffled as to why I would need such a text. As an example, you can view a past practice exam for my upcoming course at this link.

I have a month to learn this material, and I don't have time to dig through the 10+ machine learning books I have to try to dissect what they are saying. I have a strong preference for textbooks written by mathematicians. Too many of the machine learning books I've seen talk about a concept for a few pages and skim over all of the computational details, or they just execute everything using XYZ package and assume that everything that comes out of the package has the appropriate output. I am very skeptical of this approach, and historically, my skepticism seems to have saved me from errors.

My background is equivalent to about half of a M.S. stats program in the U.S.: probability with calculus, Casella and Berger-level stats, (general + generalized) linear models with matrices, and experimental design. I am not afraid of by-hand matrix computations, and will probably need to know how to perform these algorithms by hand.

The closest I've seen to what I'm looking for are Andrew Ng's CS 229 notes (see here), and I'll probably be using these, but they aren't as useful as I'd like, given that I don't have solutions to the homework assignments.

I have already read the following textbooks and haven't found them sufficient for my purposes:

  • Mohri et al, Foundations of Machine Learning (close to what I want, but no solutions available)
  • Clarke et al, Principles and Theory for Data Mining and Machine Learning (no solutions available, seems to suppose a measure-theoretic background)
  • Murphy, Machine Learning (extremely dense)
  • James et al, Introduction to Statistical Learning (relies too little on theory, and too much on assuming that the R code will work - I've already spotted errors - for example here)
  • Izenman, Modern Multivariate Statistical Techniques (better than ESL, but skims over details, uses slightly nonstandard notation, and see the link to one of the questions I have above).

Are there any other books that I don't know of that would be useful for my purposes?

Clarinetist
  • What do you mean by "step-by-step examples of machine learning without computer"..? Those algorithms are designed to run on computers; "running" them with pen and paper is a waste of paper and time at best (I wonder how many pages you'd need to waste to "run" k-NN...). – Tim Dec 07 '17 at 18:03
  • Mathematicians do not write ML textbooks; they don't do this at all, luckily. Maybe only lately. – Aksakal Dec 07 '17 at 18:23
  • @Tim In my graduate program, it is often the case (and judging by past exams for this upcoming class, it -is- the case) that I have to know how to execute these by hand. Obviously, this would be done with a small data set. See, for example, http://www.public.iastate.edu/~vardeman/stat502x/Stat%20502X%20Exam%201-2016.pdf – Clarinetist Dec 07 '17 at 18:33
  • If you're cramming for an exam, honestly you're better off getting your hands on all the past exams, past exam solutions, past homework, and past homework solutions you can find from that class and/or that prof. Supplement that with the lecture notes. If there's something deeply confusing, then quickly check in other books. Your prof's new exams are going to be far more correlated with his/her personal material than some random machine learning book, no matter how good the book is. – Matthew Gunn Dec 07 '17 at 21:55
  • @MatthewGunn Actually, I'm trying to learn the material ahead of time. – Clarinetist Dec 07 '17 at 21:56
  • Can you get old lecture notes? I'd still go through those then start looking up math you don't know and checking how other books/notes treat the same topics. – Matthew Gunn Dec 07 '17 at 21:59
  • Hi, my background is Pure Math and I found the book Machine Learning: A Bayesian and Optimization Perspective by Sergios Theodoridis, Academic Press, extremely useful. I got a thorough understanding of Machine Learning thanks to this book (and a job in "analytics" as well :-) ). The background needed to read this book is Statistics at the level of Casella & Berger and some numerical analysis for some chapters (the book Numerical Analysis by Richard Burden is more than enough). I really enjoyed Sergios's style: clear, detailed, and demystifying. Check it out. – Coffee Dec 08 '17 at 02:19
  • @Coffee Please put that as an answer to this question. It's worth making more visible. – Clarinetist Dec 08 '17 at 03:40
  • @Clarinetist I do not know if my comment qualifies as an answer, but I am sure that Sergios's book will be very valuable in helping you get ready to solve an exam such as the one you linked. I had a lot of problems and frustration when I first wanted to learn the fundamentals of Machine Learning, as most of the textbooks do not fit the logic and style of Pure Mathematics. – Coffee Dec 08 '17 at 06:06
  • Even though Sergios's book is not a math textbook, it is written in a mathematical/statistical logical way, dissecting the ML techniques into their fundamental components from the point of view of Mathematics and Statistics without omitting analysis of algorithms. I also used Izenman's book as a supplement, a great book too. – Coffee Dec 08 '17 at 06:06
  • Sergios's book on Machine Learning does not cover clustering and segmentation, but you can find a comprehensive treatment of those techniques and general unsupervised learning in Sergios's book "Pattern Recognition" with the same great style. As I said these books are not Mathematical Machine Learning but they are written in a mathematical/statistical logical way. – Coffee Dec 08 '17 at 06:11
  • The mathematical community seems to be interested in and contributing to topics related to Machine Learning. In the book "Information Geometry" by Jürgen Jost et al., they explain a lot of applications of Information Geometry to Machine Learning. I am enjoying this book a lot, as it is close to my original research area, Differential Geometry and PDEs. – Coffee Dec 08 '17 at 06:18
  • @Coffee For what it's worth, I've ordered both of the Theodoridis books you recommended. If I'm sufficiently impressed, I might post those two books as an answer myself :) – Clarinetist Dec 08 '17 at 12:52
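To make concrete the kind of by-hand execution discussed in the comments above, here is a minimal sketch (my own illustrative example, not taken from any of the recommended texts or the linked exam) of 1-nearest-neighbor classification on a dataset small enough to trace entirely on paper. Using squared Euclidean distance preserves the nearest-neighbor ordering while avoiding square roots in hand computation:

```python
# 1-nearest-neighbor classification on a tiny, hand-traceable dataset.
# Each training point is ((x1, x2), label).

train = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
         ((4.0, 4.0), "B"), ((5.0, 4.0), "B")]

def nearest_label(query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(p):
        # Squared Euclidean distance: monotone in the true distance,
        # so it gives the same nearest neighbor without square roots.
        return (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
    _, best_label = min(train, key=lambda t: sq_dist(t[0]))
    return best_label

print(nearest_label((0.5, 0.5)))  # nearest training points are the "A" cluster -> A
print(nearest_label((4.5, 3.5)))  # nearest training points are the "B" cluster -> B
```

On paper, the same procedure is four squared-distance computations per query followed by picking the minimum, which is exactly the sort of small-data execution the linked practice exam asks for.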

1 Answer


Per @Coffee's suggestion, I recommend the text Machine Learning: A Bayesian and Optimization Perspective by Sergios Theodoridis, along with Pattern Recognition by the same author.

These two texts combined are 2,000 pages total and cover everything from undergrad-level probability to linear models, and (as far as I can tell) everything covered by Elements of Statistical Learning, in addition to time series, probabilistic graphical models, deep learning, and Monte Carlo methods.

The author makes an excellent effort to keep all notation clear and consistent (thank you for bolding all of your vectors!), and the exercises seem carefully chosen.

Having a background in probability, as well as statistics at the level of Casella and Berger, would be extremely helpful before pursuing these texts. There is some discussion of UMVUEs in them.

Clarinetist
  • Good to know my comment was helpful. Good luck in your course! – Coffee Dec 14 '17 at 19:07
  • Apologies for a late question: what is the difference between those books? Do they cover different material, so that both should be read? – MilTom Apr 18 '20 at 22:43