27

In some sense this is a crosspost of my question on math.stackexchange, but I have the feeling that this site might provide a broader audience.

I am looking for a mathematical introduction to machine learning. In particular, much of the literature I have found is relatively imprecise, and many pages are spent without any content.

However, starting from such literature, I discovered Andrew Ng's Coursera courses, Bishop's book on pattern recognition, and finally a book by Smola. Unfortunately, Smola's book is only in draft state. It even contains proofs, which appeals to me. Bishop's book is already quite good, but it lacks a certain amount of rigor.

In short: I am looking for a book like Smola's, that is, one that is as precise and rigorous as possible and makes use of mathematical background (though short introductions are of course OK).

Any recommendations?

Quickbeam2k1
  • In the future please don't crosspost. – Momo Mar 25 '15 at 19:58
  • It looks like the question is unfinished - it breaks off after "and". – J W Mar 25 '15 at 20:30
  • sorry, somehow my edit vanished. – Quickbeam2k1 Mar 25 '15 at 20:44
  • You might want to explain why a mathematician wants to learn about machine learning (to find a job as a data scientist, to do research, etc.), which will help people point you in the right direction. – seanv507 Mar 25 '15 at 20:46
  • @sean, I want to find a job as a data scientist. Besides, it's an interesting topic. I just finished my PhD in mathematics, particularly in the field of PDE. Further focal points of my studies were numerics, differential geometry, geometric modelling and computational geometry. However, in Germany, the market for PDE guys is almost nonexistent, especially if you want to leave the universities (at least in that field). Having spent some time on mathematics, I want to stay in contact with math. I think ML is ideal for that. – Quickbeam2k1 Mar 25 '15 at 21:23
  • @Quickbeam2k1 was Smola's book worth reading? – Mike Miller Mar 25 '15 at 21:38
  • For data science I would argue you need a basic understanding of statistics (e.g. linear/logistic regression), experimental design (e.g. A/B testing), and in addition an understanding of recommender system techniques. – seanv507 Mar 25 '15 at 22:58
  • @Mike Miller, Smola's draft book has only little content. The book Learning with Kernels, posted by Marc Claesen, seems to be a better fit for me. @seanv507: What if you are a data scientist in the field of (geometric) pattern recognition and algorithm development? – Quickbeam2k1 Mar 26 '15 at 05:16
  • @Quickbeam2k1 - I don't recognise the field - can you give examples? I was just pointing out where the majority of the jobs are. – seanv507 Mar 26 '15 at 12:09
  • @seanv507: Assume you have CAD data and you need to classify them according to their shape. (Easiest cases: donuts, spheres, balls, cylinders). New algorithms for e.g. faster search are required. When working in the area of trading systems and risk control, e.g. in banks, you need to analyse data. Additionally, you also might need to improve existing algorithms, due to different or new models or lack of speed. Or maybe the most prominent: working on autonomous cars. – Quickbeam2k1 Mar 26 '15 at 12:31
  • I second @seanv507's comment: yes, there are a few exciting jobs in machine learning. Sure, aim for them. However, I think the majority of "machine learning" jobs are actually more like "automated basic statistics" (as featured in Ng's course, btw). – P.Windridge Mar 26 '15 at 14:50
  • Has my question been undeleted? Somehow it was lost before today? – Quickbeam2k1 Apr 01 '15 at 08:30

4 Answers

15

I would recommend Elements of Statistical Learning (free PDF file). It has sufficient maths and a good introduction to all the relevant techniques - together with some insights on why the techniques work (and when they don't).

Also Introduction to Statistical Learning (which is more practical: how to do it in R). There is an accompanying course on statistical learning; you might find the lectures on YouTube (and again, there is a free PDF).

Peter Mortensen
seanv507
  • That is a very nice recommendation. In addition to this, I suggest "Learning from Data" by Yaser S. Abu-Mostafa. It is heavily theoretical but explains very clearly topics such as feasibility of learning and VC dimension. There are videos and slides available [online](https://work.caltech.edu/telecourse.html). – tiagotvv Mar 26 '15 at 10:29
  • I second the suggestion of "Learning from Data" by Yaser S. Abu-Mostafa. The book is very short but packed with valuable information. Much focus is indeed put on feasibility of learning and complexity. – Vladislavs Dovgalecs Mar 31 '15 at 23:23
11

For what you describe, I highly recommend "Foundations of Machine Learning" by Mohri et al. It is an undergraduate text, but it is for really good undergraduates. It is readable, and it is the only place I have found what I would call a mathematical definition of machine learning (PAC and weak PAC). It is worth reading for that reason alone. I also have a math PhD. I'm familiar with, and like, many of the books mentioned above. I'm particularly fond of ESL for a broad spectrum of techniques and ideas, but it's a statistics book with lots of mathematics.

meh
  • Btw, I'm told that Schapire, in his thesis, proved that weak PAC implies PAC. His proof amounts to the boosting technique, so it's a nice example of how a theoretical question led to a very practical result. – meh Mar 31 '15 at 23:14
  • Thanks for your remarks. I think I will work with ESL later, after working with Mohri's and Shalev-Shwartz's books. – Quickbeam2k1 Apr 01 '15 at 09:59
9

You will probably like Learning With Kernels by Schölkopf and Smola. Most of Schölkopf's work is mathematically rigorous.

That said, you are probably better off reading research papers instead of textbooks. Research papers contain full derivations and proofs of convergence, bounds on performance, etc., which are very often not included in textbooks. A good place to start is the Journal of Machine Learning Research, which is highly regarded and fully open access. I also recommend the proceedings of conferences like ICML, NIPS, COLT and IJCNN.

Peter Mortensen
Marc Claesen
  • Thanks for the hint about the journal. However, I fear that the journals are, so far, too advanced for me. Nevertheless, this might be a valuable source for the future. – Quickbeam2k1 Mar 25 '15 at 20:42
5

I would suggest Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz. I admit that I have read only small portions of it, but I immediately noticed the rigor with which the author approached every problem and discussion.

Vladislavs Dovgalecs