
I find resources like the Probability and Statistics Cookbook and The R Reference Card for Data Mining incredibly useful. They obviously serve well as references but also help me to organize my thoughts on a subject and get the lay of the land.

Q: Does anything like these resources exist for machine learning methods?

I'm imagining a reference card which for each ML method would include:

  • General properties
  • When the method works well
  • When the method does poorly
  • From which or to which other methods the method generalizes. Has it been mostly superseded?
  • Seminal papers on the method
  • Open problems associated with the method
  • Computational intensity

All these things can be found with some minimal digging through textbooks I'm sure. It would just be really convenient to have them on a few pages.

Haitao Du
lowndrul
  • A nice goal, but "minimal digging through some textbooks"? How could one even start to compress, say, these [20 Books for statistical learning and data mining](http://www.amazon.com/Books-statistical-learning-data-mining/lm/2CYJJZZO4OF1/ref=cm_lmt_DYNA_f_1_russss0?pf_rd_p=496997231&pf_rd_s=listmania-center&pf_rd_t=201&pf_rd_i=9814324388&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=079N42EA8Q0S7NBTPTGD) + [mloss.org/software/rating](http://mloss.org/software/rating)? – denis Jun 27 '11 at 10:48
  • Another one: http://stats.stackexchange.com/questions/19311/building-background-for-machine-learning-for-cs-student – mlwida Dec 20 '11 at 16:10
  • (+1) for the chutzpah; if such an overview existed, I'd pay for it. The key problem is that, besides some properties which can be derived from the algorithm itself, the majority of such properties or rules of thumb are gained by experience, i.e. application. I am pretty sure a battle-hardened applied researcher or ML-framework programmer/consultant could write something like that ... but here and now? – mlwida Jun 06 '12 at 07:20
  • @denis: the "20 books.." link does not work; can you check this? – lmsasu Jul 26 '12 at 11:27
  • @lmsasu, sorry, Amazon removed it, and I don't remember who had made the list. Try compressing just 2: Hastie et al., > 700 pages, and MacKay, > 600? – denis Jul 27 '12 at 14:39
  • I tried to edit this to make your question clearer, but I'm still not sure about the details. Do you want a book that will *explain* "recent concepts" (i.e. developments since Moore & Haykin) with "easy language & good exercises"? Do you just want to know if Bishop's book & Ng's tutorials are up to date? Note that I'm not sure how answerable this question is in its current form. – gung - Reinstate Monica Sep 01 '12 at 20:03
  • I'm no machine learning expert, so I will defer to others to post answers, but I do think that [The Elements of Statistical Learning](http://www.stanford.edu/~hastie/local.ftp/Springer/OLD//ESLII_print4.pdf) is considered a good text on the subject and is written by some of the biggest names in the field. I should add that this book is written at a high level, and those I've heard recommend it did have PhDs in statistics. – Macro Sep 01 '12 at 20:06
  • Re: Bishop, I did have the devil of a time getting the answers to the exercises out of Springer when I was thinking of setting it for a class. In the end I never did (either). – conjugateprior Sep 01 '12 at 20:58
  • Macro's suggestion is a great choice. Also, take a look at MacKay's book, which is freely available here: http://www.inference.phy.cam.ac.uk/mackay/itila/p0.html. His exposition is very intuitive. – Zen Sep 02 '12 at 00:26
  • Several suggestions were made on (many) related threads, e.g. [Machine learning self-learning book?](http://stats.stackexchange.com/q/20040/930). Material from Radford Neal's yearly course on [Statistical Methods for Machine Learning and Data Mining](http://www.utstat.utoronto.ca/~radford/sta414/) is available online. – chl Sep 02 '12 at 08:22

17 Answers


If you want to learn machine learning, I strongly advise you to enroll in the free online ML course taught by Prof. Andrew Ng this winter.

I did the previous one in the autumn; all the learning material is of exceptional quality and geared toward practical applications, and it's a lot easier to grok than struggling alone with a book.

It's also made pretty approachable, with good intuitive explanations and a minimum amount of math.

clyfe

Some of the best and freely available resources are:

As to the author's question, I haven't come across an "all in one page" solution.

lmsasu
Sergey
  • Sergey, is Barber's book tied to Matlab? – denis Jun 28 '11 at 09:19
  • Yes, just take a look at the book's link: _The BRML toolbox is provided to help readers see how mathematical models translate into actual MATLAB code._ – Sergey Jun 28 '11 at 09:26

Yes, you are fine; Christopher Bishop's "Pattern Recognition and Machine Learning" is an excellent book for general reference; you can't really go wrong with it.

A fairly recent book, but also very well written and equally broad, is David Barber's "Bayesian Reasoning and Machine Learning", a book I feel is slightly more suitable for a newcomer to the field.

I have used "The Elements of Statistical Learning" by Hastie et al. (mentioned by Macro), and while it is a very strong book, I would not recommend it as a first reference; maybe it would serve you better as a second reference for more specialized topics. In that respect, David MacKay's book, "Information Theory, Inference, and Learning Algorithms", can also do a splendid job.

usεr11852
  • +1 for Bishop. Clear development with an even level of detail. While still good, I always found Hastie et al. a little bit choppy. – conjugateprior Sep 01 '12 at 20:54
  • +1 -- Hastie, Tibshirani and Friedman is my personal favorite. – StasK Sep 01 '12 at 23:16
  • +1 too for recommending Hastie, Tibshirani and Friedman, my personal favorite too. And thanks for the other recommendations; I'll give them a read because I really need a good book to recommend to non-statisticians (or persons just entering the field). – Néstor Sep 02 '12 at 20:43
  • +1 for Bishop. It's actually a great source for classical stats too, but updated and in disguise. – conjectures May 29 '14 at 19:28

Since the consensus seems to be that this question is not a duplicate, I'd like to share my favorite for machine learner beginners:

I found Programming Collective Intelligence the easiest book for beginners, since the author Toby Segaran is focused on allowing the median software developer to get his/her hands dirty with data hacking as fast as possible.

Typical chapter: the data problem is clearly described, followed by a rough explanation of how the algorithm works, and finally a demonstration of how to create some insights with just a few lines of code.

The use of Python allows one to understand everything rather fast (you do not need to know Python; seriously, I did not know it before either). Don't think that this book is only focused on creating recommender systems. It also deals with text mining, spam filtering, optimization, clustering, validation, etc., and hence gives you a neat overview of the basic tools of every data miner.
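To give a flavor of the book's "few lines of code" style, here is a minimal sketch in the spirit of its recommender chapters; the ratings data and function name below are made up for illustration, not taken from the book:

```python
# Toy ratings in a dict-of-dicts style (hypothetical data for illustration).
ratings = {
    "Ann":  {"Alien": 5.0, "Heat": 3.0, "Up": 4.0},
    "Bob":  {"Alien": 4.0, "Heat": 3.5, "Up": 5.0},
    "Cara": {"Alien": 1.0, "Heat": 4.5},
}

def euclidean_similarity(prefs, a, b):
    """Score in (0, 1]: 1 for identical tastes, smaller as ratings diverge."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0  # no co-rated items, nothing to compare
    sq_dist = sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared)
    return 1.0 / (1.0 + sq_dist ** 0.5)

print(euclidean_similarity(ratings, "Ann", "Bob"))   # 0.4
print(euclidean_similarity(ratings, "Ann", "Cara"))  # lower: tastes diverge more
```

A recommender then suggests items liked by the users most similar to you; the book builds that up step by step from exactly this kind of similarity score.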

mlwida

Witten and Frank, "Data Mining", Elsevier 2005, is a good book for self-learning, as there is a Java library of code (Weka) to go with the book, and it is very practically oriented. I suspect there is a more recent edition than the one I have.

Dikran Marsupial
  • Yes, this book was to be called "Machine Learning", but the name was changed to "Data Mining" by the publishers to ride on the data mining hype at the time; nevertheless, the book is about ML, not DM (the two bear similarities, but are different fields!). – clyfe Dec 20 '11 at 16:03
  • Tom Mitchell's book "Machine Learning" is also very good; the style is a bit old-fashioned, but the content is excellent. – Dikran Marsupial Dec 20 '11 at 16:15
  • Yes, Tom Mitchell's ML is like the ML bible, really comprehensive on the field! – clyfe Dec 20 '11 at 20:13

[scikit-learn algorithm cheat-sheet flowchart]

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited to different types of data and different problems. The flowchart is designed to give users a rough guide on how to approach problems with regard to which estimators to try on your data. Click on any estimator in the chart to see its documentation.
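As a rough illustration of how such a flowchart's advice plays out in code (a minimal sketch, not part of the original answer): for labeled data with fewer than ~100k samples, the chart's classification branch suggests starting with a linear SVM, which in scikit-learn looks like this:

```python
# Following the cheat sheet's classification branch:
# labeled data, < 100k samples -> try a linear SVM first.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000)  # the chart's first stop for classification
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on held-out data
```

If that underperforms, the chart routes you onward (e.g. to nearest neighbors or ensemble methods), which is exactly the kind of "rough guide" the answer describes.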

Anton Tarasenko

I have Machine Learning: An Algorithmic Perspective by Stephen Marsland and find it very useful for self-learning. Python code is given throughout the book.

I agree with what is said in this favourable review:

http://blog.rtwilson.com/review-machine-learning-an-algorithmic-perspective-by-stephen-marsland/

Glen

"Elements of Statistical Learning" would be a great book for your purposes. The 5th printing (2011) of the 2nd edition (2009) of the book is freely available at http://www.stanford.edu/~hastie/local.ftp/Springer/ESLII_print5.pdf

Richard Hardy
DanB
  • It is a mathematics-heavy book, so it may be difficult for a self-learner to follow. – Atilla Ozgur Dec 24 '11 at 13:22
  • Do you know how it happens to be freely downloadable on Trevor Hastie's personal pages when Springer charges $70 for it? – M. Toya Jul 27 '12 at 19:11
  • I don't know for sure, but I would imagine Springer wants the money, and the authors mainly want to publicize their book widely. This seems very similar to how Springer will sell you published articles while many "working paper" versions are freely available on the authors' websites. – DanB Jul 29 '12 at 14:20
  • FYI, the download is the 5th **printing** of the second edition. I love the footnote to the epigraph "In God we trust, all others bring data", which is attributed to Deming. The footnote points out the irony that no "data" can be found confirming that Deming actually said this. – HeatfanJohn Sep 25 '12 at 16:29
  • You should mention **Introduction to Statistical Learning with R** -- it's sort of like their **ESL**-lite (if the math in **ESL** is too daunting). – Steve S Nov 15 '14 at 11:19

The awesome-machine-learning repository seems to be a master list of resources, including code, tutorials and books.

orluke

Microsoft Azure also provides a similar cheat-sheet to the scikit-learn one posted by Anton Tarasenko.

Microsoft Azure Machine Learning Algorithm Cheat Sheet

(source: https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-cheat-sheet)

They accompany it with a notice:

The suggestions offered in this algorithm cheat sheet are approximate rules-of-thumb. Some can be bent, and some can be flagrantly violated. This is intended to suggest a starting point. (...)

Microsoft additionally provides an introductory article providing further details.

Note that those materials focus on the methods implemented in Microsoft Azure.

Tim

Most books mentioned in other answers are very good and you can't really go wrong with any of them. Additionally, I find the following cheat sheet for Python's scikit-learn quite useful.

Marc Claesen

I like Duda, Hart and Stork, "Pattern Classification". This is a recent revision of a classic text that explains everything very well. I am not sure that it has been updated to cover neural networks and SVMs in much depth. The book by Hastie, Tibshirani and Friedman is about the best there is, but it may be a bit more technical than what you are looking for, and it is detailed rather than an overview of the subject.

Macro
Michael R. Chernick

For a first book on machine learning, which does a good job of explaining the principles, I would strongly recommend

Rogers and Girolami, A First Course in Machine Learning, (Chapman & Hall/CRC Machine Learning & Pattern Recognition), 2011.

Chris Bishop's book or David Barber's both make good choices for a book with greater breadth, once you have a good grasp of the principles.

Nick Cox
Dikran Marsupial

Don't start with Elements of Statistical Learning. It is great, but it is a reference book, which doesn't sound like what you are looking for. I would start with Programming Collective Intelligence as it's an easy read.

Neil McGuigan
  • I'm not sure I would characterize ESL as a reference text. It seems more of an overview to me, i.e., you aren't going to learn the nitty gritty details of (hardly) anything. You will see the broad techniques and overarching themes. – cardinal Sep 03 '12 at 14:11

I wrote a summary like that, but only on one machine learning task (the Netflix Prize), and it has 195 pages: http://arek-paterek.com/book

Arek Paterek

Check this link featuring some free ebooks on machine learning: http://designimag.com/best-free-machine-learning-ebooks/. It might be useful for you.


A good cheat sheet is the one in Max Kuhn's book Applied Predictive Modeling. The book contains a good summary table of several ML models; the table is in Appendix A, page 549:

Table A.1: A summary of models and some of their characteristics

Anton Tarasenko
PolBM