
I find resources like the Probability and Statistics Cookbook and The R Reference Card for Data Mining incredibly useful. They obviously serve well as references but also help me to organize my thoughts on a subject and get the lay of the land.

Q: Does anything like these resources exist for machine learning methods?

I'm imagining a reference card which for each ML method would include:

  • General properties
  • When the method works well
  • When the method does poorly
  • From which or to which other methods the method generalizes. Has it been mostly superseded?
  • Seminal papers on the method
  • Open problems associated with the method
  • Computational intensity

All these things can be found with some minimal digging through textbooks I'm sure. It would just be really convenient to have them on a few pages.

Haitao Du
lowndrul
  • A nice goal, but "minimal digging through some textbooks"? How could one even start to compress, say, these [20 Books for statistical learning and data mining](http://www.amazon.com/Books-statistical-learning-data-mining/lm/2CYJJZZO4OF1/ref=cm_lmt_DYNA_f_1_russss0?pf_rd_p=496997231&pf_rd_s=listmania-center&pf_rd_t=201&pf_rd_i=9814324388&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=079N42EA8Q0S7NBTPTGD) + [mloss.org/software/rating](http://mloss.org/software/rating)? – denis Jun 27 '11 at 10:48
  • Another one: http://stats.stackexchange.com/questions/19311/building-background-for-machine-learning-for-cs-student – mlwida Dec 20 '11 at 16:10
  • (+1) for the chutzpah; if such an overview existed, I'd pay for it. The key problem is that, besides some properties which can be derived from the algorithm itself, the majority of such properties or rules of thumb are gained by experience, i.e. application. I am pretty sure a battle-hardened applied researcher or ML-framework programmer/consultant could write something like that ... but here and now? – mlwida Jun 06 '12 at 07:20
  • @denis: the "20 books.." link does not work; can you check this? – lmsasu Jul 26 '12 at 11:27
  • @lmsasu, sorry, Amazon removed it, and I don't remember who had made the list. Try compressing just 2: Hastie et al., > 700 pages, and MacKay, > 600? – denis Jul 27 '12 at 14:39
  • I tried to edit this to make your question clearer, but I'm still not sure about the details. Do you want a book that will *explain* "recent concepts" (i.e. developments since Moore & Haykin) with "easy language & good exercises"? Do you just want to know if Bishop's book & Ng's tutorials are up to date? Note that I'm not sure how answerable this question is in its current form. – gung - Reinstate Monica Sep 01 '12 at 20:03
  • I'm no machine learning expert, so I will defer to others to post answers, but I do think that [The Elements of Statistical Learning](http://www.stanford.edu/~hastie/local.ftp/Springer/OLD//ESLII_print4.pdf) is considered a good text on the subject and is written by some of the biggest names in the field. I should add that this book is written at a high level, and those I've heard recommend it did have PhDs in statistics. – Macro Sep 01 '12 at 20:06
  • Re: Bishop, I did have the devil of a time getting the answers to the exercises out of Springer when I was thinking of setting it for a class. In the end I never did (either). – conjugateprior Sep 01 '12 at 20:58
  • Macro's suggestion is a great choice. Also, take a look at MacKay's book, which is freely available here: http://www.inference.phy.cam.ac.uk/mackay/itila/p0.html. His exposition is very intuitive. – Zen Sep 02 '12 at 00:26
  • Several suggestions were made on (many) related threads, e.g. [Machine learning self-learning book?](http://stats.stackexchange.com/q/20040/930). Material from Radford Neal's yearly course on [Statistical Methods for Machine Learning and Data Mining](http://www.utstat.utoronto.ca/~radford/sta414/) is available online. – chl Sep 02 '12 at 08:22

17 Answers


If you want to learn machine learning, I strongly advise you to enroll in the free online ML course taught by Prof. Andrew Ng this winter.

I did the previous one in the autumn; all the learning material is of exceptional quality and geared toward practical applications, and it's a lot easier to grok than struggling alone with a book.

It's also made pretty approachable, with good intuitive explanations and a minimum amount of math.

clyfe

Some of the best and freely available resources are:

As to the author's question, I haven't come across an "all in one page" solution.

lmsasu
Sergey
  • Sergey, is Barber's book tied to Matlab? – denis Jun 28 '11 at 09:19
  • Yes, just take a look at the book's link: _The BRML toolbox is provided to help readers see how mathematical models translate into actual MATLAB code._ – Sergey Jun 28 '11 at 09:26

Yes, you are fine; Christopher Bishop's "Pattern Recognition and Machine Learning" is an excellent book for general reference; you can't really go wrong with it.

A fairly recent book, but also very well written and equally broad, is David Barber's "Bayesian Reasoning and Machine Learning", a book I feel is slightly more suitable for a newcomer to the field.

I have used "The Elements of Statistical Learning" by Hastie et al. (mentioned by Macro), and while it is a very strong book, I would not recommend it as a first reference; maybe it would serve you better as a second reference for more specialized topics. In that respect, David MacKay's book, "Information Theory, Inference, and Learning Algorithms", can also do a splendid job.

usεr11852
  • +1 for Bishop. Clear development with an even level of detail. While still good, I always found Hastie et al. a little bit choppy. – conjugateprior Sep 01 '12 at 20:54
  • +1 -- Hastie, Tibshirani and Friedman is my personal favorite. – StasK Sep 01 '12 at 23:16
  • +1 too for recommending Hastie, Tibshirani and Friedman, my personal favorite too. And thanks for the other recommendations; I'll give them a read because I really need a good book to recommend to non-statisticians (or persons just entering the field). – Néstor Sep 02 '12 at 20:43
  • +1 for Bishop. It's actually a great source for classical stats too, but updated and in disguise. – conjectures May 29 '14 at 19:28

Since the consensus seems to be that this question is not a duplicate, I'd like to share my favorite for machine learner beginners:

I found Programming Collective Intelligence the easiest book for beginners, since the author Toby Segaran is focused on allowing the median software developer to get his/her hands dirty with data hacking as fast as possible.

Typical chapter: the data problem is clearly described, followed by a rough explanation of how the algorithm works, and finally a demonstration of how to create some insights with just a few lines of code.

The use of Python allows one to understand everything rather fast (you do not need to know Python; seriously, I did not know it before either). Don't think that this book is only focused on creating recommender systems. It also deals with text mining, spam filtering, optimization, clustering, validation, etc., and hence gives you a neat overview of the basic tools of every data miner.
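To give a flavor of the book's "few lines of code" style, here is a minimal sketch in the spirit of its recommender chapters; the ratings data and function name below are made up for illustration, not taken from the book:

```python
# Toy ratings in a dict-of-dicts style (hypothetical data for illustration).
ratings = {
    "Ann":  {"Alien": 5.0, "Heat": 3.0, "Up": 4.0},
    "Bob":  {"Alien": 4.0, "Heat": 3.5, "Up": 5.0},
    "Cara": {"Alien": 1.0, "Heat": 4.5},
}

def euclidean_similarity(prefs, a, b):
    """Score in (0, 1]: 1 for identical tastes, smaller as ratings diverge."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0  # no co-rated items, nothing to compare
    sq_dist = sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared)
    return 1.0 / (1.0 + sq_dist ** 0.5)

print(euclidean_similarity(ratings, "Ann", "Bob"))   # 0.4
print(euclidean_similarity(ratings, "Ann", "Cara"))  # lower: tastes diverge more
```

A recommender then suggests items liked by the users most similar to you; the book builds that up step by step from exactly this kind of similarity score.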

mlwida

Witten and Frank, "Data Mining", Elsevier 2005, is a good book for self-learning, as there is a Java library of code (Weka) to go with the book, and it is very practically oriented. I suspect there is a more recent edition than the one I have.

Dikran Marsupial
  • Yes, this book was to be called "Machine Learning", but the name was changed to "Data Mining" by the publishers to ride on the data mining hype at the time; nevertheless, the book is about ML, not DM (the two bear similarities, but are different fields!). – clyfe Dec 20 '11 at 16:03
  • Tom Mitchell's book "Machine Learning" is also very good; the style is a bit old-fashioned, but the content is excellent. – Dikran Marsupial Dec 20 '11 at 16:15
  • Yes, Tom Mitchell's ML is like the ML bible, really comprehensive on the field! – clyfe Dec 20 '11 at 20:13

[scikit-learn algorithm cheat-sheet flowchart]

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited to different types of data and different problems. The flowchart is designed to give users a rough guide on how to approach problems with regard to which estimators to try on your data. Click on any estimator in the chart to see its documentation.
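As a rough illustration of how such a flowchart's advice plays out in code (a minimal sketch, not part of the original answer): for labeled data with fewer than ~100k samples, the chart's classification branch suggests starting with a linear SVM, which in scikit-learn looks like this:

```python
# Following the cheat sheet's classification branch:
# labeled data, < 100k samples -> try a linear SVM first.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000)  # the chart's first stop for classification
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on held-out data
```

If that underperforms, the chart routes you onward (e.g. to nearest neighbors or ensemble methods), which is exactly the kind of "rough guide" the answer describes.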

Anton Tarasenko

I have Machine Learning: An Algorithmic Perspective by Stephen Marsland and find it very useful for self-learning. Python code is given throughout the book.

I agree with what is said in this favourable review:

http://blog.rtwilson.com/review-machine-learning-an-algorithmic-perspective-by-stephen-marsland/

Glen

"Elements of Statistical Learning" would be a great book for your purposes. The 5th printing (2011) of the 2nd edition (2009) of the book is freely available at http://www.stanford.edu/~hastie/local.ftp/Springer/ESLII_print5.pdf

Richard Hardy
DanB
  • It is a mathematics-heavy book, so it may be difficult for a self-learner to follow. – Atilla Ozgur Dec 24 '11 at 13:22
  • Do you know how it happens to be freely downloadable on Trevor Hastie's personal pages when Springer charges $70 for it? – M. Toya Jul 27 '12 at 19:11
  • I don't know for sure, but I would imagine Springer wants the money, and the authors mainly want to publicize their book widely. This seems very similar to how Springer will sell you published articles while many "working paper" versions are freely available on the authors' websites. – DanB Jul 29 '12 at 14:20
  • FYI, the download is the 5th **printing** of the second edition. I love the footnote to the epigraph "In God we trust, all others bring data", which is attributed to Deming. The footnote points out the irony that no "data" can be found confirming that Deming actually said this. – HeatfanJohn Sep 25 '12 at 16:29
  • You should mention **Introduction to Statistical Learning with R** -- it's sort of like their **ESL**-lite (if the math in **ESL** is too daunting). – Steve S Nov 15 '14 at 11:19

The awesome-machine-learning repository seems to be a master list of resources, including code, tutorials and books.

orluke

Microsoft Azure also provides a similar cheat-sheet to the scikit-learn one posted by Anton Tarasenko.

Microsoft Azure Machine Learning Algorithm Cheat Sheet

(source: https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-cheat-sheet)

They accompany it with a notice:

The suggestions offered in this algorithm cheat sheet are approximate rules-of-thumb. Some can be bent, and some can be flagrantly violated. This is intended to suggest a starting point. (...)

Microsoft additionally provides an introductory article providing further details.

Note that those materials focus on the methods implemented in Microsoft Azure.

Tim

Most books mentioned in other answers are very good and you can't really go wrong with any of them. Additionally, I find the following cheat sheet for Python's scikit-learn quite useful.

Marc Claesen

I like Duda, Hart and Stork, "Pattern Classification". This is a recent revision of a classic text that explains everything very well. I am not sure that it has been updated to cover neural networks and SVMs in much depth. The book by Hastie, Tibshirani and Friedman is about the best there is, but it may be a bit more technical than what you are looking for, and it is detailed rather than an overview of the subject.

Macro
Michael R. Chernick

For a first book on machine learning, which does a good job of explaining the principles, I would strongly recommend

Rogers and Girolami, A First Course in Machine Learning, (Chapman & Hall/CRC Machine Learning & Pattern Recognition), 2011.

Chris Bishop's book or David Barber's both make good choices for a book with greater breadth, once you have a good grasp of the principles.

Nick Cox
Dikran Marsupial

Don't start with Elements of Statistical Learning. It is great, but it is a reference book, which doesn't sound like what you are looking for. I would start with Programming Collective Intelligence as it's an easy read.

Neil McGuigan
  • I'm not sure I would characterize ESL as a reference text. It seems more of an overview to me, i.e., you aren't going to learn the nitty gritty details of (hardly) anything. You will see the broad techniques and overarching themes. – cardinal Sep 03 '12 at 14:11

I wrote a summary like that, but only on one machine learning task (the Netflix Prize), and it has 195 pages: http://arek-paterek.com/book

Arek Paterek

Check this link featuring some free ebooks on machine learning: http://designimag.com/best-free-machine-learning-ebooks/. It might be useful for you.


A good cheat sheet is the one in Max Kuhn's book Applied Predictive Modeling. The book contains a good summary table of several ML models; the table is in Appendix A, page 549:

Table A.1: A summary of models and some of their characteristics

Anton Tarasenko
PolBM