I am a CS graduate student and I am starting to get really interested in Machine Learning (and Predictive Analytics). I have started working on a text classification project with a professor to learn the field, but I am realizing that my background is pretty weak. The professor is too busy to teach me the basics, so I have to do it on my own.

I have done some calculus, stat 101 and linear algebra in the distant past, but I do not remember too much. But I am pretty sure if I get a good book, I can pick things up pretty quickly.

As of now I have started off my Machine Learning studies by reading Alpaydin's Machine Learning but it is pretty challenging given the big holes in my background.

I am looking for suggestions on books on major topics (Linear Algebra, Statistics, Probability, Optimization, and anything else that might be relevant) that would help me ramp up relatively quickly, given my background.

I am looking at doing it in multiple iterations. First, learn what I need to be able to get some practical work done (like my current classification project), then go back and read books to get a deeper understanding, and so on.

Please note that at this point I am only looking for a formal treatment of the subjects (textbooks only).

EDIT 1: I think I may have found a stats and probability book (All of Statistics: A Concise Course in Statistical Inference by Larry Wasserman). I just need to figure out how to put the remaining pieces together. I welcome any further ideas in this regard.

Has QUIT--Anony-Mousse
user721975
  • [This question](http://stats.stackexchange.com/questions/6538/mathematician-wants-the-equivalent-knowledge-to-a-quality-stats-degree) is more centered on acquiring statistics knowledge. However, based on your stated goals and topics of interest, some of the suggestions in the various answers may be helpful. – cardinal Dec 04 '11 at 03:41
  • You might also be interested in the following question, if you haven't already seen it: http://stats.stackexchange.com/questions/18973/can-you-recommend-a-book-to-read-before-elements-of-statistical-learning – cardinal Dec 04 '11 at 03:42
  • @cardinal: Yes, I have seen the ones you mention. But my question is more like, "what is the absolute minimal starter set of books to get going with machine learning for someone with only a little exposure to the required fields?" – user721975 Dec 04 '11 at 03:50
  • Also, for an exceedingly gentle "ramp", see the current video lectures on machine learning by Andrew Ng at Stanford. They provide a fairly sound introduction to Hastie or Bishop. http://www.ml-class.org/course/class/index –  Dec 04 '11 at 15:38
  • @user721975 The second thread from cardinal is the way to go. The suggested book "Programming Collective Intelligence" requires no prior knowledge except basic math skills (it was written for CS people). One level below PCI would be introductory books on programming and high school mathematics. Do you have a "from-absolutely-zero-to-hero" book guide in mind? – mlwida Dec 05 '11 at 10:27
  • Understand the difference between *machine learning* and *data mining*. AI and ML are fields essentially focused on the result quality of their learning algorithms, and are often evaluated by their ability to *reproduce existing knowledge*. True data mining (unless you are playing buzzword bingo) goes way beyond learning and prediction, and includes in particular: *data management* aspects such as indexing and using the indexes to accelerate your methods, and *knowledge discovery*, which focuses on finding previously *unknown* patterns in the data. Of course, many AI and ML methods are used in data – Has QUIT--Anony-Mousse Dec 06 '11 at 09:39

2 Answers

Have you seen the Stanford online class on machine learning? It might be a great way to learn machine learning in general.

References on text mining in particular are a different question; I don't have any particular suggestions on that.

D.W.
  • I have looked at earlier offerings of the class. The professor seems good at explaining how a technique works but does not offer much intuition, such as what type of data a technique is likely to work or fail on (and why). – user721975 Dec 05 '11 at 01:49
  • @user721975 See additionally [this question](http://stats.stackexchange.com/questions/12386/machine-learning-cookbook-reference-card-cheatsheet) which has no definitive answer (which speaks for itself). – mlwida Dec 05 '11 at 10:31
  • During the course Andrew Ng frequently spends time talking specifically about the 'intuition' behind the ideas he describes. It also has numerous practical (Matlab/Octave) exercises to allow you to try different things out. +1 for recommending this course. – CatsLoveJazz Jan 28 '16 at 15:07

For a nice intro to stats, check out Think Stats (O'Reilly) by Allen B. Downey. It's a freely available ebook from the author.

Xorlev