87

If you could go back in time and tell yourself to read a specific book at the beginning of your career as a statistician, which book would it be?

Neil McGuigan
  • 1
    There are really three separate questions here! 1) What is the single most influential book in statistics; 2) What book should every statistician read; 3) What book have you read that you most wish you'd read much earlier. (2) and (3) probably have considerable overlap; (1) may be quite distinct. – onestop Feb 20 '11 at 08:50
  • 1
    [This question](http://stats.stackexchange.com/questions/23841/what-factors-make-for-a-great-stats-book/23844) is another way of looking at this question. I hope that it will provide a good complement, once it gets some good answers. – naught101 Feb 29 '12 at 13:50

26 Answers

40

Here are two to put on the list:

Tufte. The visual display of quantitative information
Tukey. Exploratory data analysis

Rob Hyndman
  • 10
    Both are worth a periodic re-reading, maybe once a decade, just to refresh the ideas. Concerning Tukey: it's great to sit down just with pencil and paper once in a while and do a deep analysis of an interesting dataset. – whuber Sep 09 '10 at 22:55
  • 6
    For graphics for a statistician, I prefer William Cleveland's books to Tufte's. – Peter Flom Oct 02 '12 at 22:46
  • 1
    I have a feeling these books were meant to analyze non-linear data when non-linear methods weren't as available? – Robert Kubrick Jan 25 '17 at 13:57
36

The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (http://www-stat.stanford.edu/~tibs/ElemStatLearn/) should be in any statistician's library!

beroe
robin girard
  • 6
    I disagree - that one is closely related to machine learning, not statistics *per se*! – aL3xa Sep 20 '11 at 18:06
  • 1
    @aL3xa: it is certainly focussed on machine learning...which is why I think statisticians should be exposed to it early on. – Cliff AB Feb 08 '19 at 03:58
  • 3
    Apparently I'm in the minority in thinking this book is overrated. It seems to be written for a graduate-level student, but one who doesn't care about the details of how anything works. – The Laconic Mar 23 '19 at 16:14
  • @TheLaconic you are not alone, i think the same. E.g. Bishop's PRML, while assuming much fewer prerequisites than ESL, goes into the theory of ML much deeper, giving more derivations. ESL's broad following is ONLY because of the stellar authors (Tibshirani, Hastie, Friedman), as a pedagogic material it is so-so. Good as a reference though. – SWIM S. Apr 12 '21 at 23:58
28

I am no statistician, and I haven't read that much on the topic, but perhaps

Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century

should be mentioned? It is no textbook, but still worth reading.

Vivi
  • 1
    I second this. Also, the book contains quite a lot of suggestions for further reading, which I think are useful. – Chris Beeley Mar 17 '11 at 08:55
  • 1
    I think this book speaks to those who knew nothing at the beginning but the obtuseness of the language and the cultural baggage associated with the field. This book gave the mind wings - it says that statistics is about finding useful truth in a sea of noise and misunderstanding. – EngrStudent Oct 22 '14 at 12:53
  • 2
    Many people have reported this as entertaining, but it's full of extraordinary errors. If you can find it, my review in _Biometrics_ 57: 1273-1274 (2001) gives a far from complete list. (Salsburg gets various Bernoullis mixed up, which is easier to do.) – Nick Cox Jan 16 '16 at 16:07
22

Probability Theory: The Logic of Science

  • This book is tough. It is about the foundations of probability, and even in that part of Statistics, I don't think it is a reference text. I do believe there can be 14 people on planet Earth who read and understood its full message, but I would probably classify this as a must read for probabilists, for the sake of the thousands of others that are deep in stuff like GLMs, GAMs, Bayesian models and other things. – means-to-meaning Nov 06 '13 at 00:57
  • 1
    It is also a bit sad that some of the later chapters are missing and/or under developed - for example there is no chapter on regression, but a draft unpublished manuscript was available with some fascinating insights into "measurement error" regressions. Some very cool stuff on time series though. – probabilityislogic Apr 09 '14 at 07:53
22

Darrell Huff -- How to Lie with Statistics

Rob Hyndman
mkolar
  • 6
    Back when this was a \$3.95 and then a \$4.95 paperback, I bought copies by the dozen and gave them away to friends, clients, and anyone else who might be interested. – whuber Sep 09 '10 at 22:54
  • It's deservedly remembered. But the non-statistical content dates it unfortunately, not least an extraordinarily large fraction of cartoons featuring people (and even babies) smoking. 60+ years on, that's not amusing any more. (Some reprints e.g. one in the UK updated the cartoons.) – Nick Cox Jan 16 '16 at 16:02
14

Not a book, but I recently discovered an article by Jacob Cohen in American Psychologist entitled "Things I have learned (so far)." It's available as a pdf here.

Jarle Tufto
Freya Harrison
12

I wouldn't argue that either of these should be considered "the most influential book... [for] statistician[s]", but for those who are just starting to learn about the topic, two helpful books are:

  1. Robert Abelson, Statistics as Principled Argument
  2. Paul Murrell, Introduction to Data Technologies
gung - Reinstate Monica
12

Long ago, Jack Kiefer's little monograph "Introduction to Statistical Inference" peeled away the mystery of a great deal of classical statistics and helped me get started with the rest of the literature. I still refer to it and warmly recommend it to strong students in second-year stats courses.

Rob Hyndman
whuber
11

William Cleveland's book "The Elements of Graphing Data" or his book "Visualizing Data"

  • 1
    I'm currently reading through The Elements (Visualizing Data is not in my current school's library). What is the difference between Elements & Visualizing Data? I haven't been able to find descriptions detailed enough to work out exactly how the two differ. – Andy W Nov 10 '11 at 19:36
  • 2
    I agree. I think that, for statisticians, Cleveland is better than Tufte. – Peter Flom Jan 22 '12 at 12:41
  • 3
    +1 to Robert Alberts, & +1 to Peter Flom (Cleveland's books are *definitely* better for statisticians, although Tufte's are beautiful as well, and I have read all of them). @AndyW, *Elements* is introductory, e.g., it has guidelines for making an informative graphic. *Visualizing* demonstrates how to center your data exploration process around graphics; it starts with preliminary visualization of the data, talks about the issues at hand and walks all the way through to assessing the final model (e.g., residual analysis) via graphics. The latter is much more informative than the former. – gung - Reinstate Monica Jan 23 '12 at 04:03
  • @AndyW One of them is a bit more technical than the other (I forget which is which though!) – Peter Flom Oct 02 '12 at 22:48
  • 1
    As @gung says, _Visualizing_ is a more advanced sequel to _Elements_. There is some overlap but it's helpful rather than irritating. Both strongly recommended. Last revision dates 1993 and 1994, but they are still fresh 20+ years later. Note that non-technical readers would get value from both: I can vouch personally that high school mathematics is sufficient background. – Nick Cox Jan 16 '16 at 15:59
11

I think every statistician should read Stigler's The History of Statistics: The Measurement of Uncertainty before 1900

It is beautifully written, thorough and it isn't a historian's perspective but a mathematician's, hence it doesn't avoid the technical details.

Rob Hyndman
Graham Cookson
8

I say The Visual Display of Quantitative Information by Tufte, and Freakonomics for something fun.

Stephen Turner
8

Andrew Gelman's interesting book recommendations are here:

http://thebrowser.com/interviews/andrew-gelman-on-statistics

Michael Bishop
7

In addition to "The History of Statistics" suggested by Graham, another Stigler book worth reading is

Statistics on the Table: The History of Statistical Concepts and Methods

Vivi
6

On the math/foundations side: Harald Cramér's Mathematical Methods of Statistics.

ars
  • By the way, this is the earliest place I have found mention of Cramer's phi. Amazing how a lovely little sidenote in that book became a well known method many decades later. – Tal Galili Jan 05 '13 at 22:56
5

For a clear exposition of what should be in social science journal articles (helpful if you're writing or peer reviewing), I like The Reviewer's Guide to Quantitative Methods in the Social Sciences. In particular, I like the table of desiderata as a synopsis of the minimum that a paper (article, thesis, dissertation) should contain. The chapters are separated by analysis technique, which is nice. I think the book has wider applications than "just" the social sciences, as the techniques covered are used across many fields.

Quite early on, so perhaps not covered by the question, I was introduced to Ott's Introduction to Statistical Methods and Data Analysis. It's quite expensive, but it is a wonderful resource for showing the underlying statistical models for various GLM methods. I dream of the day that journals require articles to show the formula of the statistical model tested.

For checking test assumptions, looking at the effects of various options within a test, and so forth, this is the one book I wish I had when I was studying. I have the previous edition and it is one of the best general resources I have purchased because of the clear and consistent manner in which information about the tests is laid out. It contains nice examples illustrating the test(s), and does not require the reader to have a particular statistical package in order to follow the expositions.

Michelle
4

I have read the above recommendations and was surprised to find that, with 2 or 3 exceptions, most of the people who answered the question are not statisticians themselves. As an industrial statistician who has also worked with social scientists and health professionals, I would say that if I could take only one book with me to a desert island it would be George E. P. Box, Statistics for Experimenters (Wiley). In his inimitable humorous and lucid style he explains the essence and the philosophy of building mathematical models for real data. Rigorous thinking, no mathematical frivolities, no nonsense: it teaches us to think statistically and to plot and visualize whatever we can. A masterpiece by a competent applied scientist (chemical engineer turned statistician). Always fun to read again.

Scortchi - Reinstate Monica
j.h.
4

Fooled By Randomness by Taleb

Taleb is a professor at Columbia and an options trader. He made about $800 million in 2008 betting against the market. He also wrote Black Swan. He discusses the absurdity of using the normal distribution to model markets, and philosophizes on our ability to use induction.

Neil McGuigan
4
  1. Michael Oakes' Statistical Inference: A Commentary for the Social and Behavioral Sciences.
  2. Elazar Pedhazur's Multiple Regression in Behavioral Research. If you can stand the immense detail and the self-important tone.

In case you're interested, I've reviewed both on Amazon and at https://yellowbrickstats.com/favorites.htm

rolando2
3

Lots of good books have already been suggested. But here is another: Gerd Gigerenzer's "Reckoning With Risk", because understanding how statistics affect decisions is more important than getting all the theory right. In fact, the number one sin of statisticians is failing to communicate clearly. His book talks about the consequences of poor communication and how to avoid it.

matt_black
  • 1
    _"understanding how statistics affect decisions is more important than getting all the theory right..."_ Ain't it the truth? I come from an architecture background, and I can tell you, sometimes theory just gets in the way... – naught101 Feb 29 '12 at 13:54
3

Rice: Mathematical Statistics and Data Analysis

Andrej
2

I am going to go ahead and propose a standard textbook in the field. I am talking about Probability and Statistics by DeGroot and Schervish, first published in 1975.

This book has served as a textbook for many students and is considered a classic, rightfully so in my opinion. It covers topics such as combinatorics, distributions, Bayesian statistics, likelihood inference and regression analysis. As far as I know no other textbook is so thorough so I believe it is a must-have.

Nick Cox
JohnK
2

I learned a great deal from the Bible of Bayesian statistics:

Jose Bernardo and Adrian Smith (2000) Bayesian Theory.

Ben
2

It would probably be Bayesian Data Analysis by Gelman or Deep Learning with Python. But that's a bit like taking streptomycin to the Middle Ages. These had not been written when I started my career, and quite a few things from the books would have been big news back then. Some of the most influential things everyone should know are in no single source though (perhaps they should be, but...).

Björn
1

The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results by Paul D. Ellis

This book is a "must have" for everyone conducting any scientific research, especially those who do not come from a pure stats/maths background. The book below extends the first one regarding confidence intervals.

Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis by Geoff Cumming

Adam Przedniczek
1

"Most influential" is a very different notion from "everyone should read". I am not qualified to answer the first - you'd need someone who is an historian of statistics - but for the second, here are some:

  1. Statistics as Principled Argument by Robert Abelson should be read by anyone doing or using statistics in the pursuit of science, humanities, etc.

  2. William S. Cleveland's two books on graphics: The Elements of Graphing Data and Visualizing Data. For statisticians, I'd put these ahead of even Tufte's work, not because Tufte isn't worthwhile but because a) Cleveland wrote with statisticians as his intended audience and b) Cleveland based his recommendations on experimental data about how people look at graphs, rather than intuition.

  3. Exploratory Data Analysis by John Tukey. It's dated but valuable - you can do a lot with a pencil and paper and a brain (at least, if your brain is as good as Tukey's!)

Nick Cox
Peter Flom
1

Kennedy's A Guide to Econometrics contains a wealth of practical advice about a wide range of statistical analysis. It's somehow both incredibly information-dense and easy to read, and I still learn something new every time I pick it up.

Wooldridge's Introductory Econometrics has a good amount of this kind of discussion too, but as an introductory textbook it is more self-contained. I wish I'd had a course based around it.

The Laconic