What do statisticians do that can't be automated?

Question

Will software eventually make statisticians obsolete? What is done that can't be programmed into a computer?

the same question can be asked for programmers in general then ;) — nb1, Feb 10 '12 at 08:32
Reading just 1 or 2 thoughtful articles by the likes of Jacob Cohen, John Tukey, Howard Wainer, Gerd Gigerenzer, Gary King, or Donald Rubin will dispel any need to ask this question. — rolando2, Feb 10 '12 at 09:07
We design studies, and in particular we have to deal with real-life problems where the statistically "best" design can't be implemented. We clean up dirty data, bringing real-world knowledge to bear. We also interpret results in plain [insert language of choice]. — Michelle, Feb 10 '12 at 09:11
A physician came to me for help with an advanced longitudinal data analysis declaring "I know what I'm doing, I just want confirmation that this is correct." His use of a random effects model was flawless except for one simple fact: he had extreme values in the response variable that were badly tilting the results. This was caused by his taking logs to deal with "high outliers." With many tiny values in the original responses, the log transformation blew up, creating "low outliers" instead. Statisticians deal with many things, including reexamining implicit assumptions. — Frank Harrell, Feb 10 '12 at 13:39
Computers can not determine what is right or wrong (more accurately, better or worse), they just do it faster. The consequences of doing the wrong thing very quickly and *en masse* are not good... — James, Feb 10 '12 at 13:39
I recently learned that, in general, optimization is an area that often requires problem-specific solutions. This often requires human input and cannot always be automated. — Phillip Cloud, Feb 10 '12 at 18:35
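
To make the log-transformation pitfall in Frank Harrell's comment concrete, here is a minimal sketch. The data are invented for illustration and are not from his example; the helper names (`z_scores`, `most_extreme`) are hypothetical. It only shows that, with tiny positive values present, taking logs can move the most extreme observation from the high end to the low end.

```python
import math
import statistics

# Hypothetical positive responses (invented for illustration):
# mostly moderate values, one large value, and two values very close to zero.
responses = [0.0001, 0.0005, 3.0, 4.0, 5.0, 5.5, 6.0, 7.0, 8.0, 250.0]


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def most_extreme(values):
    """Return (value, z) for the observation farthest from the mean in SD units."""
    zs = z_scores(values)
    idx = max(range(len(values)), key=lambda i: abs(zs[i]))
    return values[idx], zs[idx]


# On the raw scale the large value dominates (a "high outlier").
raw_value, raw_z = most_extreme(responses)

# After taking logs, the tiny values are pushed far below the rest
# (a "low outlier"), because log(x) heads to minus infinity as x nears 0.
logged = [math.log(v) for v in responses]
log_value, log_z = most_extreme(logged)

print(f"raw scale: most extreme value = {raw_value}, z = {raw_z:.2f}")
print(f"log scale: most extreme value = {log_value:.2f}, z = {log_z:.2f}")
```

Nothing in the software flags this; spotting it requires looking at the transformed data and asking whether the transformation did what was intended.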

score 34 · Answer 1 · answered Feb 10 '12 at 15:09

Computers will only make statisticians obsolete when strong AI makes humans as a whole obsolete.

The question reminds me of the question about, "If there are all of these robust statistical methods, why do people still use other methods?" Some of the answer is habit and training, but much of it is that the question is naive: "robust" doesn't mean "you don't have to think about and understand what you're doing", as the question implies.

I mean, you could download the R statistics package today, and be doing any basic statistical technique by nightfall. You could then download a couple of packages and start using methods so esoteric that most of us haven't even heard of them. The question is: would you get reasonable answers? The answer is: probably not.

The algorithms are automated, but you still have to make many judgement calls all along the investigative path: from the plan of attack to the final judgement of whether the results actually make sense. To get to that point, you're really talking about Star-Trek-like computers where you can say, "Computer, tell me...", by which point pretty much every human vocation is obsolete.

+1 for "Computers will only make statisticians obsolete when strong AI makes humans as a whole obsolete." — Macro, Feb 10 '12 at 15:27

score 29 · Accepted Answer · answered Feb 10 '12 at 13:43

@Adam, if you think of statistical researchers analogously to those in other fields - people who build upon the existing methodology and knowledge - then it might make it more clear that the answer to your first question is 'No'.

Statisticians that make a living from simply applying canned software packages could quite possibly be replaced by computers for every step except writing the discussion section of a paper where the results must be interpreted. So, in that sense, yes - it could be automated (although it would have to be a complicated piece of software that has one hell of a natural language processor).

However, as most researchers eventually figure out, the "canned" routines that people often use are pretty limited and must be modified (or new methods entirely must be developed) to answer specialized research questions - this is where the human aspect of statistics is indispensable. Or, a researcher must simply settle for a somewhat different, but related, research question that can be answered using classical methods.

Most statisticians I know work in research jobs (e.g. professors, research scientists) where their primary role is to develop new methodology. If this process could be automated, meaning that a computer can formulate and crank out useful new methodology, then I'm afraid researchers in every field would be obsolete.

I think your second paragraph misses a point: it's not just the end of the process (result interpretation) that's hard, it's also the beginning - understanding what methods to apply to the data in what ways, which in the general case requires understanding the nature of the data and the system it came from. — Cascabel, Feb 10 '12 at 23:19
@Jefromi, like I commented to someone below, I think that understanding comes from an expert in the field of application, not a statistician. — Macro, Feb 11 '12 at 01:24
If understanding just "came from" experts in the field of application my job'd be much easier (& much less fun). There's a frame problem: something the expert doesn't think to say can be important for the statistical analysis. In practice the most fruitful collaborations result in the expert learning a fair amount of statistics & the statistician learning a fair amount about the field of application. — Scortchi - Reinstate Monica, Jan 12 '17 at 21:10

score 10 · Answer 3 · answered Feb 10 '12 at 18:19

What can a statistician do that a computer can't? Write the original program they get replaced by.

Beyond that somewhat silly answer, the root of the question is ignoring the actual science of statistics in favor of its mechanics, and entirely discounting the role of the creative process in statistical analysis. This is, to use Peter Flom's car example, like saying cars are built using rivets and welds, so there's no reason the new Mustang couldn't be designed by riveting and welding robots.

A tremendous amount of the doing of statistics involves subject-matter expertise, judgement calls, and creativity. "Canned" analysis running from an algorithm often won't get you the best answer, and there are myriad documented examples where using automated methods actually gives you the wrong answer - or at least not the answer you think you're getting. The use of stepwise p-value based variable selection procedures and analysis based on purely numerically defined quantiles are two I'm most familiar with, but I'm sure you can find a wealth of others out there.
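
As a rough illustration of how p-value-driven selection can mislead, here is a small simulation sketch. It is not stepwise regression itself but a simpler stand-in (screening each candidate predictor by its correlation with the response), and the p-value uses Fisher's z approximation rather than an exact test. All names and settings (`screen_noise`, 100 observations, 200 predictors) are assumptions chosen for the example. Both the response and every predictor are independent noise, so any predictor that is "selected" is a false finding.

```python
import math
import random
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, via Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_noise(n_obs=100, n_predictors=200, alpha=0.05, seed=1):
    """Screen pure-noise predictors against a pure-noise response.

    Returns a list of (predictor_index, p_value) for predictors with p < alpha.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    selected = []
    for j in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        p = approx_p_value(pearson_r(x, y), n_obs)
        if p < alpha:
            selected.append((j, p))
    return selected


selected = screen_noise()
print(f"{len(selected)} of 200 pure-noise predictors pass p < 0.05")
```

With a 0.05 threshold, roughly 5% of unrelated predictors are expected to pass by chance alone, which is why a model built only from whatever survives such a filter can look convincing while meaning nothing.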

Even if all that were somehow automated, there is still the matter of interpreting results. The job of the statistician (or statistically inclined scientist) isn't done when you obtain a regression coefficient or p-value. What does that finding mean? What are the caveats? What does this represent in the context of what's come before?

Finally, you have the development of new methods. Statistics isn't something that was simply laid out long ago by people whose names we recognize - Fisher, Cox, etc. It's an evolving field, and you can't program a new method into a computer until a person develops the method itself.

(+1) because "Canned analysis running from an algorithm often won't get you the best answer" is very true. This doesn't mean that human practitioners of statistics don't do this all of the time. (Note: most practitioners of statistics are NOT statisticians... more like people who are using statistics despite not really knowing what they're doing, often resulting in bad science) — Macro, Feb 10 '12 at 18:23

score 10 · Answer 4 · answered Feb 13 '12 at 18:27

Another way to interpret this question might be: "has the rapid increase in automated statistical techniques in recent years corresponded with a decreased demand in jobs for dedicated statisticians and data analysts?"

We can address this question by looking at job-market data for data analysis positions:
[Figure: job-market chart; image not included]

Data courtesy of indeed.com & revolutions blog

cboettig

+1 Even Indeed.com has not made @cboettig obsolete. – Thomas Levine Feb 14 '12 at 19:38
I'm not convinced "demand in jobs for dedicated statisticians and data analysts" has strong correlation with the usage of the keywords "data scientist" or "big data" in job ads. – Darren Cook Feb 17 '12 at 00:26
@DarrenCook well said! – cboettig Feb 17 '12 at 20:31

score 7 · Answer 5 · answered Feb 10 '12 at 09:17

I don't entirely agree with the premise of the question, i.e. I think there is no way in which computers could ever hope to replace statisticians, but to give a concrete example of why I think that:

The work which statisticians do with scientists, particularly, in the design and interpretation of experiments, requires not only a human mind but even a philosophical bent which it is inconceivable that computers could ever show.

Unless we end up in some sort of Skynet type situation, of course, in which case I reckon all bets are probably off as far as the future of all humanity, never mind about just the statisticians, is concerned :-)

Except I have feline overlords to obey. :) – Michelle Feb 10 '12 at 20:52

score 5 · Answer 6 · answered Feb 10 '12 at 13:25

The question suggests a naive view of a statistician: that it's all about checking to see if p < 0.05 and reporting some numbers and standard graphs. If that's what you mean by statistician, then you are correct in your implication that much of it could be entirely automated. But that's not what statistician means.

Define your term statistician though, and you might get better answers.

Arne Jonas Warnke · Answer 7 · 2016-06-18T07:35:17.877

Academic studies that estimate the probability of automation for different occupations or tasks do not suggest that statisticians will soon be replaced by computers. See, for example, the controversial Frey & Osborne (2013) study, which ranks occupations according to their probability of computerization: statisticians are ranked low, 213 out of 702, with a probability of 22% (see the table in the appendix). If you are interested in more, see also the Slate article here.

Arntz et al. (2016) (here is an article in The Economist) look at tasks rather than occupations for the European Union and come to a similar conclusion: doing "Complex Math or Statistics" is statistically significantly negatively related to job automatability (see Table 3).

But some caution is advisable: academics and economists have not always been very good at predicting the future (the Nobel laureate Robert Lucas, for example, concluded in 2003, a few years before the financial crisis, that the "central problem of depression prevention has been solved, for all practical purposes, and has in fact been solved for many decades."). Both studies appear to be working papers, which are widely discussed but have not been published in standard peer-reviewed journals.

Regarding the academic debate, here you can find an overview article about the state of research about automation.

score 3 · Answer 8 · answered Feb 10 '12 at 17:53

Loading a statistics package onto your computer doesn't make you a statistician any more than buying a car makes you able to drive.

Even if the statistician just applies "canned" routines there are lots of questions.

1. Which routine? What routine will answer the client's questions?
2. With what variables? And should they be transformed? Should some levels be combined? Which should be forced into a model?
3. With what data? Should outliers be deleted? Trimmed? Maybe a robust method?

and so on.

But the job starts way before the computer is turned on, and ends long after the statistical package is turned off.

Before: What does the client want to do? Often this is a lot of work! What data does the client have? Oy vey! The variables are labeled V1 to V828171. Which are which? What is the state of the literature? What will the client expect? How technical should it be?

After: What do results mean? (and not just "this means that the regression is significant") How should the results be explained to the client? What other questions do the results raise?

It will, I think, be a long time before computers can do this.

Peter Flom

In order for you to answer the questions listed in (1), (2) and (3), you go through some logical process. Theoretically, this logical process could be coded into a computer program. If the computer had a perfect natural language processor and the software contained all "canned" software, and had the logic mentioned above programmed in, it would be able to do these things. Or, are you saying, it's not exactly a logical process? – Macro Feb 10 '12 at 17:59
For me, the analogy is a little closer to "buying a car does not make you a mechanic or a car designer." – cardinal Feb 10 '12 at 18:04
@Macro Just because it's a logical process doesn't necessarily mean it can be programmed into a computer. "Should some levels be combined" isn't always a numeric measurement - it requires considering if those combined levels make sense in the context of the variable itself, for example. – Fomite Feb 10 '12 at 18:08
Deciding whether it makes sense in the context of the application isn't a question for a statistician either - it's a question for an expert in whatever the application is. A statistician can tell you if it's justifiable to combine levels based on whether or not they appear homogeneous, which could certainly be taught to a computer. – Macro Feb 10 '12 at 18:10
I cannot resist pointing out that Google has been making great advances in the direction where buying a car *will* make you able to drive--it will do so automatically! – whuber Feb 10 '12 at 19:11
@whuber: I would say Sebastian Thrun and his team have. Their work started quite long ago at CMU and they and others made great strides from the US government funded DARPA challenges. Google is now bankrolling some of their work. :) – cardinal Feb 10 '12 at 20:37
@macro 1, 2, and 3 are not an entirely logical process. They require intuition. Also, I don't think it is even possible to have a perfect natural language processor, either in humans or computers. It would be very hard to program a computer to figure out when it had misunderstood someone. – Peter Flom Feb 10 '12 at 20:41
@whuber: as long as it is not raining heavily. Google cars might work in California; I would like to see one working in northern Norway in a snowstorm (we have those each year). – kjetil b halvorsen Oct 20 '15 at 13:46
@whuber A little late to the party, and perhaps a bit pedantic, but wouldn't that be closer to *riding* a car? – Frans Rodenburg Oct 31 '19 at 04:50

score 0 · Answer 9 · answered Apr 12 '18 at 14:55

I think that AI will only make statisticians smarter and more competitive. Why? Because this has been the intent of artificial intelligence since its conception many decades ago...

user22478
