Is prediction the 'golden criterion' to judge the ability of statisticans?

Question

I was reading Faraway's textbook linear models with R (1st edition) last weekend. Faraway had a chapter called "Statistical Strategy and Model Uncertainty". He described (page 158) that he artificially generated some data using a very complicated model, then he asked his students to model the data and compare the students' predicted results vs read results. Unfortunately, most students over-fitted the testing data and gave predicted values totally off the mark. To explain this phenomenon, he wrote something very impressive to me:

" The reason the models were so different was that students applied the various methods in different orders. Some did variable selection before transformation and others, the reverse. Some repeated a method after the model was changed and others did not. I went over the strategies that several of the students used and could not find anything clearly wrong with what they had done. One student made a mistake in computing his or her predicted values, but there was nothing obviously wrong in the remainder. The performance on this assignment did not show any relationship with that in the exams. "

I was educated that the model prediction accuracy is the 'golden criterion' for us to select the best model performance. If I am not mistaken, this is also the popular method used in Kaggle competitions. But here Faraway observed something of a different nature, that the model prediction performance could have nothing to do with the ability of the statistican involved. In other words, whether we can build the best model in terms of predictive power is not really determined by how experienced we are. Instead it is determined by a huge 'model uncertainty' (blind luck?). My question is: is this true in real life data analysis as well? Or was I confused with something very basic? Because if this is true, then the implication to real data analysis is immense: without knowing the "real model" behind the data, there is no essential difference between the work done by experienced/inexperienced statisticans: both are just wild guesses in front of the training data available.

+1 nice question. To offer another angle, say one of the analysts knows the true mode - then her predictions may be bad too! So even with knowing the real model, you'd see this. Important may be Haggerty and Srivinasans 1991 observation in Psychometrika that "the practice [...] of concluding that a model with higher predictive accuracy is "truer" is not a valid inference". — Momo, Dec 08 '15 at 10:37
I've not looked at the book yet, but "variable selection" & "transformation" already ring warning bells. See [Algorithms for automatic model selection](http://stats.stackexchange.com/q/20836/17230) & [Nature of the Relationship between Predictors and Dependent in Regression](http://stats.stackexchange.com/q/125544/17230). I also wouldn't conflate the exam performance of Statistics students with the real-work ability of statisticians. — Scortchi - Reinstate Monica, Dec 08 '15 at 11:05
This information given by Faraway seems awfully anecdotal to be used as the basis for a sweeping general principle about the field of statistics. I wouldn't want to construct a model about predictive modeling based on such non-reproducible examples. It's also possible that they were, wittingly or not, cherry-picked. — rolando2, Dec 08 '15 at 11:20
One logically valid conclusion that can be derived from this anecdote is that none of Faraway's students had (yet) acquired skills needed to perform well on his prediction test. It is difficult to make any connection at all between that result and your speculations about how experienced statisticians might perform. — whuber, Dec 08 '15 at 14:25
@whuber: I do not think that way. I agree 28 students are a bit small, but I do think this real observation has some serious implications. If Faraway made the real model, and he went over with several students' work, could not find any serious mistake, yet the predictions are way-off from what they should be. Then this says something about the 'model uncertainty' being involved, that one at least need the work done by a separate analyst to compare the differences, no matter how 'experienced' the original analyst is. I think this is quite alarming to me. — Bombyx mori, Dec 08 '15 at 20:26
@whuber: I quote Faraway: 'The implications for statistical practice are serious. Often a dataset is analyzed by a single analyst who comes up with a single model. Predictions and inferences are based on this single model. The analyst may be unaware that the data support quite different models which may lead to very different conclusions. Clearly one will not always have a stable of 28 independent analysts to search for alternatives, but it does point to the value of a second or third independent analysis. '. — Bombyx mori, Dec 08 '15 at 20:27
@rolando2: This could be true - the whole chapter disappeared in the second edition, perhaps because Faraway think it is not appropriate anymore. But since this is a rare real-life observation, I ask at the risk of offering sweeping generalizations. — Bombyx mori, Dec 08 '15 at 20:29
Why would it be surprising that there wasn't a relationship between their performance on this challenge and exam scores? Exams in statistics courses are usually testing skills that are in many ways separate from model-building, like knowledge of relevant math concepts. If someone has a good feel for data analysis, do you expect this person to be a linear algebra wizard, or vice versa? — dsaxton, Dec 10 '15 at 21:29
@dsaxton: I agree you have a point if this is an ordinary math class. But from the way Faraway frame it, I think he was expecting his class to be applied oriented and catering towards actually data analysis. Hence the 'no correlation' is almost as a shock. But this is only my guess. — Bombyx mori, Dec 10 '15 at 23:50

score 1 · Answer 1 · answered Dec 09 '15 at 22:47

1

I asked the professor in my department on this. He said frankly he was not surprised about it at all. He suggested the following way to look at this: what Faraway did was only a one time experiment, and it is not surprising that the results appeared to have no correlation with the final grades. But if Faraway repeat his 'experiment' 100 times with the same group of students, he is sure that the students learned statistics better would be performing well, similar as the confidence interval. So in his opinion experience does matter, it is just a one time social experiment could not show it because of the model uncertainty.

answered Dec 09 '15 at 22:47

Bombyx mori

687
1
6
17

I find that excuse hilarious. I think this is the reason why statistics is being replaced by (or re-branded as) "data science". People are beginning to realise that statistics-as-taught-in-universities is not very good at prediction, and models with no predictive power are useless. – Flounderer Dec 10 '15 at 00:47
1

@Flounderer: I think this is not really an excuse, and what you wrote might not be really well connected to this case. First most of the time in real life one has both a testing set and a training set, unlike in Faraway's case there is only one training set available. Second if you take a look at Faraway's model, it is highly non-linear such that regression methods do not work very well. Hence all linear models are just wild guesses. The moral of the experiment is "all models are wrong" rather than "statistics-as-taught-in-universities is not very good at prediction". – Bombyx mori Dec 10 '15 at 01:46
@Flounderer: In other words, I believe if I (or anybody else in the forum) is in Faraway student's position twenty years ago facing this wierd training set, we are unlikely to do better using just linear models. I do not think this is something related to " statistics-as-taught-in-universities" at all. – Bombyx mori Dec 10 '15 at 01:48

score 1 · Answer 2 · answered Dec 10 '15 at 20:09

1

The students' models were almost all overfit. With n data points, one can always fit a perfect polynomial of order n-1. Such a model is overdue leaving nothing to random error. It appears the students have made similar overfittng errors, but presumably with different functions.

Overfitting is an error that should only be made by students. And this suggests experience and education are necessary qualifications for modelling.

answered Dec 10 '15 at 20:09

Alison weir

11
1

2

"Overfitting is an error that should only be made by students" is a pretty high standard to bear. Modeling is hard. Maybe something like "Overfitting is something modelers learn to recognize and avoid through experience and education"would be closer to the truth? – Matthew Drury Dec 11 '15 at 00:28

Is prediction the 'golden criterion' to judge the ability of statisticans?

2 Answers2