
Currently I am re-reading some chapters of An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (Springer, 2015), and I have some doubts about what is said there.

Above all, it seems relevant to note that chapter 2 introduces two concepts: the prediction accuracy vs. model interpretability tradeoff and the bias-variance tradeoff. I mentioned the latter in an earlier question.

The book suggests that, focusing on expected prediction error (test MSE), the following assertions hold:

  • less flexible specifications imply more bias but less variance

  • more flexible specifications imply less bias but more variance

It follows that linear regression implies more bias but less variance. The optimum in the tradeoff between bias and variance, i.e. the minimum of the test MSE, depends on the true form of $f$ in $Y = f(X) + \epsilon$. Sometimes linear regression works better than more flexible alternatives, and sometimes it does not. This graph tells the story:

[Figure: test MSE as a function of model flexibility for three different true forms of $f$]

In the second case linear regression works quite well; in the other two, not so much. From this perspective everything is fine.
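These assertions follow from the standard pointwise decomposition of the expected test MSE (equation 2.7 in the book): for a test observation $(x_0, y_0)$,

$$E\Big[\big(y_0 - \hat{f}(x_0)\big)^2\Big] = \mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \mathrm{Var}(\epsilon).$$

Flexibility trades the first two terms against each other, while $\mathrm{Var}(\epsilon)$ is irreducible, so where the minimum lies depends on the true $f$.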

In my opinion the problem appears from the perspective of inference and interpretability used in this book. In fact, the book also suggests that:

  • less flexible specifications are farther from reality, hence more biased, but at the same time they are more tractable and therefore more interpretable;

  • more flexible specifications are closer to reality, hence less biased, but at the same time they are less tractable and therefore less interpretable.

As a result, linear regressions (OLS and, even more so, the LASSO) come out as the most interpretable and the most powerful for inference. This graph tells the story:

[Figure 2.7 from the book: model interpretability versus flexibility for various methods]

This seems to me like a contradiction. How is it possible that linear models are, at the same time, the most biased and the best for inference? And among linear models, how is it possible that LASSO regression is better than OLS for inference?

EDIT: My question can be summarized as follows:

  • linear estimated models are indicated as the most interpretable even though they are the most biased;

  • linear estimated models are indicated as the most reliable for inference even though they are the most biased.

I read carefully the answer and comments of Tim. However, it seems to me that some problems remain. The first condition can hold in some sense, namely if "interpretability" is understood as a property of the estimated model itself (its relation with anything "outside" the model is not considered).

About inference "outside" is the core, but the problem can move around its precise meaning. Then, I checked the definition that Tim suggested (What is the definition of Inference?), also here (https://en.wikipedia.org/wiki/Statistical_inference), and elsewhere. Some definition are quite general but in most material that I have inference is intended as something like: from sample say something about the "true model", regardless of his deep meaning. So, the Authors of the book under consideration used something like the “true model”, implying we cannot skip it. Now, any biased estimator cannot say something right about the true model and/or its parameters, even asymptotically. Unbiasedness/consistency (difference irrelevant here) is the main requirements for any model written for pure inference goal. Therefore the second condition cannot hold, and the contradiction remains.

  • No model in statistics/ML estimates the "true model". Our models always *approximate* the true model; bias and variance are ways of measuring the problems with this approximation. – Tim Oct 27 '20 at 09:01
  • I agree. However, inference revolves around the true model, not its approximation. If the model is severely misspecified (badly approximated), inference from it cannot be better than inference from a better-specified one. – markowitz Oct 27 '20 at 10:10
  • Yes, but a more biased (underfitting) model is not any worse (if it isn't better) for inference than a more varying (overfitting) one. – Tim Oct 27 '20 at 10:26
  • In my view, under/overfitting are concepts useful for prediction more than for inference. Prediction and inference (= explanation about the real world) may share something, but these areas must be clearly separated. This point is precisely addressed in this article (https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf#page=11&zoom=auto,-95,590), which I have already mentioned in comments. – markowitz Oct 27 '20 at 13:13
  • It is also shown there that a more biased/misspecified model can perform better than the correct (or less biased) one if prediction is the goal. This is the heart of the bias-variance tradeoff and, as I have underscored elsewhere, this point is unfortunately sometimes forgotten. However, this tradeoff does not play an important role if the goal is pure explanation/inference; there, bias minimization is the core. Therefore, saying that among several alternatives the more biased model is better for inference is a wrong message. This is my main point. – markowitz Oct 27 '20 at 13:13
  • *Nobody* says that the "more biased model is better for inference". A simple model is better for inference, but a simple model also has a good chance of being biased. A complicated model that you can't understand is useless for inference. See also https://stats.stackexchange.com/questions/207760/when-is-a-biased-estimator-preferable-to-unbiased-one – Tim Oct 27 '20 at 13:25
  • I read the discussion that you suggest. It is useful, but it does not add much to what I already know about the bias-variance tradeoff. Its core message is about test MSE minimization (prediction). Citing me, you left out a few important words; I wrote: "among several alternatives the more biased model is better for inference". You: "Simple model is better for inference, but simple model has also good chance of being biased as well." These sound quite similar, and both seem wrong to me. – markowitz Oct 27 '20 at 15:19
  • Maybe the fact that the authors focus only on functional form does not help us here. But we can focus on linear models only. The authors seem to suggest that LASSO regression is better than OLS for inference. Now, at least under standard assumptions, this is wrong. – markowitz Oct 27 '20 at 15:20

1 Answer


There's no contradiction. The fact that something is easy to interpret has nothing to do with how accurate it is. The most interpretable model you could imagine is one that predicts a constant, independently of the data. In that case you would always be able to explain why your model made the prediction it made, but the predictions would be horrible.
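As a minimal sketch of that idea in R (the book's language; the helper name `fit_constant` is made up for illustration):

```r
# The most interpretable "model" imaginable: ignore the inputs entirely
# and always predict the training mean of the response.
fit_constant <- function(y_train) {
  yhat <- mean(y_train)
  function(x_new) rep(yhat, length(x_new))  # same answer for any input
}

model <- fit_constant(c(2, 4, 6))
model(c(-10, 0, 10))  # always returns 4: easy to explain, usually terrible to use
```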

That said, it's not the case that you need complicated, black-box models to get accurate results, and poorly performing models to get interpretability. Here you can find a nice popular article by Cynthia Rudin and Joanna Radin, where they give examples of interpretable models giving very good results and use them to argue that performance vs. interpretability is a false dichotomy. There is also a very interesting episode of the Data Skeptic podcast on this subject, hosting Cynthia Rudin.

You may also be interested in the When is a biased estimator preferable to unbiased one? thread.
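To make that thread's point concrete, here is a minimal simulation sketch in base R (the true slope, the sample size, and the 0.5 shrinkage factor are arbitrary illustrative choices): deliberately shrinking the OLS slope toward zero introduces bias but reduces variance enough to lower the estimator's mean squared error, which is the same mechanism that ridge and lasso exploit.

```r
set.seed(1)
beta_true <- 0.3   # weak true slope (illustrative assumption)
n <- 20            # small sample, so the OLS slope is noisy
reps <- 5000

ols_est <- shrunk_est <- numeric(reps)
for (r in seq_len(reps)) {
  x <- rnorm(n)
  y <- beta_true * x + rnorm(n)
  b_ols <- unname(coef(lm(y ~ x))[2])  # unbiased OLS slope estimate
  ols_est[r]    <- b_ols
  shrunk_est[r] <- 0.5 * b_ols         # biased (shrunk) but less variable
}

mse <- function(est) mean((est - beta_true)^2)
c(ols = mse(ols_est), shrunk = mse(shrunk_est))
# The shrunken estimator's squared bias, (0.5 * beta_true)^2, is more than
# offset by its variance reduction, so its MSE comes out lower in this setup.
```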

  • Thanks for the answer and links; useful. However, I worry about misunderstanding what you, and the references you suggest, said. My doubts revolve around the words: interpretability, explanation, inference. You, and the suggested article of Rudin and Radin, never speak about "inference", but in my question I used it deliberately. Do you avoid it by chance, or because you consider inference and interpretability not to be synonyms? – markowitz Oct 26 '20 at 08:14
  • You can forget figure 2.7 for a moment and consider that the authors of the book suggest that linear models are, at the same time, the most reliable for inference and the most biased. In other words, you could put something like "reliability for inference" on the horizontal axis of figure 2.7. If you don't trust me, I can report some phrases about it. Now, if we use the word "interpretability" merely in place of something like simplicity (no more), I agree with you. In this sense, the constant model that you suggest, regardless of its predictive performance, is easily interpretable but bad for inference. – markowitz Oct 26 '20 at 08:15
  • On the other side, if you affirm that the constant model is useful for inference too, I disagree... or at least we have to go deeper into the meaning of "inference". – markowitz Oct 26 '20 at 08:16
  • Now, in the article of Rudin and Radin, a word like "explanation" is used many times. To my knowledge, it is a synonym of causality. So, if on the horizontal axis of figure 2.7 we put something like "reliability for causal inference", the problems become bigger and deeper, and would be far from solved by your example and the references you cited. – markowitz Oct 26 '20 at 08:16
  • @markowitz "explainability" has *nothing* to do with causality. Explainability is about our ability to explain how some algorithm made some prediction. This is complicated for something like deep learning, since there are a lot of moving parts, and simple in the case of linear regression or a decision tree. – Tim Oct 26 '20 at 08:20
  • As about "inference" vs "interpretability", your question and how you summarize what you've read since to treat those as synonyms. Inference means [drawing conclusion from the data](https://stats.stackexchange.com/questions/234913/what-is-the-definition-of-inference). With complicated, black-box models, they enable you to predict something, but you don't know why did they made the prediction they made. For simple models, you know why, so you can use this knowledge to learn that some things are related to each other (inferring correlation etc). – Tim Oct 26 '20 at 08:22
  • About: “"explainability" has nothing to do with causality. Explainability is about our ability to explain how did some algorithm made some prediction” It seems me a point of view. Some people intend “explanation” of the data/measure/functions as understand where data and results come from and what substantive meaning stand behind the always used concept of “relation”. – markowitz Oct 26 '20 at 09:23
  • For example, in the article about the dispute between prediction and explanation, discussed here (https://stats.stackexchange.com/questions/342360/minimizing-bias-in-explanatory-modeling-why-galit-shmuelis-to-explain-or-to/486140#486140), the word explanation is indubitably used as a synonym of causation. However, we can stop here about this point; at most I can open another, more focused question later. – markowitz Oct 26 '20 at 09:23
  • About "inference" vs "interpretability", if as you suggest, in multivariate context like our, the correct meaning of inference is about correlations/associations and no more … maybe problems come to solve. However remain the problem that simpler model, like linear, are indicated as more biased also. This fact seems me contradict the main principle of inference. I'm wrong? – markowitz Oct 26 '20 at 09:24
  • @markowitz as said in the answer, the fact that something can be used for inference has nothing to do with how "good" it is. You are free to use both very good and very crappy models for inference. As also said, it is not true that less interpretable models need to be less "good". Moreover, the fact that a model is biased does not mean that it is "bad"; there are cases where we prefer models that are more biased. – Tim Oct 26 '20 at 13:04
  • You said: “Moreover, the fact that the model is biased, does not mean that it is "bad", there are cases where we prefer models that are more biased”. Yes, and the bias-variance tradeoff tells us why that is true. But this holds for the prediction goal. – markowitz Oct 26 '20 at 17:45
  • In order to put away misunderstandings, let me clarify. My question revolves around potential ambiguities, so terminology matters a lot. "Good" and "bad" depend on the goal. The potential contradiction that I invoke is, at first, about "bias" vs "interpretability". You, and the Rudin and Radin paper, reply that a biased model can be good for interpretation. Do I understand correctly? If so, the potential contradiction that I invoke can be rewritten as one about "bias" vs "inference". You, in the comments, reply that a biased model can be good for inference too. Do I understand correctly? – markowitz Oct 26 '20 at 17:45
  • @markowitz every model can be characterized by its bias and variance. "Inference" is not a feature of a model but of how the model is used. Some models are harder to use for inference (learning something about the world) because we can hardly interpret how they arrive at their predictions, so they do not help us understand the data. Linear regression is a simple model that gives simple, easy-to-understand (interpretable) explanations; for the same reason, it is biased. – Tim Oct 26 '20 at 20:19
  • I reread carefully all you said. Thanks, but I'm not convinced yet. I added something to my question, something that, I think, can help other readers as well. Comments are not the right place for a long discussion. – markowitz Oct 27 '20 at 08:19