
I have a general methodological question. It might have been answered before, but I have not been able to locate the relevant thread. I would appreciate pointers to possible duplicates.

(Here is an excellent one, but with no answer. This one is similar in spirit and even has an answer, but the latter is too specific from my perspective. This one is also close; I discovered it after posting the question.)


The theme is how to do valid statistical inference when the model formulated before seeing the data fails to adequately describe the data generating process. The question is very general, but I will offer a particular example to illustrate the point. However, I expect the answers to focus on the general methodological question rather than nitpick the details of the particular example.


Consider a concrete example: in a time series setting, I assume the data generating process to be $$ y_t=\beta_0 + \beta_1 x_t+u_t \tag{1} $$ with $u_t \sim i.i.N(0,\sigma_u^2)$. I aim to test the subject-matter hypothesis that $\frac{dy}{dx}=1$. I cast this in terms of model $(1)$ to obtain a workable statistical counterpart of my subject-matter hypothesis, and this is $$ H_0\colon \ \beta_1=1. $$ So far, so good. But when I observe the data, I discover that the model does not adequately describe the data. Let us say, there is a linear trend, so that the true data generating process is $$ y_t=\gamma_0 + \gamma_1 x_t+\gamma_2 t + v_t \tag{2} $$ with $v_t \sim i.i.N(0,\sigma_v^2)$.

How can I do valid statistical inference on my subject-matter hypothesis $\frac{dy}{dx}=1$?

  • If I use the original model, its assumptions are violated and the estimator of $\beta_1$ does not have the nice distribution it otherwise would. Therefore, I cannot test the hypothesis using the $t$-test.

  • If, having seen the data, I switch from model $(1)$ to $(2)$ and change my statistical hypothesis from $H_0\colon \ \beta_1=1$ to $H'_0\colon \ \gamma_1=1$, model assumptions are satisfied and I get a well-behaved estimator of $\gamma_1$ and can test $H'_0$ with no difficulty using the $t$-test.
    However, the switch from $(1)$ to $(2)$ is informed by the data set on which I wish to test the hypothesis. This makes the estimator distribution (and thus also inference) conditional on the change in the underlying model, which is due to the observed data. Clearly, the introduction of such conditioning is not satisfactory.

Is there a good way out? (If not frequentist, then maybe some Bayesian alternative?)
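
To make the problem concrete, here is a minimal simulation sketch in Python (numpy only; all parameter values are made up for illustration and are not part of the question): when the data follow model $(2)$ but model $(1)$ is fitted, the $t$-test of $H_0\colon\ \beta_1=1$ rejects far more often than its nominal level even though the slope truly equals one, whereas fitting model $(2)$ keeps the test close to its nominal level.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_sim, crit = 200, 2000, 1.96   # sample size, replications, approx. 5% two-sided critical value
# Hypothetical DGP parameters: the slope is truly 1 (H0 holds), gamma2 != 0 creates the omitted trend.
gamma0, gamma1, gamma2, sigma_v = 0.5, 1.0, 0.05, 1.0

def slope_t_stat(X, y, col, value):
    """OLS t-statistic for H0: coefficient in column `col` equals `value`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return (beta[col] - value) / np.sqrt(cov[col, col])

t = np.arange(1, T + 1)
rej_model1 = rej_model2 = 0
for _ in range(n_sim):
    x = 0.05 * t + rng.normal(size=T)                      # regressor correlated with the trend
    y = gamma0 + gamma1 * x + gamma2 * t + sigma_v * rng.normal(size=T)
    X1 = np.column_stack([np.ones(T), x])                  # model (1): trend omitted
    X2 = np.column_stack([np.ones(T), x, t])               # model (2): trend included
    rej_model1 += abs(slope_t_stat(X1, y, 1, 1.0)) > crit
    rej_model2 += abs(slope_t_stat(X2, y, 1, 1.0)) > crit

print("rejection rate under a true H0, model (1):", rej_model1 / n_sim)  # far above 0.05
print("rejection rate under a true H0, model (2):", rej_model2 / n_sim)  # close to 0.05
```

None of the numbers come from real data; the sketch only quantifies why inference under the misspecified model $(1)$ breaks down, which is the starting point of the question.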

kjetil b halvorsen
Richard Hardy
  • Your discomfort is endemic to classic approaches to awarding PhDs: careful hypothesis specification, followed by an empirical test and ending with descriptive causal inference. In this world, the short answer is, "no," there is no way out. However, the world is evolving away from that strict paradigm. For instance, in a paper in the *AER* last year titled *Prediction Policy Problems*, Kleinberg et al. make the case for data mining and prediction as useful tools in economic policy making, citing instances where "causal inference is not central, or even necessary." It's worth a look. – Mike Hunter Feb 24 '17 at 19:23
  • @DJohnson, Do I understand you correctly that the direct answer would be, there is no way out? If model $(1)$ does not happen to be the true data generating process but model $(2)$ is, we will not be able to test the subject-matter hypothesis using the data set at hand. Is that right? – Richard Hardy Feb 24 '17 at 19:50
  • In my view, the direct answer would have to be there is no way out. Otherwise, you would be guilty of the worst sort of data mining -- recasting the hypotheses to fit the data -- a capital offence in a strict, paradigmatic world. – Mike Hunter Feb 24 '17 at 19:55
  • @DJohnson, In terms of inference, I think this is not the worst but perhaps the lightest sort of data mining. I could, for example, test whether $\gamma_2=0$ and make it support my theory in some creative way :) That would probably be the worst. But that is beyond the point. – Richard Hardy Feb 24 '17 at 20:11
  • If I understand correctly, you are collecting data, then selecting a model and then testing hypotheses. I may be wrong, but it seems to me that the [selective inference](http://statweb.stanford.edu/~tibs/ftp/nips2015.pdf) paradigm investigated by Taylor and Tibshirani (among others) could be related to your problem. Otherwise, comments, answers and linked answers [to this question](http://stats.stackexchange.com/questions/254314/is-mle-estimation-asymptotically-normal-efficient-even-if-the-model-is-not-tru) might be of interest. – DeltaIV Mar 14 '17 at 10:01
  • Good question. Have you heard of "indirect inference"? Sounds very similar to what you're describing: http://bactra.org/notebooks/indirect-inference.html – Momo Mar 14 '17 at 10:09
  • @Delta, you understood me correctly. But basically it is not "my method", it is how 90% (?) of research in economics gets done (almost everything except for experimental economics)... which makes me strongly doubt the validity of any and all of their inference... Looking at your link, I see that there the question and the answers address something like [P-consistency rather than T-consistency](http://stats.stackexchange.com/questions/265739/t-consistency-vs-p-consistency), which I am not interested in for this question here. So the link is interesting, but the questions are not duplicates. – Richard Hardy Mar 14 '17 at 10:25
  • @DeltaIV, that is, when doing inference, I am not interested in the *least false* parameters as under P-consistency, but rather I am interested in the *true* ones (the true partial derivative of $y$ w.r.t. $x$). – Richard Hardy Mar 14 '17 at 10:33
  • I never said that your question was a duplicate of mine, just that it might help, but evidently it doesn't. What about the other link, the pitch by Rob Tibshirani? On page 21, he explicitly mentions being able to control *selective type I error*, a sort of type I error when the hypothesis being tested is random (in the sense that it depends on the sample data). This seems to me more related to T-consistency than P-consistency (in your terminology), as no loss function is assumed in the definition of selective type I error. But, again, I am no expert in this kind of stuff and I may be wrong. – DeltaIV Mar 14 '17 at 10:37
  • ps just one last comment on the unreliability of research done the way you described. You probably know it already, but Gelman's [garden of forking paths](http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf) does come to mind. – DeltaIV Mar 14 '17 at 10:47
  • @DeltaIV, no, of course you did not say it was a duplicate. I said that as a preemptive strike :) But your links are really interesting and helpful! And I am well aware of Gelman, he really helped me understand statistical methodology better (although there is still a looong way to go!). – Richard Hardy Mar 14 '17 at 10:48
  • You write "This makes the estimator distribution (and thus also inference) conditional on the change in the underlying model, which is due to the observed data. Clearly, the introduction of such conditioning is not satisfactory.". Why the last sentence? – Alecos Papadopoulos Mar 14 '17 at 18:52
  • @AlecosPapadopoulos, These two sentences are sloppy as I still have not found a good way to express myself (evidently I do not understand the phenomenon perfectly). What I mean is that we are interested in inference as stated at the beginning, not some modification of it. – Richard Hardy Mar 14 '17 at 19:10
  • Leeb and Pötscher have studied this extensively. The distribution of the estimator is typically highly non-standard, and you are perfectly right that inference is usually highly flawed because of this. This applies to any model selection procedure, be it AIC, OLS post lasso, pretesting etc. There was a paper in the Annals of Statistics in 2013 by Berk et al., though, which supposedly allows for valid inference. If you want to search further, just google post-selection inference. Hjort and Claeskens' 2003 model averaging paper in JASA is a good read too. – hejseb Mar 15 '17 at 18:07
  • @hejseb, thanks! This issue is incredibly disturbing when I think about causal inference in economics (except, I guess, for experimental economics)... Looks like we could scrap most of the published results. Painful... – Richard Hardy Mar 15 '17 at 18:15
  • @RichardHardy, sure, despite being a stats grad student I don't really believe in inference anymore. It's a house of cards so fragile that it's unclear whether it's meaningful at all except in very strict and controlled circumstances. What is funny is that everyone knows this, but no one (well) cares. – hejseb Mar 15 '17 at 18:46
  • @DJohnson Relevant lit differentiating **prediction and explanation**: Prediction, explanation and the epistemology of future studies. *Futures*, 35(10):1027–1040; Hanson, N. R. (1959). On the symmetry between explanation and prediction. *The Philosophical Review*, 68(3):349–358; Rescher, N. (1958). On prediction and explanation. *The British Journal for the Philosophy of Science*, 8(32):281–290; Scheffler, I. (1957). Explanation, prediction, and abstraction. *The British Journal for the Philosophy of Science*, 7(28):293–309. – Alexis Mar 15 '17 at 19:05
  • @hejseb, hmm, probably not everyone knows this. People around you may be statistically literate, but there is some selection bias. – Richard Hardy Mar 15 '17 at 19:42
  • For what it's worth, this *AER* article *Prediction Policy Problems* by Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer specifically addresses concerns with inference vs prediction, "We argue that there are also many policy applications where causal inference is not central, or even necessary." (https://www.cs.cornell.edu/home/kleinber/aer15-prediction.pdf) – Mike Hunter Mar 15 '17 at 22:04

2 Answers


The way out is literally an out-of-sample test, a true one. Not the one where you split the sample into a training and a hold-out set as in cross-validation, but true prediction. This works very well in the natural sciences. In fact, it's the only way it works. You build a theory on some data, then you're expected to come up with a prediction of something that has not been observed yet. Obviously, this doesn't work in most social (so-called) sciences such as economics.

In industry this works as in the sciences. For instance, if a trading algorithm doesn't work, you're going to lose money eventually, and then you abandon it. Cross-validation and training data sets are used extensively in development and in the decision to deploy the algorithm, but once it's in production it's all about making or losing money. A very simple out-of-sample test.
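
As a rough illustration of this idea (a minimal sketch, not the answer's own code; it reuses the question's hypothetical DGP with made-up parameter values, and the "new" data are simulated here as a stand-in for observations genuinely collected later): the model specification and the hypothesis are frozen on the data already seen, and the test is then run only on data that arrive afterwards.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(T, t0=0):
    """Hypothetical DGP from the question, model (2), with the slope truly equal to 1."""
    t = np.arange(t0 + 1, t0 + T + 1)
    x = 0.05 * t + rng.normal(size=T)
    y = 0.5 + 1.0 * x + 0.05 * t + rng.normal(size=T)
    return t, x, y

def ols_coef_and_t(X, y, col, value):
    """OLS estimate of coefficient `col` and its t-statistic against `value`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[col], (beta[col] - value) / np.sqrt(cov[col, col])

# Step 1: model building on the data at hand. Diagnostics of model (1) would reveal
# the trend here, prompting the switch to model (2). The specification is then frozen.
t_old, x_old, y_old = simulate(200)

# Step 2: test the frozen hypothesis (slope = 1 in model (2)) only on data observed
# afterwards; here the "new" sample is simulated in place of truly new observations.
t_new, x_new, y_new = simulate(100, t0=200)
X_new = np.column_stack([np.ones_like(x_new), x_new, t_new])
est, tstat = ols_coef_and_t(X_new, y_new, 1, 1.0)
print(f"slope estimate on new data: {est:.3f}, t-statistic for slope = 1: {tstat:.3f}")
```

Whether holding out part of an existing sample can substitute for collecting genuinely new data is exactly what the comments below discuss.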

Aksakal
  • Does that help estimate $\frac{\partial y}{\partial x}$? – Richard Hardy Mar 15 '17 at 19:32
  • @RichardHardy, yes, you test the same hypothesis on the new data. If it holds, then you're good. If your model is misspecified, then it should eventually fail; I mean other diagnostics too. You should see that the model is not working with new data. – Aksakal Mar 15 '17 at 20:24
  • OK, then it sounds like the good old prescription of splitting the sample into a subsample for model building and another for hypothesis testing. I should have included that consideration already in the OP. In any case, that seems like a sound strategy. The problem with macroeconomics, for example, would be that the same model would almost never fit unseen data well (as the data generating process is changing over time), so the exact same problem that we begin with would persist. But that is an example where basically any method fails, so it is not a fair criticism. – Richard Hardy Mar 15 '17 at 20:29
  • Meanwhile, in microeconomics in cross-sectional data setting it could work. +1 for now. On the other hand, once a model has been fit to all available data, this solution will not work. I guess that is what I was thinking when I wrote the question, and I am looking for answers that address the title question: inference from misspecified model. – Richard Hardy Mar 15 '17 at 20:33
  • @RichardHardy, no, splitting the sample into pieces is not the same, because then you end up using the out-of-sample data in a very similar way to the in-sample data. I'm saying that you do the best you can with what you have, then make a prediction out of sample and collect new data. That's the only way. – Aksakal Mar 15 '17 at 21:08
  • I did not mean the kind of splitting you mention. It is easy to realize that you can split the sample in exactly the same way as if there was "old" and "new" data. That is probably the only *real* way out as a macroeconomist cannot wait another 5 years to collect another 20 quarterly data points if a decision has to be made at the current moment. Also, in other branches of economics it may rarely be realistic that a researcher will go out and gather more data. Thus sample splitting into "old" and "new" data achieves the goal, without the extra complications of gathering new data. – Richard Hardy Mar 16 '17 at 06:05
  • @RichardHardy, in social sciences sample splitting is the way we go, because we can't do anything else, but it's not the way out. That's why economics is stuck where it is now without a single solid theory, a *soft* science. For instance, there is not a single fundamental constant, such as electron charge in physics, in the whole field of economics. In the industry (or applied economics) it's a little better because we can often collect new data, such as sales or income of other stores. – Aksakal Mar 16 '17 at 13:08
  • I sympathize with your view. But since sample splitting into "old" and "new" is equivalent to collecting new data, I do not understand where you see a big difference between the two. – Richard Hardy Mar 16 '17 at 13:11

You could define a "combined procedure" and investigate its characteristics. Let's say you start from a simple model and allow for one, two, or three more complex (or nonparametric) models to be fitted in case the simple model doesn't fit. You need to specify a formal rule according to which you decide not to fit the simple model but one of the others (and which one). You also need to have tests for your hypothesis of interest that can be applied under all the involved models (parametric or nonparametric).

With such a setup you can simulate the characteristics of the procedure, i.e., the percentage of cases in which your null hypothesis is ultimately rejected when it is true, and under several deviations of interest. You can also simulate from all the involved models and look at things such as conditional level and conditional power given that the data came from model X, Y, or Z, or given that the model misspecification test procedure selected model X, Y, or Z.
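
For instance, here is a minimal sketch of such a simulation for the question's example (Python, numpy only; the DGP parameters and the selection rule, a significance pretest of the trend term, are illustrative assumptions rather than part of this answer):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_sim, crit = 200, 2000, 1.96
t = np.arange(1, T + 1)

def ols(X, y):
    """OLS estimates and standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, se

rejections = 0
for _ in range(n_sim):
    x = 0.05 * t + rng.normal(size=T)
    # Simulate under a true null (slope = 1); the made-up trend coefficient 0.01 controls
    # how badly the simple model is misspecified and how often the pretest detects it.
    y = 0.5 + 1.0 * x + 0.01 * t + rng.normal(size=T)

    X1 = np.column_stack([np.ones(T), x])        # simple model (1)
    X2 = np.column_stack([np.ones(T), x, t])     # more complex model (2)

    # Formal selection rule: keep the simple model unless the trend term is significant.
    b2, se2 = ols(X2, y)
    if abs(b2[2] / se2[2]) > crit:
        slope_hat, slope_se = b2[1], se2[1]      # hypothesis tested under model (2)
    else:
        b1, se1 = ols(X1, y)
        slope_hat, slope_se = b1[1], se1[1]      # hypothesis tested under model (1)

    rejections += abs((slope_hat - 1.0) / slope_se) > crit

# Achieved level of the combined procedure under a true null; compare with the nominal 0.05.
print("achieved level:", rejections / n_sim)
```

Rerunning with different values of the (made-up) trend coefficient shows the two regimes described next: a reliable selection step leaves the level nearly intact, while an unreliable one can distort it badly.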

You may find that model selection doesn't do much harm in the sense that the achieved level is still very close to the level you were after, and the power is OK if not excellent. Or you may find that data-dependent model selection really screws things up; it'll depend on the details (if your model selection procedure is very reliable, chances are level and power won't be affected very strongly).

Now this isn't quite the same as specifying one model and then looking at the data and deciding "oh, I need another", but it's probably as close as you can get to investigating what would be the characteristics of such an approach. It's not trivial because you need to make a number of choices to get this going.

General remark: I think it is misleading to classify applied statistical methodology in a binary way as "valid" or "invalid". Nothing is ever 100% valid because model assumptions never hold precisely in practice. On the other hand, although you may find valid (!) reasons for calling something "invalid", if one investigates the characteristics of the supposedly invalid approach in depth, one may find out that it still works fairly well.

Christian Hennig
  • I wonder if this is realistic in practice aside from the simplest of problems. Computational cost of simulations would quickly exceed our capabilities in most of the cases, don't you think so? Your comment on validity is of course logical. However, without this simple yet useful (in aiding our reasoning) notion we would be even more lost than we are with it - that is my perspective. – Richard Hardy Jun 07 '19 at 15:31
  • I'm not saying that this should be done every time such a situation is met in practice. It's rather a research project; however, one take-away message is that, in my opinion and for the reasons given, data-dependent model selection doesn't exactly invalidate inference that would otherwise have been valid. Such combined procedures may work rather well in many situations, although this is currently not properly investigated. – Christian Hennig Jun 07 '19 at 16:50
  • I guess if this was feasible, it would already be in use. The main problem might be infeasibility due to the large amount of modelling choices that are data dependent (back to my first comment). Or do you not see a problem there? – Richard Hardy Jun 07 '19 at 17:06
  • There's the odd simulation in the literature exploring misspecification test/model selection first and then parametric inference conditional on the outcome of that. Results are mixed as far as I know. A "classical" example is here: https://www.tandfonline.com/doi/abs/10.1080/00949657808810243?journalCode=gscs20 – Christian Hennig Jun 08 '19 at 22:14
  • But you're right; modelling the full process with all kinds of possible modelling options would require lots of choices. I still think it'd be a worthwhile project, although not something that one could demand whenever models are selected from the same data to which they're fitted. Aris Spanos by the way argues against the idea that misspecification testing or model check on the data makes inference invalid. https://onlinelibrary.wiley.com/doi/abs/10.1111/joes.12200 – Christian Hennig Jun 08 '19 at 22:18
  • To which [I respond](https://stats.stackexchange.com/questions/303887/effects-of-model-selection-and-misspecification-testing-on-inference-probabilis): I do not understand the argument. A commenter adds: *this stuff reads mostly like expounding philosophical principles and then jumping to conclusions (for which there is no empirical or mathematical proof) by analogies and hand-waving*. – Richard Hardy Jun 09 '19 at 05:23
  • I agree that Spanos denies that there is a problem where both of us think there is one. I think something like what I proposed would be needed to investigate how big the problem actually is. Apart from his general argument, Spanos makes a few remarks why the effect of model misspecification can often be expected to be low and is non-existing in some situations. Actually you may think it is somewhat schizophrenic that Spanos argues both a) that model misspecification isn't a problem for conceptual/philosophical reasons and b) that it isn't a big problem, conceptual/philosophical reasons aside. – Christian Hennig Jun 10 '19 at 10:29
  • I find his writings amusing and have a lot of respect for him, but I really do have a problem understanding the point we are discussing now. Personal communication has not helped all that much as his response to my request used roughly the same arguments and similar wording to the cited texts. I think first of all, we are just having some communication problems. – Richard Hardy Jun 10 '19 at 10:40