Non-stochastic regressors

Question

In multiple linear regression analysis, if the regressors are non-stochastic, is a causal interpretation of the parameters automatically permitted? I think so, because it seems to me that the model can be interpreted as a "true causal model", but I'm not sure. In particular, I'm not sure about the role of things like control variables.

Dear Tim, I didn't write "true deterministic model" but "true causal model". I mean "true model" as used in several econometrics textbooks (as the data generating process), and I intend it as explicitly causal. If I estimate a sample counterpart of it, surely I obtain causal parameters. If I estimate an under-specified model, probably not. My doubt is about the link with experimental language when I have a fixed design matrix in repeated samples. Is this strategy, at least in a certain sense, like building a true causal model or not? – markowitz, Nov 10 '18 at 15:06
I ask this because I am currently focused on causal interpretation in observational studies, but some authors use fixed (non-stochastic) regressors too. See for example here: https://stats.stackexchange.com/questions/374959/non-stochastic-regressors-and-causation?rq=1 – markowitz, Nov 10 '18 at 15:07
In short, as I wrote before: fixed in repeated samples. Actually, in the texts that I read, the meaning of "non-stochastic regressors" is vague. In any case, the only meaning that seems possible to me is something like "treatment variables" in the experimental sense. – markowitz, Nov 11 '18 at 11:52
If this were true, then all experimental data would enable us to draw direct causal conclusions, and this is simply not the case. – Tim, Nov 11 '18 at 12:11
It's not the case? Could you explain more about that? I know that a randomized controlled experiment is exactly the ideal framework for grasping average causal effects. Surely it is possible to run a bad experiment but, in short, aren't random (repeated) sampling and random assignment to treatment and control groups (fixed regressors, for example one dummy) enough for a causal interpretation of the parameter(s)? – markowitz, Nov 11 '18 at 12:59

Carlos Cinelli · Answer 1 · 2018-11-14T01:00:28.133

> In multiple linear regression analysis, if the regressors are non-stochastic, is a causal interpretation of the parameters automatically permitted?

No, the fact that the regressors are non-stochastic makes no difference to whether the estimates identify a causal effect. For a causal interpretation of the parameters you need a causal model.

For simplicity, assume everything has mean zero and unit variance. Let $X$ and $Z$ be two fixed vectors of size $n$ with $(n-1)^{-1}\sum_{i} X_{i}Z_{i} = \sigma_{xz} \neq 0$. Assume you do not observe $Z$.

Now let the structural equation for $Y$ be:

$$ Y = \gamma Z + U_{y} $$

where $U_{y}$ is a zero-mean, normally distributed random variable. Since $X$ does not enter the structural equation, there is no causal effect of $X$ on $Y$.

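However, the OLS coefficient from regressing $Y$ on $X$ is given by the following (because $X$ has mean zero and unit variance, $\sum_{i} X_{i}^{2} = n-1$):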

$$ \begin{align} \hat{\beta} &= (n-1)^{-1}\sum_{i=1}^{n} X_{i}Y_{i} \\ &=(n-1)^{-1}\left(\gamma \sum_{i=1}^{n} X_{i}Z_{i} + \sum_{i=1}^{n} X_{i}U_{yi}\right)\\ &= \gamma\sigma_{xz} + (n-1)^{-1}\sum_{i=1}^{n} X_{i}U_{yi} \end{align} $$

where the only random part is $U_{y}$. Thus, taking the expectation gives:

$$ \begin{align} E[\hat{\beta}] &= \gamma\sigma_{xz} + (n-1)^{-1}\sum_{i=1}^{n} X_{i}E[U_{yi}]\\ &= \gamma\sigma_{xz} \end{align} $$

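which is different from zero whenever $\gamma \neq 0$, and clearly has no causal meaning. We can also do asymptotics by letting the size of $X$ and $Z$ grow while keeping $\sigma_{xz}$ fixed: the noise term $(n-1)^{-1}\sum_{i} X_{i}U_{yi}$ has variance $\operatorname{Var}(U_{y})/(n-1) \to 0$, so $\hat{\beta}$ converges to $\gamma\sigma_{xz}$ and is inconsistent as well as biased. The very book you mention in the comments and in your other question, Brooks (2014), discusses how omitted variables can bias the estimate.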
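The expectation above can be checked by simulation. The following is a minimal sketch, assuming NumPy; the sample size, $\gamma = 2$, the seed, and the particular fixed vectors ($X$ a standardized grid, $Z$ its standardized square) are arbitrary illustrative choices, not part of the answer. $X$ and $Z$ are built once and held fixed, and only $U_y$ is redrawn in each replication, so the regressor is genuinely non-stochastic. Even so, the average estimate sits at $\gamma\sigma_{xz}$ rather than at the true causal effect of zero.

```python
import numpy as np

def standardize(v):
    """Center to mean zero and scale to unit sample variance (ddof=1)."""
    v = v - v.mean()
    return v / v.std(ddof=1)

rng = np.random.default_rng(42)
n, gamma, reps = 100, 2.0, 20_000

# Fixed design: X and Z are built once and never redrawn (hypothetical choice).
t = np.linspace(0.0, 1.0, n)
X = standardize(t)
Z = standardize(t ** 2)
sigma_xz = X @ Z / (n - 1)

# Each replication redraws only the structural error U_y; Y does not depend on X.
U = rng.normal(size=(reps, n))
Y = gamma * Z + U                 # shape (reps, n), Z broadcast across rows
beta_hat = Y @ X / (n - 1)        # (n-1)^{-1} sum_i X_i Y_i, one per replication

# Because sum_i X_i^2 = n - 1, this equals the OLS slope of Y on X (first replication).
beta_ols_first = np.linalg.lstsq(X[:, None], Y[0], rcond=None)[0][0]

print(f"OLS slope vs formula (rep 1): {beta_ols_first:.6f} vs {beta_hat[0]:.6f}")
print(f"mean of beta_hat over {reps} reps: {beta_hat.mean():.4f}")
print(f"gamma * sigma_xz:                {gamma * sigma_xz:.4f}")
```

The first line confirms the closed-form $\hat{\beta}$ matches a library OLS fit; the last two lines show the bias $\gamma\sigma_{xz}$ that the derivation predicts.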

Bear in mind that this is just one example (omitted variable bias); there are several other things that can go wrong, and they have nothing to do with whether you treat $X$ as stochastic or not. The bottom line is that confounding, missing data, selection bias, and many other problems can still be present whether you treat the regressors as random or "non-stochastic". You need to make assumptions about the presence or absence of those things, which are causal concepts, and for that you need a causal model, not a regression model.

edited Nov 14 '18 at 01:00

answered Nov 13 '18 at 07:28

Carlos Cinelli


Hi Carlos, thank you for your answer, but I disagree. Your demonstration is essentially the same as for an observational study. The text of my question is probably too short, but I specified (see the comments with Tim) that the only meaning that seems possible to me [for fixed/non-stochastic regressors] is something like "treatment variables" in the experimental sense. In my later comment, I also took random sampling, in a repeated-sample scheme, and random assignment of the treatments as minimal conditions. These are probably things related to manipulation. – markowitz Nov 13 '18 at 16:43
In short, for me, in a fixed (non-stochastic) regressors scheme, dependence between two regressors, or among many, is not an acceptable case. Ideally, among non-random variables, correlation, or any kind of dependence, does not exist. Numerically it is possible that correlation, or other dependence, exists, but at least in large samples it must disappear. The same is true between the regressors and the errors (which remain random). The persistent correlation that you propose is not acceptable, in my opinion. – markowitz Nov 13 '18 at 16:44
In any case, the researcher can always manipulate the treatments to achieve substantial (numerical) independence among the regressors (treatments). Independence among the regressors seems to me, at least, the base case in an experiment. The same holds between regressors and errors. Moreover, the omitted variable problem doesn't play any role in the case that I have in mind. The researcher knows, indeed builds, the “true” regressors; it is nonsense to use another set of variables as regressors. – markowitz Nov 13 '18 at 16:44
In other words, fixed/non-stochastic regressors, at least for me, are something more than taking any kind of regressors and treating them as known (even in repeated samples). Such a strategy is substantially equivalent to an observational study/regression. – markowitz Nov 13 '18 at 16:44
In other words again, conditioning on the regressors and using non-stochastic regressors are very different things for me. Essentially, the latter has an experimental sense and the former an observational one. If we think of them as essentially the same, which is logically possible, fixed/non-stochastic regressors become a useless concept. It seems to me that only in this case are you right when you write: “The bottom line here is that confounding, missing data, selection bias and so on will still be problems to consider whether you treat the regressors as random or "non-stochastic".” – markowitz Nov 13 '18 at 16:45
Finally, in cases like Brooks (2014), a view like this is sustained: correlation does not imply causation, but regression implies causation. Now, econometrics books can have a lot of problems with causation, as Chen and Pearl (2014) pointed out. However, in a scheme in which random and non-random regressors are substantially equivalent, Brooks's mistake would be too big. – markowitz Nov 13 '18 at 16:45
Surely you know the Pearl literature on Structural Causal Models (SCM) better than I do. You probably also know the debate between “structuralists” and “experimentalists”. Currently, SCMs are not the standard in econometrics books, except in the simplest form of true (sometimes called structural) models. On the other side, “as good as a randomized controlled experiment” is a frequent paradigm. – markowitz Nov 13 '18 at 16:45
My attempt is to bridge the two paradigms in the simplest possible case. – markowitz Nov 13 '18 at 16:45
@markowitz Treat $X$ and $Z$ as experimentally set, but suppose you only observe $X$. You still have the same problem. The "fixed regressor" assumption in econometrics is just about treating the regressors as non-random; it has no bearing on drawing causal conclusions. – Carlos Cinelli Nov 13 '18 at 16:55
@markowitz *among non-random variables, correlation, or any kind of dependence, does not exist.* This sentence is completely incorrect. Just imagine a deterministic process where I experimentally choose to set $X = (1, 2, 3, \dots)$ and $Z = (1, 2, 3, \dots)$. The correlation of $X$ and $Z$ is one, and it will always be one. – Carlos Cinelli Nov 13 '18 at 17:08
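This counterexample is easy to reproduce. In the sketch below (assuming NumPy; the sample sizes are arbitrary), $X$ and $Z$ are deterministic sequences set identically, nothing is random, and the sample correlation equals one at every $n$ tried, so it does not fade in large samples.

```python
import numpy as np

# Deterministic "experimental settings": nothing in this script is random.
corrs = {}
for n in (10, 1_000, 100_000):
    X = np.arange(1, n + 1, dtype=float)  # X = (1, 2, ..., n)
    Z = np.arange(1, n + 1, dtype=float)  # Z = (1, 2, ..., n), set identically
    corrs[n] = np.corrcoef(X, Z)[0, 1]
    print(f"n = {n:>7}: corr(X, Z) = {corrs[n]:.6f}")
```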
@markowitz if your question were "If I run a perfectly randomized controlled experiment, can I derive causal conclusions?", the answer would be yes, some causal conclusions, depending on what other assumptions you make (about the target population, functional forms, etc.). But your question was about non-stochastic regressors, and if you *only* assume $X$ is not a random variable, you provably can't draw any causal conclusions; I can create as many counterexamples as you want. – Carlos Cinelli Nov 13 '18 at 17:25
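The contrast drawn in this comment can be sketched numerically. This is an illustrative assumption, not code from the thread: it keeps the answer's structural equation $Y = \gamma Z + U_y$ with a fixed, unobserved $Z$, but assigns a binary treatment $X$ by a fair coin flip in each replication, independently of $Z$ (assuming NumPy; $n$, $\gamma$, the seed, and the design of $Z$ are arbitrary). Averaged over randomizations, the OLS slope is close to the true effect of zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n, gamma, reps = 200, 2.0, 5_000

# Z is fixed (non-stochastic) and unobserved by the analyst; hypothetical choice.
t = np.linspace(0.0, 1.0, n)
Z = (t ** 2 - (t ** 2).mean()) / (t ** 2).std(ddof=1)

slopes = np.empty(reps)
for r in range(reps):
    # Random assignment: a fair coin decides who is treated, independently of Z.
    X = rng.integers(0, 2, size=n).astype(float)
    U = rng.normal(size=n)
    Y = gamma * Z + U            # X still has no causal effect on Y
    # OLS of Y on X with intercept; np.polyfit returns [slope, intercept] for deg=1.
    slopes[r] = np.polyfit(X, Y, deg=1)[0]

print(f"mean slope over {reps} randomizations: {slopes.mean():.4f} (true effect: 0)")
print(f"spread of single-experiment slopes (sd): {slopes.std():.4f}")
```

In any single randomization the sample correlation between $X$ and $Z$ is not exactly zero, and the printed spread shows individual estimates still vary; random assignment makes the omitted term average out across assignments. That is a property of the design, not a consequence of labelling $X$ non-stochastic.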
I thought about this example, and we can generalize: $corr(X,Z)=1$ if $Z = a+bX$ with $b>0$, but this is a case where one "variable" is a function of another. I am simply saying that if we have a random vector $X$ and a non-random vector $Z$, dependence between them is, at least in general, absent (repeated-sample scheme). The same holds, in general, if both are non-random. Naturally, if $Z=f(X)$, whether $f()$ is deterministic or not, it is another story. – markowitz Nov 13 '18 at 17:27
@markowitz this claim is clearly false; otherwise, by your logic, even the very relationship between $X$ and $Y$ would always need to be absent, since $X$ is not random and $Y$ is random. – Carlos Cinelli Nov 13 '18 at 17:31
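This point can be checked directly. In the minimal sketch below (assuming NumPy; the slope 2.0, the noise scale 0.5, and $n$ are arbitrary illustrative values), $X$ is a fixed grid and only $U$ is random, yet $Y = 2X + U$ is strongly correlated with $X$: a fixed regressor and a random variable can plainly be dependent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

X = np.linspace(-1.0, 1.0, n)      # fixed, non-random regressor
U = rng.normal(scale=0.5, size=n)  # random error
Y = 2.0 * X + U                    # Y is random, yet it depends on the fixed X

r = np.corrcoef(X, Y)[0, 1]
print(f"corr(X, Y) = {r:.3f}")
```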
@markowitz the very book you mention, Brooks (2014), has a section on "omission of important variables", which says: "The consequence would be that the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables." – Carlos Cinelli Nov 14 '18 at 00:45
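Brooks's statement can be illustrated with fixed regressors. In this sketch (assuming NumPy; the design, $\gamma = 2$, and the seed are hypothetical choices), each $n$ gets a fixed $X$, a fixed omitted $Z$ correlated with $X$, and a fixed omitted $Z$ made exactly orthogonal to $X$ by a Gram-Schmidt step. As $n$ grows, the estimate with the correlated $Z$ settles at $\gamma\sigma_{xz}$ (biased and inconsistent), while the one with the orthogonal $Z$ settles at the true effect of zero.

```python
import numpy as np

def standardize(v):
    """Center to mean zero and scale to unit sample variance (ddof=1)."""
    v = v - v.mean()
    return v / v.std(ddof=1)

def fixed_design(n):
    """Hypothetical fixed design: X, an omitted Z correlated with X, and one orthogonal to X."""
    t = np.linspace(0.0, 1.0, n)
    X = standardize(t)
    Z_corr = standardize(t ** 2)
    # Gram-Schmidt step: remove the X component so that sum(X * Z_orth) is zero.
    Z_orth = standardize(Z_corr - (X @ Z_corr) / (X @ X) * X)
    return X, Z_corr, Z_orth

rng = np.random.default_rng(1)
gamma = 2.0
results = {}
for n in (100, 10_000, 1_000_000):
    X, Z_corr, Z_orth = fixed_design(n)
    target = gamma * (X @ Z_corr) / (n - 1)      # gamma * sigma_xz
    U = rng.normal(size=n)
    b_corr = X @ (gamma * Z_corr + U) / (n - 1)  # omitted Z correlated with X
    b_orth = X @ (gamma * Z_orth + U) / (n - 1)  # omitted Z uncorrelated with X
    results[n] = (b_corr, b_orth, target)
    print(f"n={n:>9}: correlated Z -> {b_corr:.4f} (gamma*sigma_xz={target:.4f}), "
          f"orthogonal Z -> {b_orth:.4f}")
```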
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/85729/discussion-between-markowitz-and-carlos-cinelli). – markowitz Nov 14 '18 at 10:37
