Why do we need identification in causal inference?

Question

I am reading Pearl's causality book and it states,

Identifiability ensures that the added assumptions conveyed by $M$ ... will supply the missing information without explicating $M$ in detail.

However, I still do not understand the requirement for causal identification and would appreciate some elaboration of the above statement.

Causal identification means being able to express a causal quantity in terms of observed data. The phrase "without explicating $M$ in detail" means using only the observed data. – user551504, Nov 19 '21 at 15:14
Try doing a deep dive on Lewbel's "The Identification Zoo: Meanings of Identification in Econometrics". – Zen, Nov 19 '21 at 18:35

score 1 · Accepted Answer · answered Nov 19 '21 at 19:51

In causal inference, you can think of identifiability as the condition that permits a causal quantity to be measured from observed data. In parametric models, it is the condition that permits causal parameters to be estimated from a regression. Formally,

$E[Y|do(X)] = E[Y|X]$

can be considered an identifiability condition. Identifiability is a key point of any causal model. In general, if you do not have enough assumptions and/or data, identification is not possible.
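
As a hedged sketch of how this generalizes: assume (this is not stated in the answer itself) that there is a discrete set of covariates $Z$ satisfying the backdoor criterion relative to $X$ and $Y$. Then the interventional mean can be written purely in terms of observational quantities by averaging the conditional mean over the marginal distribution of $Z$:

$E[Y|do(X=x)] = \sum_z E[Y|X=x, Z=z] \, P(Z=z)$

When there is no confounding, no adjustment is needed and this reduces to $E[Y|do(X)] = E[Y|X]$.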

For examples about it read here:

In Berkson's paradox, is $\beta_1 = 0$ or $\ne 0$?

Infer one link of a causal structure, from observations

markowitz

But $E[Y|do(X)] = E[Y|X]$ is not necessary for identification, right? It is just the simplest example of it. – Richard Hardy Nov 20 '21 at 08:33
The identification condition can be generalized as $E[Y|do(X)] = E[Y|X, Z]$ where $Z$ is the right set of controls. In this sense the above is a (notable) example. – markowitz Nov 20 '21 at 09:56
Yeah, that would give us instrumental variables, GMM and the like. – Richard Hardy Nov 20 '21 at 10:09

score 1 · Answer 2 · answered Dec 11 '21 at 01:42

Let's say you have a Treatment variable, an Outcome variable and numerous other variables. One could regress the Outcome on the Treatment, adjusting for everything else, but we're smarter than this, right? How do we know this measures the direct relationship between treatment and outcome? Maybe we're not adjusting for an important confounder. Maybe we're adjusting for a collider and biasing our estimate instead of getting what we really want.

The causal identification step is important to see whether it's possible to estimate the effect of Treatment on Outcome and, if it is, how we can do so (backdoor adjustment, frontdoor adjustment, and so on). Sometimes the effect is not identifiable, and there is nothing we can do :|. Once the identification step is done, you can estimate the causal effect.
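
To make the confounder point concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: a single binary confounder `z`, a binary treatment `x` whose probability depends on `z`, a true treatment effect of 1.0, and the function names `simulate`, `naive_effect` and `backdoor_effect`. It compares the unadjusted difference in means with a backdoor-adjusted one.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Draw data from a hypothetical model where z confounds x and y."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)             # confounder
    x = rng.binomial(1, 0.2 + 0.6 * z)           # treatment is more likely when z = 1
    y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x on y is 1.0
    return x, y, z


def naive_effect(x, y):
    """Unadjusted contrast: E[Y | X=1] - E[Y | X=0]."""
    return y[x == 1].mean() - y[x == 0].mean()


def backdoor_effect(x, y, z):
    """Backdoor adjustment: average the within-stratum contrasts over P(Z=z)."""
    effect = 0.0
    for value in np.unique(z):
        stratum = z == value
        diff = y[stratum & (x == 1)].mean() - y[stratum & (x == 0)].mean()
        effect += diff * stratum.mean()
    return effect


x, y, z = simulate()
print("naive:   ", naive_effect(x, y))
print("adjusted:", backdoor_effect(x, y, z))
```

In this setup the naive contrast should land near 2.2 (analytically 1 + 2 × 0.6, because treated units are much more likely to have `z = 1`), while the adjusted estimate should land near the true value of 1.0. The adjustment is only valid because the simulation was built so that `z` is the one and only confounder; with real data that is exactly what the identification step has to justify.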

score 1 · Answer 3 · answered Dec 11 '21 at 02:36

This is my understanding. Correct me if I am wrong.

Suppose I am conducting a randomized clinical trial to investigate the difference in means between two treatments, A and B. If I randomize everyone to treatment B with probability 1 then the population mean for treatment A and the difference in means are both unidentifiable. There is no data available to estimate these population quantities.
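
A small sketch of this first example, under made-up assumptions (the outcome model, with mean 0 on A and mean 1 on B, and the helper name `arm_means` are hypothetical): when everyone is assigned to B there are simply no observations from which the mean for A could be computed, no matter how large the trial is.

```python
import numpy as np


def arm_means(p_b, n=1000, seed=0):
    """Simulate a two-arm trial and return the sample mean per arm.

    p_b is the probability of being randomized to treatment B.
    An arm with no subjects gets None, since its mean cannot be estimated.
    """
    rng = np.random.default_rng(seed)
    assigned_b = rng.random(n) < p_b
    # Hypothetical outcome model: mean 0 under A, mean 1 under B.
    outcome = np.where(assigned_b, 1.0, 0.0) + rng.normal(size=n)
    means = {}
    for arm, mask in (("A", ~assigned_b), ("B", assigned_b)):
        means[arm] = float(outcome[mask].mean()) if mask.any() else None
    return means


print(arm_means(p_b=1.0))  # the entry for arm A is None: nobody received A
print(arm_means(p_b=0.5))  # both arms have data, so both means are estimable
```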

Suppose I randomize subjects to treatment B with probability 1/2 and, during the course of the trial, some subjects switch to other therapies. The causal treatment effect (difference in population means between treatments A and B) in the envisioned scenario where post-baseline treatment switching does not occur is unidentifiable using the observed data.

In both examples I could use an unverifiable missing data assumption that does make the population treatment effect identifiable.
