
What is the mathematical definition of a causal relationship between two random variables?

Given a sample from the joint distribution of two random variables $X$ and $Y$, when would we say $X$ causes $Y$?

For context, I am reading this paper about causal discovery.

Jane

3 Answers


What is the mathematical definition of a causal relationship between two random variables?

Mathematically, a causal model consists of functional relationships between variables. For instance, consider the system of structural equations below:

$$ x = f_x(\epsilon_{x})\\ y = f_y(x, \epsilon_{y}) $$

This means that $x$ functionally determines the value of $y$ (if you intervene on $x$, this changes the value of $y$) but not the other way around. Graphically, this is usually represented by $x \rightarrow y$, which means that $x$ enters the structural equation of $y$. As an addendum, you can also express a causal model in terms of joint distributions of counterfactual variables, which is mathematically equivalent to the functional formulation.
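The asymmetry between intervening on $x$ and intervening on $y$ can be seen in a small simulation. This is a minimal sketch in Python; the linear mechanism $y = 2x + \epsilon_y$ is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# structural equations: x := f_x(eps_x), y := f_y(x, eps_y)
eps_x = rng.normal(size=n)
eps_y = rng.normal(size=n)
x = eps_x                  # f_x: x is just exogenous noise
y = 2.0 * x + eps_y        # f_y: an assumed linear mechanism

# intervene on x: replace x's equation by x := 1; y responds
x_do = np.ones(n)
y_do = 2.0 * x_do + eps_y

# intervene on y: replace y's equation by y := 1; x's equation
# does not mention y, so the distribution of x is unchanged
x_after_do_y = eps_x

print(y_do.mean() - y.mean())          # roughly 2: do(x = 1) shifts y
print(x_after_do_y.mean() - x.mean())  # exactly 0: do(y = 1) leaves x alone
```

Intervening on $x$ shifts the distribution of $y$, while intervening on $y$ leaves $x$ untouched; that asymmetry is precisely the content of $x \rightarrow y$.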

Given a sample from the joint distribution of two random variables X and Y, when would we say X causes Y?

Sometimes (indeed, most of the time) you do not have knowledge of the shape of the structural equations $f_{x}$, $f_y$, nor even of whether $x\rightarrow y$ or $y \rightarrow x$. The only information you have is the joint probability distribution $p(y,x)$ (or samples from this distribution).

This leads to your question: when can I recover the direction of causality just from the data? Or, more precisely, when can I recover whether $x$ enters the structural equation of $y$ or vice-versa, just from the data?

Of course, without any fundamentally untestable assumptions about the causal model, this is impossible. The problem is that several different causal models can entail the same joint probability distribution of observed variables. The classic example is a linear causal model with Gaussian noise, where both causal directions fit the observed distribution equally well.
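This non-identifiability is easy to check numerically. Below is a minimal sketch in Python (the coefficient 0.8 is an arbitrary choice): regressing in either direction produces a residual uncorrelated with the regressor, and for Gaussian variables uncorrelated implies independent, so the two candidate models look the same in the data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# true model: x -> y, with Gaussian noise everywhere
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)

# fit the true direction (y on x) and the wrong direction (x on y)
b_yx = np.cov(x, y)[0, 1] / x.var()
r_forward = y - b_yx * x       # residual of y given x
b_xy = np.cov(x, y)[0, 1] / y.var()
r_backward = x - b_xy * y      # residual of x given y

# both residuals are uncorrelated with (hence, in the Gaussian case,
# independent of) their regressor: neither direction can be ruled out
print(np.corrcoef(x, r_forward)[0, 1])   # ~ 0
print(np.corrcoef(y, r_backward)[0, 1])  # ~ 0
```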

But under some causal assumptions, this might be possible---and this is what the causal discovery literature works on. If you have no prior exposure to this topic, you might want to start from Elements of Causal Inference by Peters, Janzing and Schölkopf, as well as chapter 2 of Causality by Judea Pearl. We have a topic here on CV for references on causal discovery, but we don't have that many references listed there yet.

Therefore, there isn't just one answer to your question, since it depends on the assumptions one makes. The paper you mention cites some examples, such as assuming a linear model with non-Gaussian noise. This case is known as LiNGAM (short for linear non-Gaussian acyclic model); here is an example in R:

library(pcalg)
set.seed(1234)
n <- 500

# non-Gaussian noise terms
eps1 <- sign(rnorm(n)) * sqrt(abs(rnorm(n)))
eps2 <- runif(n) - 0.5

# true model: x2 causes x1
x2 <- 3 + eps2
x1 <- 0.9*x2 + 7 + eps1

# run LiNGAM and inspect the estimated adjacency matrix
X <- cbind(x1, x2)
res <- lingam(X)
as(res, "amat")

# Adjacency Matrix 'amat' (2 x 2) of type 'pag':
#      [,1] [,2]
# [1,] .    .
# [2,] TRUE .

Notice here we have a linear causal model with non-Gaussian noise where $x_2$ causes $x_1$, and LiNGAM correctly recovers the causal direction (the edge from $x_2$ to $x_1$ in the adjacency matrix). However, notice this depends critically on the LiNGAM assumptions.
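The intuition behind LiNGAM can also be checked by hand, without `pcalg`. Below is a rough sketch in Python of the same data-generating process: in the true direction the regression residual is (approximately) independent of the regressor, while in the wrong direction it is not. The correlation-of-squares score is my own crude stand-in for the proper independence tests that LiNGAM-style methods use:

```python
import numpy as np

rng = np.random.default_rng(1234)
n = 5000

# mimic the R example's data-generating process: x2 causes x1,
# with non-Gaussian noise terms
eps1 = np.sign(rng.normal(size=n)) * np.sqrt(np.abs(rng.normal(size=n)))
eps2 = rng.uniform(size=n) - 0.5
x2 = 3 + eps2
x1 = 0.9 * x2 + 7 + eps1

def dep(a, b):
    # crude dependence proxy: correlation of squared centered values;
    # zero (up to sampling noise) when a and b are independent
    a = a - a.mean()
    b = b - b.mean()
    return abs(np.corrcoef(a**2, b**2)[0, 1])

# regress each way; only the true direction yields a residual
# that is (approximately) independent of the regressor
b_fwd = np.cov(x2, x1)[0, 1] / x2.var()
res_fwd = x1 - b_fwd * x2       # residual under x2 -> x1 (true direction)
b_bwd = np.cov(x2, x1)[0, 1] / x1.var()
res_bwd = x2 - b_bwd * x1       # residual under x1 -> x2 (wrong direction)

print(dep(x2, res_fwd) < dep(x1, res_bwd))  # True
```

With Gaussian noise this asymmetry would disappear, which is why the non-Gaussianity assumption does the identifying work.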

For the case of the paper you cite, they make this specific assumption (see their "postulate"):

If $x\rightarrow y$ , the minimal description length of the mechanism mapping X to Y is independent of the value of X, whereas the minimal description length of the mechanism mapping Y to X is dependent on the value of Y.

Note this is an assumption: it is what we would call their "identification condition". Essentially, the postulate imposes restrictions on the joint distribution $p(x,y)$. That is, it says that if $x \rightarrow y$ certain restrictions hold in the data, and if $y \rightarrow x$ other restrictions hold. These types of restrictions, which have testable implications (they impose constraints on $p(y,x)$), are what allow one to recover directionality from observational data.

As a final remark, causal discovery results are still very limited and depend on strong assumptions; be careful when applying them in real-world contexts.

Carlos Cinelli
  • Is there a chance you could augment your answer to include some simple examples *with fake data*? For example, having read a bit of Elements of Causal Inference and viewed some of Peters' lectures, a regression framework is commonly used to motivate the need for understanding the problem in detail (I am not even touching on their ICP work). I have the (maybe mistaken) impression that in your effort to move away from the RCM, your answers leave out all the actual tangible modelling machinery. – usεr11852 Dec 08 '18 at 23:09
  • @usεr11852 I'm not sure I understand the context of your question: do you want examples of causal discovery? There are several examples in the very paper Jane has provided. Also, I'm not sure what you mean by "avoiding RCM and leaving out actual tangible modeling machinery"; what tangible machinery are we missing in the causal discovery context here? – Carlos Cinelli Dec 08 '18 at 23:16
  • Apologies for the confusion; I do not care about examples from papers, as I can cite other papers myself (for example, Lopez-Paz et al., CVPR 2017, on their neural causation coefficient). What I care about is a simple numerical example with *fake data* that someone can run in R (or your favourite language) to see what you mean. Peters et al.'s book, for example, has small code snippets that are hugely helpful (and occasionally use just `lm`). We cannot all work around the Tuebingen datasets' observational samples to get an idea of causal discovery! :) – usεr11852 Dec 08 '18 at 23:30
  • @usεr11852 Sure, including a fake example is trivial; I can include one using LiNGAM in R. But would you care to explain what you meant by "avoiding RCM and leaving out actual tangible modeling machinery"? – Carlos Cinelli Dec 08 '18 at 23:50
  • I mean that you do not show any actual numbers or code. As much as I like your answers I cannot see cold hard numbers or code that I can go: "aha, he does `X`, `Y`, `Z` and gets answer `A` so in my work I have this problem and could also do `X`, `Y`, `Z` and get a great answer `A` too! Cool will try this!". Very few people saw a causal diagram and went: "*Yeah, I totally see myself coding this during company time!*" without some end result. Anyway, I saw your addition. Thank you for it. (+1 obviously) – usεr11852 Dec 08 '18 at 23:58
  • Yes, I fully agree. I try to my best to educate myself reasonably on this but when it comes to the particular real-world applications I am very defensive about it. – usεr11852 Dec 09 '18 at 00:11

There are a variety of approaches to formalizing causality (which is in keeping with substantial philosophical disagreement about causality that has been around for centuries). A popular one is in terms of potential outcomes. The potential-outcomes approach, called the Rubin causal model, supposes that for each causal state of affairs there is a different random variable. So, $Y_1$ might be the random variable of possible outcomes from a clinical trial if a subject takes the study drug, and $Y_2$ might be the random variable if the subject takes the placebo. The causal effect is the difference between $Y_1$ and $Y_2$. If in fact $Y_1 = Y_2$, we would say that the treatment has no effect. Otherwise, we would say that the treatment condition causes the outcome.
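As a toy illustration (a Python sketch with made-up numbers; the constant treatment effect of 1.5 is an assumption for the example), each subject reveals only one of the two potential outcomes, and randomized assignment is what lets a difference in means estimate the causal effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# hypothetical potential outcomes (made-up numbers):
# Y2 is the outcome under placebo, Y1 under the study drug
y2 = rng.normal(loc=10.0, scale=2.0, size=n)
y1 = y2 + 1.5                     # assumed constant treatment effect

# each subject is randomized to one arm and reveals only one outcome
treated = rng.random(n) < 0.5
observed = np.where(treated, y1, y2)

# randomization makes the difference in means estimate E[Y1 - Y2]
estimate = observed[treated].mean() - observed[~treated].mean()
print(estimate)   # close to the true effect of 1.5
```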

Causal relationships between variables can also be represented with directed acyclic graphs, which have a very different flavor but turn out to be mathematically equivalent to the Rubin model (Wasserman, 2004, section 17.8).

Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York, NY: Springer. ISBN 978-0-387-40272-7.

Kodiologist
  • Thank you. What would be a test for it, given a set of samples from the joint distribution? – Jane Dec 08 '18 at 15:33
  • @Jane Applied causal inference is a field of study unto itself and not something I could tell you much about in a comment, even if I knew much about it beyond "do a randomized experiment to start with". – Kodiologist Dec 08 '18 at 15:48
  • I am reading https://arxiv.org/abs/1804.04622. I haven't read its references. I am trying to understand what one means by causality based on observational data. – Jane Dec 08 '18 at 16:30
  • What are these varieties of approaches you refer to? Also, you did not answer her question, which is about causal discovery (the “given a sample...” part). – Carlos Cinelli Dec 08 '18 at 21:12
  • @CarlosCinelli https://en.wikipedia.org/wiki/Causality#Theories has some more examples. Regarding your second point, I've edited my answer to point out that causation is when $Y_1 \neq Y_2$. – Kodiologist Dec 08 '18 at 21:26
  • I'm sorry (-1), this is not what is being asked; you don't observe $Y_1$ nor $Y_2$, you observe a sample of the factual variables $X$, $Y$. See the paper Jane has linked. – Carlos Cinelli Dec 08 '18 at 21:29
  • @CarlosCinelli You can't observe both $Y_1$ and $Y_2$, but you can observe one of them, depending on which condition the subject was in. I'm not familiar with the cited paper and I've given a traditional account of causality rather than whatever Mitrovic et al. were thinking of (if they were thinking of something different). – Kodiologist Dec 08 '18 at 21:33
  • @Kodiologist and how does that help you deciding whether $X$ causes $Y$ or $Y$ causes $X$? And how do you decide it given the sample? It seems you incorrectly guessed this was a generic questions about mentioning "approaches to causality" (which would have been a duplicate anyway) and just gave a generic answer. – Carlos Cinelli Dec 08 '18 at 21:35
  • @Jane In the above text, you're given samples from two interventional distributions: $Y_i$, which is samples of $Y$ given treatment i. If you make further assumptions about the distribution of $Y_i$, you can work out whether $Y_1 \neq Y_2$ (e.g. ANOVA test). – Vimal Dec 08 '18 at 21:37
  • @CarlosCinelli It doesn't. Making decisions is an applied matter. The question asked what the mathematical definition of causation is. It is indeed a generic question. – Kodiologist Dec 08 '18 at 21:38
  • @Kodiologist did you even bother reading the paper? This is formally well defined mathematical problem. – Carlos Cinelli Dec 08 '18 at 21:39
  • @CarlosCinelli No, it isn't even mentioned in the question. – Kodiologist Dec 08 '18 at 21:40
  • @Vimal this is nonsense, you can't tell causal direction by performing an ANOVA. – Carlos Cinelli Dec 08 '18 at 21:40
  • @Vimal: I understand the case where we have "interventional distributions". We don't have interventional distributions in this setting, and that is what makes it harder to understand. In the motivating example in the paper they give something like $(x, y=x^3+\epsilon)$. The conditional distribution of $y$ given $x$ is essentially the distribution of the noise $\epsilon$ plus some translation, while that doesn't hold for the conditional distribution of $x$ given $y$. I intuitively understand the example. I am trying to understand what the general definition is for observational discovery of causality. – Jane Dec 08 '18 at 21:49
  • @CarlosCinelli Sorry I wasn't clear -- I am not suggesting ANOVA for causality. In the context of the answer above, $Y_i$ are the interventional distributions, so it applies. – Vimal Dec 08 '18 at 21:52
  • @Jane for the observational case (your question), in general you cannot infer the direction of causality purely mathematically, at least for the two-variable case. For more variables, under _additional (untestable) assumptions_ you _could_ make a claim, but the conclusion can still be questioned. This discussion is very long for comments. :) – Vimal Dec 08 '18 at 21:53
  • @Carlos you should give this a read: https://stats.stackexchange.com/conduct. I understand you disagree with the current answer, but there is no need to be rude. – guy Dec 08 '18 at 22:25
  • Are $Y_1$ and $Y_2$ random variables or distributions? – user158565 Dec 09 '18 at 03:56
  • @user158565 They're random variables. – Kodiologist Dec 09 '18 at 12:36
  • But "Y1 might be the distribution of ..." – user158565 Dec 09 '18 at 23:01
  • @user158565 yeah, sorry $Y_i$ are random variables whose distribution is often referred to as the "interventional distribution" (under intervention $i$). – Vimal Dec 09 '18 at 23:35
  • @user158565 That was a mistake. I've edited the text. – Kodiologist Dec 11 '18 at 19:23

There are two ways to determine whether $X$ is the cause of $Y$. The first is standard while the second is my own claim.

  1. There exists an intervention on $X$ such that the value of $Y$ is changed

An intervention is a surgical change to a variable that does not affect variables it depends on. Interventions have been formalized rigorously in structural equations and causal graphical models, but as far as I know, there is no definition which is independent of a particular model class.

  2. The simulation of $Y$ requires the simulation of $X$

To make this rigorous requires formalizing a model over $X$ and $Y$, and in particular the semantics which define how it is simulated.
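As a toy illustration of definition 2 (a Python sketch; the linear mechanism is an arbitrary choice), a sampler for $Y$ must invoke a sampler for $X$, but not conversely:

```python
import numpy as np

rng = np.random.default_rng(7)

# drawing a sample of y requires first drawing x; the reverse does not hold
def simulate_x():
    return rng.normal()

def simulate_y():
    x = simulate_x()               # y's sampler must invoke x's sampler
    return 2.0 * x + rng.normal()  # assumed linear mechanism

ys = np.array([simulate_y() for _ in range(2000)])
print(ys.std())  # ~ sqrt(2**2 + 1): variance from both x's and y's noise
```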

In modern approaches to causation, intervention is taken as the primitive object that defines causal relationships (definition 1). In my opinion, however, intervention is a reflection of, and necessarily consistent with, simulation dynamics.

zenna