Creative Commons License
This pyAgrum notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

**Back-Door Criterion**

This notebook follows the examples from "The Book Of Why" (Pearl, 2018), chapter 4, page 150.

In [1]:
from IPython.display import display, Math, Latex, HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os

In a causal diagram, confounding bias is due to the flow of non-causal information between treatment $X$ and outcome $Y$ through back-door paths. To neutralize this bias, we need to block these paths.
To block a non-causal path, we must perform an adjustment operation for a variable or a set of variables that would block the flow of information on that path. Such a set of variables satisfies what we call the "back-door" criterion. A set of variables $Z$ satisfies the back-door criterion for $(X, Y)$ if and only if:

  • $Z$ blocks all back-door paths between $X$ and $Y$. A "back-door path" is any path in the causal diagram between $X$ and $Y$ starting with an arrow pointing towards $X$.
  • No variable in $Z$ is a descendant of $X$ on a causal path. Adjusting for such a variable would block a path that carries causal information, so the estimated causal effect of $X$ on $Y$ would be biased.

If a set of variables $Z$ satisfies the back-door criterion for $(X,Y)$, the causal effect of $X$ on $Y$ is given by the formula: $$P(y \mid do(x)) = \sum_{z}{P(y \mid x,z) \times P(z)}$$
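As a worked illustration of the adjustment formula, here is a minimal sketch in plain Python. The probability tables and the names `p_z`, `p_x1_given_z` and `p_y1_given_xz` are made up for the illustration (they do not come from the notebook); they describe a binary confounder $Z$ with $Z \rightarrow X$, $Z \rightarrow Y$ and $X \rightarrow Y$. The sketch compares the back-door adjusted quantity with the plain conditional $P(y \mid x)$.

```python
# Hypothetical tables for a binary confounder Z with Z -> X, Z -> Y and X -> Y.
p_z = {0: 0.7, 1: 0.3}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def adjusted(x):
    """Back-door adjustment: sum_z P(Y=1 | x, z) * P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def naive(x):
    """Plain conditioning: P(Y=1 | x) = sum_z P(Y=1 | x, z) * P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


for x in (0, 1):
    print(f"x={x}: adjusted={adjusted(x):.4f}  naive={naive(x):.4f}")
```

With these numbers the adjusted values differ from the naive ones because $P(z \mid x) \neq P(z)$: the naive conditional mixes the effect of $X$ with the information flowing through the back-door path $X \leftarrow Z \rightarrow Y$.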

Example 1:

In [2]:
e1 = gum.fastBN("X->A->Y;A->B")
e1
Out[2]:
[Graph: X->A, A->Y, A->B]
In [3]:
m1 = csl.CausalModel(e1)
cslnb.showCausalImpact(m1, "Y", doing="X",values={})
Causal Model: [graph with edges X->A, A->Y, A->B]
Explanation : frontdoor ['A'] found.
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = \sum_{A}{P\left(A\mid X\right) \cdot \left(\sum_{X'}{P\left(Y\mid A\right) \cdot P\left(X'\right)}\right)}\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| X | Y=0 | Y=1 |
|---|--------|--------|
| 0 | 0.3260 | 0.6740 |
| 1 | 0.2816 | 0.7184 |
In [4]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m1.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There are no arrows pointing into $X$, so there are no back-door paths between $X$ and $Y$ (the graph already looks as if the surgery of the do-operator had been performed). The only path is the causal path $X \rightarrow A \rightarrow Y$.

Example 2:

In [5]:
e2 = gum.fastBN("A->B->C;A->X->E->Y;B<-D->E")
e2
Out[5]:
[Graph: A->B, A->X, B->C, X->E, E->Y, D->B, D->E]
In [6]:
m2 = csl.CausalModel(e2)
cslnb.showCausalImpact(m2, "Y", doing="X",values={})
Causal Model: [graph with edges A->B, A->X, B->C, X->E, E->Y, D->B, D->E]
Explanation : frontdoor ['E'] found.
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = \sum_{E}{P\left(E\mid X\right) \cdot \left(\sum_{X'}{P\left(Y\mid E\right) \cdot P\left(X'\right)}\right)}\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| X | Y=0 | Y=1 |
|---|--------|--------|
| 0 | 0.4252 | 0.5748 |
| 1 | 0.6349 | 0.3651 |
In [7]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m2.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There is one back-door path from $X$ to $Y$: $$X \leftarrow A \rightarrow B \leftarrow D \rightarrow E \rightarrow Y$$ We don't need to control for any set of variables: this back-door path is already blocked by the collider $B$ (two incoming arrows) $$A \rightarrow B \leftarrow D$$ Controlling for the collider $B$ would open this non-causal path (controlling for a collider introduces bias). The only causal path is $X \rightarrow E \rightarrow Y$.
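The blocking rule used in this example can be sketched in a few lines of plain Python. This is only an illustration of the d-separation rule for a single path, not pyAgrum's implementation; the helper names `descendants` and `path_is_blocked` are hypothetical. A chain or fork node blocks the path when it is controlled, while a collider blocks the path unless it (or one of its descendants) is controlled.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def path_is_blocked(edges, path, z):
    """d-separation test for one path, given the set `z` of controlled variables."""
    edge_set = set(edges)
    z = set(z)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            # A chain or fork node blocks as soon as it is controlled.
            return True
    return False


# Edges of Example 2: "A->B->C;A->X->E->Y;B<-D->E"
edges2 = [("A", "B"), ("B", "C"), ("A", "X"), ("X", "E"),
          ("E", "Y"), ("D", "B"), ("D", "E")]
back_door_path = ["X", "A", "B", "D", "E", "Y"]

for z in ([], ["B"], ["C"], ["A", "B"]):
    state = "blocked" if path_is_blocked(edges2, back_door_path, z) else "open"
    print(z, state)
```

Note that controlling for $C$, a descendant of the collider $B$, opens the path just as controlling for $B$ itself does, and that adding $A$ to the adjustment set blocks it again.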

Example 3:

In [8]:
e3 = gum.fastBN("B->X->Y;X->A<-B->Y")
e3
Out[8]:
[Graph: B->X, B->Y, B->A, X->Y, X->A]
In [9]:
m3 = csl.CausalModel(e3)
cslnb.showCausalImpact(m3, "Y", doing="X",values={})
Causal Model: [graph with edges B->X, B->Y, B->A, X->Y, X->A]
Explanation : backdoor ['B'] found.
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = \sum_{B}{P\left(Y\mid B,X\right) \cdot P\left(B\right)}\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| X | Y=0 | Y=1 |
|---|--------|--------|
| 0 | 0.4500 | 0.5500 |
| 1 | 0.7437 | 0.2563 |
In [10]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m3.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : ['B']

There is one back-door path from $X$ to $Y$: $$X \leftarrow B \rightarrow Y$$ We need to block it by controlling for $B$, which satisfies the back-door criterion.

Example 4 (M-bias):

In [11]:
e4 = gum.fastBN("X<-A->B<-C->Y")
e4
Out[11]:
[Graph: A->X, A->B, C->B, C->Y]
In [12]:
m4 = csl.CausalModel(e4)
cslnb.showCausalImpact(m4, "Y", doing="X",values={})
Causal Model: [graph with edges A->X, A->B, C->B, C->Y]
Explanation : No causal effect of X on Y, because they are d-separated (conditioning on the observed variables if any).
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = P\left(Y\right)\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| Y=0 | Y=1 |
|--------|--------|
| 0.5191 | 0.4809 |
In [13]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m4.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : None

There is one back-door path from $X$ to $Y$: $$X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$$ We don't need to control for any set of variables: this back-door path is blocked by the collider $B$, so $X$ and $Y$ are d-separated (deconfounded, independent). Controlling for the collider $B$ would make them dependent (introducing the M-bias).
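The M-bias can be checked numerically by brute-force enumeration. The sketch below uses hypothetical probability tables (they are not the random ones generated by `gum.fastBN`), and takes $B$ to be the deterministic OR of $A$ and $C$ to keep the arithmetic simple. It compares $P(Y=1 \mid X=x)$ with and without conditioning on $B$.

```python
from itertools import product

# Hypothetical CPTs for the M-graph X <- A -> B <- C -> Y (all variables binary).
P_A1 = 0.5
P_C1 = 0.5
P_X1_GIVEN_A = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_C = {0: 0.1, 1: 0.9}


def bern(p_one, value):
    """P(V = value) for a binary variable with P(V = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c, x, y):
    """Joint probability; B is assumed to be the deterministic OR of A and C."""
    p_b = 1.0 if b == (a | c) else 0.0
    return (bern(P_A1, a) * bern(P_C1, c) * p_b
            * bern(P_X1_GIVEN_A[a], x) * bern(P_Y1_GIVEN_C[c], y))


def p_y1_given(x, b=None):
    """P(Y=1 | X=x) if b is None, else P(Y=1 | X=x, B=b), by enumeration."""
    num = den = 0.0
    for a, bb, c, y in product((0, 1), repeat=4):
        if b is not None and bb != b:
            continue
        p = joint(a, bb, c, x, y)
        den += p
        num += p * y
    return num / den


print("no adjustment :", p_y1_given(0), p_y1_given(1))
print("given B = 1   :", p_y1_given(0, b=1), p_y1_given(1, b=1))
```

Without adjustment the two values coincide, so $X$ tells us nothing about $Y$. Once we condition on $B$ they differ, although $X$ has no causal effect on $Y$ in this graph: this spurious dependence is the M-bias.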

Example 5:

In [14]:
e5 = gum.fastBN("X<-B<-A->X->Y<-C->B")
e5
Out[14]:
[Graph: X->Y, B->X, A->X, A->B, C->B, C->Y]
In [15]:
m5 = csl.CausalModel(e5)
cslnb.showCausalImpact(m5, "Y", doing="X",values={})
Causal Model: [graph with edges X->Y, B->X, A->X, A->B, C->B, C->Y]
Explanation : backdoor ['C'] found.
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = \sum_{C}{P\left(Y\mid C,X\right) \cdot P\left(C\right)}\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| X | Y=0 | Y=1 |
|---|--------|--------|
| 0 | 0.4680 | 0.5320 |
| 1 | 0.8141 | 0.1859 |

In [16]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m5.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : ['C']

The difference between this example and the previous one is that we added an arrow from $B$ to $X$ ($B \rightarrow X$); the graph also now contains the causal arrow $X \rightarrow Y$. The new arrow opens a back-door path between $X$ and $Y$ that isn't blocked by any collider: $$X \leftarrow B \leftarrow C \rightarrow Y$$ We need to block the non-causal information that flows through it. Controlling for $B$ closes this back-door path (it prevents information from getting from $X$ to $C$). However, $B$ is also the collider that was blocking the other back-door path, so adjusting for it opens that path: $$X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$$ In that case, in addition to $B$, we must also control for $A$ or $C$ to re-block the path we just opened.

Another solution is to control for $C$ alone (it prevents information from getting from $B$ to $Y$). $C$ satisfies the back-door criterion: it blocks the new path without reopening the one that is blocked by the collider $B$.
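To list every adjustment set discussed above, we can test all subsets of $\{A, B, C\}$. The sketch below is an independent illustration in plain Python, not pyAgrum's algorithm, and all function names are hypothetical. It relies on a standard reformulation of the criterion: $Z$ satisfies the back-door criterion if it contains no descendant of $X$ and d-separates $X$ from $Y$ once the arrows leaving $X$ are removed. d-separation is tested on the moralised ancestral graph.

```python
from itertools import combinations


def ancestors_of(edges, nodes):
    """The given nodes together with all of their ancestors."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(edges, x, y, z):
    """True if z d-separates x and y (moralised ancestral graph test)."""
    keep = ancestors_of(edges, {x, y} | set(z))
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    neighbours = {n: set() for n in keep}
    for p, c in sub:                      # drop edge directions
        neighbours[p].add(c)
        neighbours[c].add(p)
    for _, c in sub:                      # "marry" parents that share a child
        pars = [p for p, cc in sub if cc == c]
        for p1, p2 in combinations(pars, 2):
            neighbours[p1].add(p2)
            neighbours[p2].add(p1)
    seen, stack = {x}, [x]                # search that never enters z
    while stack:
        for n in neighbours[stack.pop()]:
            if n == y:
                return False
            if n not in seen and n not in z:
                seen.add(n)
                stack.append(n)
    return True


def satisfies_back_door(edges, x, y, z):
    """Back-door test: no descendant of x in z, and z blocks all back-door paths."""
    nodes = {v for e in edges for v in e}
    descendants_of_x = {n for n in nodes
                        if n != x and x in ancestors_of(edges, {n})}
    if descendants_of_x & set(z):
        return False
    without_out_of_x = [(p, c) for p, c in edges if p != x]
    return d_separated(without_out_of_x, x, y, z)


# Edges of Example 5: "X<-B<-A->X->Y<-C->B"
edges5 = [("B", "X"), ("A", "B"), ("A", "X"), ("X", "Y"), ("C", "Y"), ("C", "B")]
valid = [set(z) for r in range(4) for z in combinations("ABC", r)
         if satisfies_back_door(edges5, "X", "Y", z)]
print(valid)
```

The minimal sets found this way are $\{C\}$ and $\{A, B\}$; $\{B\}$ alone is rejected because it opens the path through the collider.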

Example 6:

In [17]:
e6 = gum.fastBN("A->X;A->B;D->A;B->X;C->B;C->E;C->Y;D->C;E->Y;E->X;F->C;F->X;F->Y;G->X;G->Y;X->Y")
e6
Out[17]:
[Graph: A->X, A->B, X->Y, B->X, D->A, D->C, C->B, C->E, C->Y, E->X, E->Y, F->X, F->C, F->Y, G->X, G->Y]
In [18]:
m6 = csl.CausalModel(e6)
cslnb.showCausalImpact(m6, "Y", doing="X",values={})
Causal Model: [graph with edges A->X, A->B, X->Y, B->X, D->A, D->C, C->B, C->E, C->Y, E->X, E->Y, F->X, F->C, F->Y, G->X, G->Y]
Explanation : backdoor ['C', 'E', 'F', 'G'] found.
$$\begin{equation}P( Y \mid \hookrightarrow\mkern-6.5muX) = \sum_{C,E,F,G}{P\left(Y\mid C,E,F,G,X\right) \cdot P\left(C,E,F,G\right)}\end{equation}$$
Impact : $P( Y \mid \hookrightarrow\mkern-6.5muX)$

| X | Y=0 | Y=1 |
|---|--------|--------|
| 0 | 0.4806 | 0.5194 |
| 1 | 0.5964 | 0.4036 |
In [19]:
# backDoor returns a set of variables that satisfies the back-door criterion for (X, Y),
# or None if no adjustment set is returned (e.g. no back-door path needs blocking).
setOfVars = m6.backDoor("X","Y")
print("The set of variables which satisfies the back-door criterion for (X, Y) is :", setOfVars)
The set of variables which satisfies the back-door criterion for (X, Y) is : ['C', 'E', 'F', 'G']

Back-door paths are:
1) $X \leftarrow G \rightarrow Y$
2) $X \leftarrow E \rightarrow Y$, and any other back-door paths that go through $E$
3) $X \leftarrow F \rightarrow Y$, and any other back-door paths that go through $F$
4) $X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y$, blocked by the collider $B$; any other back-door path that goes through $A$ also goes through $C$
5) $X \leftarrow B \leftarrow C \rightarrow Y$; any other back-door path that goes through $B$ also goes through $C$

Two sets of variables that satisfy the back-door criterion are:

  • $\{C, E, F, G\}$, blocking (1), (2), (3) and (5)
  • $\{A, B, E, F, G\}$, blocking (1), (2), (3) and (5); controlling for $B$ opens (4), and controlling for $A$ re-blocks it.
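The list of back-door paths above groups several paths together ("any other back-door paths that go through ..."). To see them all, here is a small sketch in plain Python that enumerates every simple path from $X$ to $Y$ whose first edge points into $X$. The helper names `back_door_paths` and `with_arrows` are hypothetical, and this is not how pyAgrum searches for adjustment sets.

```python
def back_door_paths(edges, x, y):
    """All simple paths from x to y whose first edge points into x."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    paths = []

    def extend(path):
        if path[-1] == y:
            paths.append(path)
            return
        for node in sorted(neighbours[path[-1]]):
            if node not in path:
                extend(path + [node])

    for parent in sorted(p for p, c in edges if c == x):
        extend([x, parent])
    return paths


def with_arrows(edges, path):
    """Render a path with the direction of each edge, e.g. 'X <- G -> Y'."""
    edge_set = set(edges)
    text = path[0]
    for a, b in zip(path, path[1:]):
        text += (" -> " if (a, b) in edge_set else " <- ") + b
    return text


# Edges of Example 6, copied from the fastBN string above.
edges6 = [tuple(e.split("->")) for e in
          "A->X;A->B;D->A;B->X;C->B;C->E;C->Y;D->C;E->Y;E->X;"
          "F->C;F->X;F->Y;G->X;G->Y;X->Y".split(";")]
# Edges of Example 1: "X->A->Y;A->B"
edges1 = [("X", "A"), ("A", "Y"), ("A", "B")]

print(back_door_paths(edges1, "X", "Y"))   # Example 1: no arrow points into X
paths6 = back_door_paths(edges6, "X", "Y")
for path in paths6:
    print(with_arrows(edges6, path))
```

Every enumerated path reaches $Y$ through one of its parents $C$, $E$, $F$ or $G$. Such a node has an arrow into $Y$, so it cannot be a collider on the path, and controlling for it blocks the path. This is consistent with $\{C, E, F, G\}$ being a valid adjustment set.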