Causality_SimpsonParadox
pyAgrum 0.16.2   
generation: 2019-10-02 10:58  

Creative Commons License
This pyAgrum's notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

In [1]:
from IPython.display import display, Math, Latex

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb

Simpson's Paradox

This notebook follows the famous example from Causality (Pearl, 2009).

In a statistical study of a drug, we want to evaluate its effectiveness in a population of men and women. We denote:

  • $Drug$ : drug taking
  • $Patient$ : cured patient
  • $Gender$ : patient's gender

The model built from the observed data is as follows:

In [2]:
m1 = gum.fastBN("Gender{F|M}->Drug{Without|With}->Patient{Sick|Healed}<-Gender")

m1.cpt("Gender")[:]=[0.5,0.5]
m1.cpt("Drug")[:]=[[0.25,0.75],  #Gender=F
                   [0.75,0.25]]  #Gender=M

m1.cpt("Patient")[{'Drug':'Without','Gender':'F'}]=[0.2,0.8] #No Drug, Female -> healed in 0.8 of cases
m1.cpt("Patient")[{'Drug':'Without','Gender':'M'}]=[0.6,0.4] #No Drug, Male -> healed in 0.4 of cases
m1.cpt("Patient")[{'Drug':'With','Gender':'F'}]=[0.3,0.7] #Drug, Female -> healed in 0.7 of cases
m1.cpt("Patient")[{'Drug':'With','Gender':'M'}]=[0.8,0.2] #Drug, Male -> healed in 0.2 of cases
gnb.sideBySide(m1,m1.cpt("Gender"),m1.cpt("Drug"),m1.cpt("Patient"))
[Figure: Bayesian network with arcs Gender -> Drug, Gender -> Patient, Drug -> Patient]

P(Gender)
F        M
0.5000   0.5000

P(Drug | Gender)
Gender   Without  With
F        0.2500   0.7500
M        0.7500   0.2500

P(Patient | Gender, Drug)
Gender   Drug     Sick     Healed
F        Without  0.2000   0.8000
F        With     0.3000   0.7000
M        Without  0.6000   0.4000
M        With     0.8000   0.2000
In [3]:
def getCuredObservedProba(m1,evs):
    """Return P(Patient=Healed | Drug, evs) as a potential over Drug (observational)."""
    # Two copies of the evidence, one per value of Drug
    evs0=dict(evs)
    evs1=dict(evs)
    evs0["Drug"]='Without'
    evs1["Drug"]='With'
    
    # Index 1 of the posterior of Patient corresponds to the label 'Healed'
    return gum.Potential().add(m1.variableFromName("Drug")).fillWith([
            gum.getPosterior(m1,target="Patient",evs=evs0)[1],
            gum.getPosterior(m1,target="Patient",evs=evs1)[1]
        ])

# Raw strings keep the LaTeX backslashes (\mid) from being read as escape sequences
gnb.sideBySide(getCuredObservedProba(m1,{}),
               getCuredObservedProba(m1,{'Gender':'F'}),
               getCuredObservedProba(m1,{'Gender':'M'}),
               captions=[r"$P(Patient = Healed \mid Drug )$<br/>Taking $Drug$ is observed as efficient to cure",
                         r"$P(Patient = Healed \mid Gender=F,Drug)$<br/>except if the $gender$ of the patient is female",
                         r"$P(Patient = Healed \mid Gender=M,Drug)$<br/>... or male."])
$P(Patient = Healed \mid Drug )$
Taking $Drug$ is observed as efficient to cure
Drug:    Without  With
         0.5000   0.5750

$P(Patient = Healed \mid Gender=F,Drug)$
except if the $gender$ of the patient is female
Drug:    Without  With
         0.8000   0.7000

$P(Patient = Healed \mid Gender=M,Drug)$
... or male.
Drug:    Without  With
         0.4000   0.2000

These results form what is known as Simpson's paradox. Writing $C$ for $Patient = Healed$, $D$ for $Drug = With$ and $G$ for $Gender$:

$$P(C\mid \neg{D}) = 0.5 < P(C\mid D) = 0.575$$
$$P(C\mid \neg{D},G = Female) = 0.8 > P(C\mid D,G = Female) = 0.7$$
$$P(C\mid \neg{D},G = Male) = 0.4 > P(C\mid D,G = Male) = 0.2$$
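
To see where the aggregated figures 0.5 and 0.575 come from, the observational probabilities can be recomputed by hand from the conditional probability tables of the model. The sketch below is plain Python and does not use pyAgrum; the function names are illustrative only. It weights each gender's healing rate by $P(Gender \mid Drug)$, obtained with Bayes' rule. Because women (who heal more often in this model) make up most of the drug takers, the aggregated rate for $Drug = With$ is pulled upwards.

```python
# Values copied from the conditional probability tables of the model above.
p_gender = {"F": 0.5, "M": 0.5}                          # P(Gender)
p_with_given_gender = {"F": 0.75, "M": 0.25}             # P(Drug=With | Gender)
p_healed = {("F", "Without"): 0.8, ("F", "With"): 0.7,   # P(Healed | Gender, Drug)
            ("M", "Without"): 0.4, ("M", "With"): 0.2}

def p_drug_given_gender(drug, gender):
    """P(Drug=drug | Gender=gender)."""
    p_with = p_with_given_gender[gender]
    return p_with if drug == "With" else 1.0 - p_with

def gender_given_drug(drug):
    """P(Gender | Drug=drug), by Bayes' rule."""
    joint = {g: p_gender[g] * p_drug_given_gender(drug, g) for g in p_gender}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

def observed_healed(drug):
    """P(Healed | Drug=drug) = sum over g of P(Healed | g, drug) * P(g | drug)."""
    weights = gender_given_drug(drug)
    return sum(p_healed[(g, drug)] * weights[g] for g in weights)

for drug in ("Without", "With"):
    print(drug, gender_given_drug(drug), round(observed_healed(drug), 4))
```

The weights are 25% women among untreated patients and 75% women among treated patients, which is what makes the aggregate comparison misleading.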

Actually, giving a drug is not an observation in our model but rather an intervention. What happens if we reason with interventions instead of observations?

How to compute causal impacts on the patient's health?

Computing $P (Patient = Healed \mid \hookrightarrow Drug = Without)$

In [4]:
d1 = csl.CausalModel(m1)
cslnb.showCausalImpact(d1, "Patient", doing="Drug",values={"Drug" : "Without"})
[Figure: causal model with arcs Gender -> Drug, Gender -> Patient, Drug -> Patient]
Causal Model

$$P(Patient \mid \hookrightarrow Drug) = \sum_{Gender}{P\left(Patient\mid Drug,Gender\right) \cdot P\left(Gender\right)}$$
Explanation : backdoor ['Gender'] found.

Patient: Sick     Healed
         0.4000   0.6000
Impact : $P(Patient \mid \hookrightarrow Drug=Without)$

We have: $$P(Patient = Healed \mid \hookrightarrow Drug = Without) = 0.6$$

Computing $P (Patient = Healed \mid \hookrightarrow Drug = With )$

In [5]:
d1 = csl.CausalModel(m1)
cslnb.showCausalImpact(d1, "Patient", "Drug",values={"Drug" : "With"})
[Figure: causal model with arcs Gender -> Drug, Gender -> Patient, Drug -> Patient]
Causal Model

$$P(Patient \mid \hookrightarrow Drug) = \sum_{Gender}{P\left(Patient\mid Drug,Gender\right) \cdot P\left(Gender\right)}$$
Explanation : backdoor ['Gender'] found.

Patient: Sick     Healed
         0.5500   0.4500
Impact : $P(Patient \mid \hookrightarrow Drug=With)$

And then: $$P(Patient = Healed \mid \hookrightarrow Drug = With) = 0.45$$

Therefore: $$P(Patient = Healed\mid \hookrightarrow Drug = Without) = 0.6 > P(Patient = Healed\mid \hookrightarrow Drug = With) = 0.45$$

This means that taking the drug does not improve the patient's chances of healing, so it is better not to prescribe it.
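
The two interventional values can also be checked by hand with the backdoor formula reported in the explanation above, $\sum_{Gender} P(Patient \mid Drug, Gender) \cdot P(Gender)$. The following is a minimal sketch in plain Python with the table values copied from the model; it does not call pyAgrum, and the function name is illustrative only. The key difference from the observational case is that each gender is weighted by $P(Gender)$ rather than by $P(Gender \mid Drug)$.

```python
# Values copied from the conditional probability tables of the model above.
p_gender = {"F": 0.5, "M": 0.5}                          # P(Gender)
p_healed = {("F", "Without"): 0.8, ("F", "With"): 0.7,   # P(Healed | Gender, Drug)
            ("M", "Without"): 0.4, ("M", "With"): 0.2}

def healed_under_intervention(drug):
    """Backdoor adjustment on Gender: sum over g of P(Healed | drug, g) * P(g)."""
    return sum(p_healed[(g, drug)] * p_gender[g] for g in p_gender)

for drug in ("Without", "With"):
    print(drug, round(healed_under_intervention(drug), 4))
```

Up to floating-point rounding, this reproduces the values 0.6 and 0.45 computed by the causal model.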

Simpson's paradox solved by interventions

To summarize, the paradox appears when $Drug$ is wrongly treated as an observation:

In [6]:
gnb.sideBySide(getCuredObservedProba(m1,{}),
               getCuredObservedProba(m1,{'Gender':'F'}),
               getCuredObservedProba(m1,{'Gender':'M'}),
               captions=[r"$P(Patient = Healed \mid Drug )$<br/>Taking $Drug$ is observed as efficient to cure",
                         r"$P(Patient = Healed \mid Gender=F,Drug)$<br/>except if the $gender$ of the patient is female",
                         r"$P(Patient = Healed \mid Gender=M,Drug)$<br/>... or male."])
$P(Patient = Healed \mid Drug )$
Taking $Drug$ is observed as efficient to cure
Drug:    Without  With
         0.5000   0.5750

$P(Patient = Healed \mid Gender=F,Drug)$
except if the $gender$ of the patient is female
Drug:    Without  With
         0.8000   0.7000

$P(Patient = Healed \mid Gender=M,Drug)$
... or male.
Drug:    Without  With
         0.4000   0.2000

... and disappears when $Drug$ is treated as an intervention:

In [7]:
gnb.sideBySide(csl.causalImpact(d1,on="Patient",doing="Drug",values={"Patient":"Healed"})[1],
               csl.causalImpact(d1,on="Patient",doing="Drug",knowing={"Gender"},values={"Patient":"Healed","Gender":"F"})[1],
               csl.causalImpact(d1,on="Patient",doing="Drug",knowing={"Gender"},values={"Patient":"Healed","Gender":"M"})[1],
               captions=[r"$P(Patient = 1 \mid \hookrightarrow Drug )$<br/>Effectively $Drug$ taking is not efficient to cure",
                         r"$P(Patient = 1 \mid \hookrightarrow Drug, gender=F )$<br/>, the $gender$ of the patient being female",
                         r"$P(Patient = 1 \mid \hookrightarrow Drug, gender=M )$<br/>, ... or male."])
$P(Patient = 1 \mid \hookrightarrow Drug )$
Effectively $Drug$ taking is not efficient to cure
Drug:    Without  With
         0.6000   0.4500

$P(Patient = 1 \mid \hookrightarrow Drug, gender=F )$
, the $gender$ of the patient being female
Drug:    Without  With
         0.8000   0.7000

$P(Patient = 1 \mid \hookrightarrow Drug, gender=M )$
, ... or male.
Drug:    Without  With
         0.4000   0.2000