Creative Commons License
This pyAgrum notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors: Aymen Merrouche and Pierre-Henri Wuillemin.

**Walking Example**

This notebook follows the example from "The Book Of Why" (Pearl, 2018), chapter 4, page 135.

Confounding

In [1]:
from IPython.display import display, Math, Latex,HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os

In 1998, a study revealed a correlation between physical exercise and longevity among nonsmoking retired men. Of course, what we want to know is whether exercising more causes men to live longer, that is, whether the relationship is causal. The study measurements can be found at the end of this notebook.

We create the corresponding causal diagram:

In [2]:
# We create the causal diagram
we = gum.fastBN("Walking{casual|normal|intense}->Mortality{dead|alive}")

# We fill the CPTs
we.cpt("Walking")[:]=[151/707,379/707,177/707]
we.cpt("Mortality")[{"Walking":"casual"}]=[0.43,0.57]
we.cpt("Mortality")[{"Walking":"intense"}]=[0.215,0.785]
we.cpt("Mortality")[{"Walking":"normal"}]=[0.277,0.723]
                  
gnb.sideBySide(we,we.cpt("Walking")*we.cpt("Mortality"),we.cpt("Walking"),we.cpt("Mortality"),
               captions=["the BN","the joint distribution","the marginal for $Walking$","the CPT for $Mortality$"])
the BN: Walking -> Mortality

the joint distribution

| Mortality | Walking=casual | Walking=normal | Walking=intense |
|---|---|---|---|
| dead | 0.0918 | 0.1485 | 0.0538 |
| alive | 0.1217 | 0.3876 | 0.1965 |

the marginal for $Walking$

| casual | normal | intense |
|---|---|---|
| 0.2136 | 0.5361 | 0.2504 |

the CPT for $Mortality$

| Walking | dead | alive |
|---|---|---|
| casual | 0.4300 | 0.5700 |
| normal | 0.2770 | 0.7230 |
| intense | 0.2150 | 0.7850 |

The study showed that after 12 years, 43% of casual walkers had died, while only 21.5% of intense walkers had.
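
The joint distribution and the mortality rates quoted above follow from the two tables by the chain rule, $P(Walking, Mortality) = P(Walking) \cdot P(Mortality \mid Walking)$. Here is a minimal sketch in plain Python (no pyAgrum needed); the dictionary names are illustrative, and the numbers are those entered in the CPTs above.

```python
# Marginal P(Walking), from the counts used to fill the CPT above.
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}

# Conditional P(Mortality = dead | Walking), as entered above.
p_dead_given_walking = {"casual": 0.43, "normal": 0.277, "intense": 0.215}

# Chain rule: P(Walking = w, Mortality = m) = P(w) * P(m | w).
joint = {}
for w, pw in p_walking.items():
    joint[(w, "dead")] = pw * p_dead_given_walking[w]
    joint[(w, "alive")] = pw * (1 - p_dead_given_walking[w])

# Overall mortality, obtained by summing the joint over Walking.
p_dead = sum(joint[(w, "dead")] for w in p_walking)

for (w, m), p in joint.items():
    print(f"P(Walking={w}, Mortality={m}) = {p:.4f}")
print(f"P(Mortality=dead) = {p_dead:.4f}")
```

The printed joint probabilities should agree with the "joint distribution" table up to rounding.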

Causal effect of walking on mortality in this model:

In [3]:
weModele = csl.CausalModel(we)
cslnb.showCausalImpact(weModele,"Mortality",doing="Walking",values={})
Causal Model: Walking -> Mortality

Explanation : Do-calculus computations

$$P( Mortality \mid \hookrightarrow\mkern-6.5muWalking) = P\left(Mortality\mid Walking\right)$$

Impact : $P( Mortality \mid \hookrightarrow\mkern-6.5muWalking)$

| Walking | dead | alive |
|---|---|---|
| casual | 0.4300 | 0.5700 |
| normal | 0.2770 | 0.7230 |
| intense | 0.2150 | 0.7850 |

Before jumping to any conclusions, we should consider possible confounders. We need to ask the following question: what distinguishes intense walkers from casual walkers?
Without abandoning the idea of a possible cause-and-effect relationship between walking and mortality, we introduce a third variable, a "confounder": a common cause of the two variables that could explain the correlation between them. Our aim is to distinguish the causal effect of walking on mortality (if there is one) from the bias induced by this third variable. For this purpose, we need to adjust for it. We first declare the confounder as an unobserved (latent) variable:

In [4]:
weModele1 = csl.CausalModel(we, [("confounder", ["Walking","Mortality"])], True)
cslnb.showCausalImpact(weModele1, "Mortality", "Walking",values={"Walking":"intense"})
Causal Model: Walking -> Mortality, confounder -> Walking, confounder -> Mortality

Explanation : Hedge Error: G={'Mortality', 'Walking'}, G[S]={'Mortality'}

Impact : $?$ (No result)
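
The hedge error reports that the effect cannot be computed from observations alone once an unobserved common cause is allowed. One way to see why is to build two hypothetical worlds that agree on everything we can observe but disagree on the causal effect. The sketch below uses plain Python; both worlds and the hidden variable `U` are illustrative constructions, not part of the study or of pyAgrum.

```python
# Observational quantities from the first model.
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}
p_dead_given_walking = {"casual": 0.43, "normal": 0.277, "intense": 0.215}

# World A (hypothetical): no confounding, Walking really drives Mortality.
# Intervening on Walking is then the same as observing it.
do_effect_a = dict(p_dead_given_walking)

# World B (hypothetical): a hidden variable U has one state per walking level,
# fully determines Walking (Walking = U), and is the only cause of Mortality.
# We set P(U = u) = P(Walking = u) and P(dead | U = u) = P(dead | Walking = u).
p_u = dict(p_walking)
p_dead_given_u = dict(p_dead_given_walking)

# In world B, forcing Walking leaves U untouched, so
# P(dead | do(Walking = w)) = sum_u P(u) * P(dead | u), the same for every w.
p_dead_b = sum(p_u[u] * p_dead_given_u[u] for u in p_u)
do_effect_b = {w: p_dead_b for w in p_walking}

# Observed joint P(Walking = w, Mortality = dead) in each world.
joint_a = {w: p_walking[w] * p_dead_given_walking[w] for w in p_walking}
joint_b = {w: p_u[w] * p_dead_given_u[w] for w in p_walking}  # Walking = U

for w in p_walking:
    print(
        f"{w:8s} observed: A={joint_a[w]:.4f} B={joint_b[w]:.4f} | "
        f"P(dead | do): A={do_effect_a[w]:.4f} B={do_effect_b[w]:.4f}"
    )
```

The two worlds yield the same observed joint distribution, yet walking lowers mortality in world A and has no effect at all in world B. No formula based only on the observed distribution can tell them apart.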

Introducing age as a confounder:

We want to measure the causal effect of walking on mortality. Confounding bias arises when a third variable, called a "confounding variable", influences both walking and mortality.
An obvious confounder is age: younger subjects exercise more and have more time to live! (There are other confounders.)

Let's use fictitious data (`fastBN` fills the CPTs with random values, so the numbers shown below change from run to run):

In [5]:
wea = gum.fastBN("Age{cat1|cat2|cat3}->Walking{casual|normal|intense}->Mortality{dead|alive}<-Age{cat1|cat2|cat3}")
                 
gnb.sideBySide(wea,wea.cpt("Age"),wea.cpt("Walking"),wea.cpt("Mortality"),
               captions=["the BN","the marginal for $Age$","the CPT for $Walking$","the CPT for $Mortality$"])
the BN: Age -> Walking, Age -> Mortality, Walking -> Mortality

the marginal for $Age$

| cat1 | cat2 | cat3 |
|---|---|---|
| 0.2883 | 0.4185 | 0.2933 |

the CPT for $Walking$

| Age | casual | normal | intense |
|---|---|---|---|
| cat1 | 0.1665 | 0.6979 | 0.1355 |
| cat2 | 0.0380 | 0.1265 | 0.8355 |
| cat3 | 0.2312 | 0.7482 | 0.0206 |

the CPT for $Mortality$

| Age | Walking | dead | alive |
|---|---|---|---|
| cat1 | casual | 0.4965 | 0.5035 |
| cat1 | normal | 0.3342 | 0.6658 |
| cat1 | intense | 0.1378 | 0.8622 |
| cat2 | casual | 0.2307 | 0.7693 |
| cat2 | normal | 0.1432 | 0.8568 |
| cat2 | intense | 0.1772 | 0.8228 |
| cat3 | casual | 0.1760 | 0.8240 |
| cat3 | normal | 0.9103 | 0.0897 |
| cat3 | intense | 0.7940 | 0.2060 |

Causal effect of walking on mortality with age as a confounder:

In [6]:
weModele2 = csl.CausalModel(wea)
cslnb.showCausalImpact(weModele2, "Mortality", "Walking",values={})
Causal Model: Age -> Walking, Age -> Mortality, Walking -> Mortality

Explanation : backdoor ['Age'] found.

$$P( Mortality \mid \hookrightarrow\mkern-6.5muWalking) = \sum_{Age}{P\left(Mortality\mid Age,Walking\right) \cdot P\left(Age\right)}$$

Impact : $P( Mortality \mid \hookrightarrow\mkern-6.5muWalking)$

| Walking | dead | alive |
|---|---|---|
| casual | 0.2913 | 0.7087 |
| normal | 0.4232 | 0.5768 |
| intense | 0.3467 | 0.6533 |

We adjusted for Age using the back-door criterion: Age blocks all back-door paths from Walking to Mortality, so within each age category, setting Walking="intense" and observing Walking="intense" have the same effect on Mortality.
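
The adjustment formula can be checked by hand. The sketch below, in plain Python, copies the fictitious CPT values displayed above (they come from one random draw, so a fresh run of the notebook will show other numbers) and evaluates $\sum_{Age} P(Mortality \mid Age, Walking) \cdot P(Age)$. The function names `p_dead_do` and `p_dead_observed` are illustrative, not pyAgrum functions.

```python
ages = ["cat1", "cat2", "cat3"]
walks = ["casual", "normal", "intense"]

# P(Age), P(Walking | Age) and P(Mortality = dead | Age, Walking),
# copied from the tables displayed above.
p_age = {"cat1": 0.2883, "cat2": 0.4185, "cat3": 0.2933}
p_walk_given_age = {
    "cat1": {"casual": 0.1665, "normal": 0.6979, "intense": 0.1355},
    "cat2": {"casual": 0.0380, "normal": 0.1265, "intense": 0.8355},
    "cat3": {"casual": 0.2312, "normal": 0.7482, "intense": 0.0206},
}
p_dead = {
    "cat1": {"casual": 0.4965, "normal": 0.3342, "intense": 0.1378},
    "cat2": {"casual": 0.2307, "normal": 0.1432, "intense": 0.1772},
    "cat3": {"casual": 0.1760, "normal": 0.9103, "intense": 0.7940},
}


def p_dead_do(w):
    """Back-door adjustment: weight each age stratum by P(Age)."""
    return sum(p_dead[a][w] * p_age[a] for a in ages)


def p_dead_observed(w):
    """Plain conditioning: weight each age stratum by P(Age | Walking = w)."""
    p_w = sum(p_walk_given_age[a][w] * p_age[a] for a in ages)
    return sum(p_dead[a][w] * p_walk_given_age[a][w] * p_age[a] for a in ages) / p_w


for w in walks:
    print(f"{w:8s} do: {p_dead_do(w):.4f}   observed: {p_dead_observed(w):.4f}")
```

The "do" column should match the table returned by `showCausalImpact` to about three decimals (the inputs are rounded). The "observed" column differs because conditioning on Walking also shifts the age distribution, which is exactly the bias the adjustment removes.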

Conclusion:

In the study measurements (not the fictitious model above), after adjusting for age, 40.5% (43% unadjusted) of casual walkers died, whereas only 23.8% (21.5% unadjusted) of intense walkers died. Age accounts for only a small part of the correlation between the two variables.
Even after adjusting for all plausible confounders, and so removing the confounding bias they induce, Walking is still associated with Mortality. Unless we missed other confounders, in which case the remaining uncertainty is proportional to the correlation induced by these hidden variables, we can say that intentional walking prolongs life among the studied population.
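
To make the comparison concrete, the four percentages quoted above can be turned into a risk difference and a risk ratio between casual and intense walkers. This is a small arithmetic sketch; the helper `compare` is hypothetical, and the choice of these two summary measures is an assumption rather than something taken from the study.

```python
# Mortality rates quoted in the conclusion (casual vs intense walkers).
unadjusted = {"casual": 0.43, "intense": 0.215}
adjusted = {"casual": 0.405, "intense": 0.238}


def compare(rates):
    """Return (risk difference, risk ratio) of casual versus intense walkers."""
    difference = rates["casual"] - rates["intense"]
    ratio = rates["casual"] / rates["intense"]
    return difference, ratio


for label, rates in [("unadjusted", unadjusted), ("age-adjusted", adjusted)]:
    difference, ratio = compare(rates)
    print(f"{label:12s} difference = {difference:.3f}   ratio = {ratio:.2f}")
```

The gap shrinks after adjustment but does not vanish, which is the pattern the conclusion relies on.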

In an observational study, adjusting for confounding factors is always required in order to measure the causal effect of a treatment on an outcome.

Study measurements both unadjusted and age-adjusted:

[Figure: study measurements, unadjusted and age-adjusted]
