Multinomial Simpson Paradox¶

This notebook shows a model exhibiting a multinomial Simpson paradox: the trend of $B$ as a function of $A$ is increasing overall, yet decreasing for every value of the confounder $C$.

In [1]:
import matplotlib.pyplot as plt
import random

import pandas as pd
import numpy as np

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb

Building the models¶

In [2]:
# building a model that includes a Simpson's paradox
def fillWithUniform(p,fmin=None,fmax=None):
  """
  Fill the CPT p of its first variable with a conditional uniform distribution on the
  window [fmin,fmax], where fmin and fmax are arithmetic expressions over the names of
  the parent variables, evaluated for each parent configuration.
  """
  if fmin is None:
    vmin=0
  if fmax is None:
    vmax=p.variable(0).domainSize()-1

  # numerical bounds of the conditioned variable
  mi=int(p.variable(0).numerical(0))
  ma=int(p.variable(0).numerical(p.variable(0).domainSize()-1))

  p.fillWith(0)

  I=gum.Instantiation(p)

  I.setFirst()
  while not I.end():
    # evaluate fmin/fmax for the current configuration of the parents
    vars={p.variable(i).name():p.variable(i).numerical(I.val(i)) for i in range(1,p.nbrDim())}
    if fmin is not None:
      vmin=int(eval(fmin,None,vars))
    if fmax is not None:
      vmax=int(eval(fmax,None,vars))
    # clip the window to the domain of the variable
    if vmin<mi:
      vmin=mi
    if vmin>ma:
      vmin=ma
    if vmax<mi:
      vmax=mi
    if vmax>ma:
      vmax=ma

    # put a constant mass on the window [vmin,vmax] for this parent configuration
    for pos in range(vmin,vmax+1):
      I.chgVal(0,pos)
      p.set(I,1)
    I.incNotVar(p.variable(0))
  p.normalizeAsCPT()

size=70
sizeZ=5
bn=gum.fastBN(f"A[0,{size-1}]->B[0,{size-1}]<-C[0,{sizeZ-1}]->A")

# C is uniform; A is uniform on a window that shifts upward with C;
# B follows 5+C*4-int(A/8) with a small discrete noise around this value.
bn.cpt("C").fillWith(1).normalize()
fillWithUniform(bn.cpt("A"),fmin="C*12",fmax="C*12+30")
bn.cpt("B").fillWithFunction("5+C*4-int(A/8)",[0.05,0.2,0.5,0.2,0.05]);
In [3]:
#  generating a CSV, taking this model as the causal one.
gum.generateSample(bn,400,"out/sample.csv")
df=pd.read_csv("out/sample.csv")
df.plot.scatter(x='A', y='B', c='C',colormap="tab20");
[scatter plot of the sample: B against A, coloured by the value of C]
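To see the paradox directly in the sample, one can fit a straight line to the whole sample and to each group defined by $C$ and compare the slopes. A minimal sketch (not part of the original notebook), assuming the DataFrame df generated above:

slope_all=np.polyfit(df["A"],df["B"],1)[0]
print(f"overall slope of B vs A : {slope_all:+.3f}")
for c,grp in df.groupby("C"):
  slope_c=np.polyfit(grp["A"],grp["B"],1)[0]
  print(f"slope of B vs A inside C={c} : {slope_c:+.3f}")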
In [4]:
cm=csl.CausalModel(bn)
_,p,_=csl.causalImpact(cm,on="B",doing="A")
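The Potential p returned by causalImpact is $P(B \mid do(A))$. A minimal sketch (not part of the original notebook, assuming Potential.extract and dictionary indexing behave as in the pyAgrum Potential tutorial) that reduces it to the interventional mean of $B$ for a few values of $A$:

def meanOf(q):
  # expected value of a one-dimensional Potential over a numerical variable
  v=q.variable(0)
  return sum(v.numerical(i)*q[{v.name():i}] for i in range(v.domainSize()))

for a in [10,20,30]:
  print(f"E[B | do(A={a})] = {meanOf(p.extract({'A':a})):.2f}")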
In [5]:
# building a Markov-equivalent model, generating a CSV, taking this model as the causal one.
bn2=gum.BayesNet(bn)
bn2.reverseArc("C","A")

gum.generateSample(bn2,400,"out/sample2.csv")
df2=pd.read_csv("out/sample2.csv")

cm2=csl.CausalModel(bn2)
_,p2,_=csl.causalImpact(cm2,on="B",doing="A")
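Since bn2 only reverses the arc C→A (and reverseArc is documented to preserve the joint distribution), the two networks should encode exactly the same observational distribution while telling different causal stories. A minimal sketch of a possible check (not part of the original notebook, assuming gum.ExactBNdistance and its compute() dictionary with a 'klPQ' entry are available):

kl=gum.ExactBNdistance(bn,bn2).compute()
print(f"KL(bn||bn2) = {kl['klPQ']:.6f}")   # expected to be (close to) 0 for these Markov-equivalent models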

The observational model and its paradoxical structure (exactly the same for the second, Markov-equivalent model)¶

In [6]:
gnb.flow.row(gnb.getBN(bn),
             df.plot.scatter(x='A', y='B'),
             df.plot.scatter(x='A', y='B', c='C',colormap="tab20"),
             captions=["the observational model","the trend is increasing","the trend is decreasing for every value of C!"])
gnb.flow.row(gnb.getBN(bn2),
             df2.plot.scatter(x='A', y='B'),
             df2.plot.scatter(x='A', y='B', c='C',colormap="tab20"),
             captions=["the Markov-equivalent model","the trend is increasing","the trend is decreasing for every value of C!"])
[graph of the observational model: C->A, C->B, A->B]
the observational model

the trend is increasing

the trend is decreasing for every value of C!
[graph of the Markov-equivalent model: A->C, C->B, A->B]
the Markov-equivalent model

the trend is increasing

the trend is decreasing for every value of C!

The paradox is revealed in the trend of the inferred means: the means increase with the value of $A$, yet decrease for every fixed value of $C$ ...¶
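A minimal sketch (not part of the original notebook, assuming the bn built above) of how those inferred means could be computed explicitly with exact inference, before displaying the full posteriors in the next cell:

ie=gum.LazyPropagation(bn)

def meanB(evs):
  # posterior expectation of B given the evidence evs
  ie.setEvidence(evs)
  ie.makeInference()
  q=ie.posterior("B")
  var=q.variable(0)
  return sum(var.numerical(i)*q[{"B":i}] for i in range(var.domainSize()))

for a in [10,20,30]:
  line=f"E[B|A={a}] = {meanB({'A':a}):.2f}"
  for c in [0,2,4]:
    line+=f"   E[B|A={a},C={c}] = {meanB({'A':a,'C':c}):.2f}"
  print(line)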

In [7]:
for v in [10,20,30]:
  gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":v}),f"$P(B|A={v})$")
gnb.flow.new_line()
for v in [10,20,30]:
  gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":v,"C":0}),f"$P(B|A={v},C=0)$")
gnb.flow.new_line()
for v in [10,20,30]:
  gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":v,"C":2}),f"$P(B|A={v},C=2)$")
gnb.flow.new_line()
for v in [10,20,30]:
  gnb.flow.add_html(gnb.getPosterior(bn,target="B",evs={"A":v,"C":4}),f"$P(B|A={v},C=4)$")
gnb.flow.display()

$P(B|A=10)$

$P(B|A=20)$

$P(B|A=30)$


$P(B|A=10,C=0)$

$P(B|A=20,C=0)$

$P(B|A=30,C=0)$


$P(B|A=10,C=2)$

$P(B|A=20,C=2)$