pyAgrum 1.8.1 on Jupyter

Kaggle Titanic

Creative Commons License aGrUM interactive online version
In [1]:
import pandas
import os
import math
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
from pyAgrum.lib.bn2roc import showROC_PR

from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix
import numpy as np  # np.nan is used by forAge below
import pandas as pd

Titanic: Machine Learning from Disaster

This notebook is an introduction to the Kaggle Titanic challenge. The goal here is not to produce the best possible classifier, at least not yet, but to show how pyAgrum and Bayesian networks can be used to explore and understand data quickly.

To understand this notebook, basic knowledge of Bayesian networks is required. If you are looking for an introduction to pyAgrum, check this notebook.

This notebook presents three different Bayesian network techniques to answer the Kaggle Titanic challenge. In the first approach, we answer the challenge without using the training set, relying only on our prior knowledge about shipwrecks. In the second approach, we use only the training set with pyAgrum's machine learning algorithms. Finally, in the third approach, we use both prior knowledge about shipwrecks and machine learning.

Before we start, some disclaimers about aGrUM and pyAgrum.

aGrUM is a C++ library designed for easily building applications using graphical models such as Bayesian networks, influence diagrams, decision trees or Markov decision processes.

pyAgrum is a Python wrapper for the C++ aGrUM library. It provides a high-level interface to the part of aGrUM that creates, manipulates and performs computations on Bayesian networks. The module is mainly generated with the SWIG interface generator; custom-written code is added to simplify and extend the aGrUM API.

Both projects are open source and can be freely downloaded from aGrUM's GitLab repository or installed using pip or anaconda.

If you have questions, remarks or suggestions, feel free to contact us at info@agrum.org.

Pretreatment

We will be using pandas to set up the learning data so that it fits pyAgrum's requirements.

In [2]:
traindf=pandas.read_csv('res/titanic/train.csv')                       

testdf=pandas.merge(pandas.read_csv('res/titanic/test.csv'),
                   pandas.read_csv('res/titanic/gender_submission.csv'),
                   on="PassengerId")

This merges the test set with the column stating whether each passenger survived.

In [3]:
for k in traindf.keys():
    print(f'{k}: {len(traindf[k].unique())}')
PassengerId: 891
Survived: 2
Pclass: 3
Name: 891
Sex: 2
Age: 89
SibSp: 7
Parch: 7
Ticket: 681
Fare: 248
Cabin: 148
Embarked: 4

Looking at the number of unique values for each variable is necessary since Bayesian networks are discrete models. We will want to reduce the domain size of some discrete variables (like Age) and discretize continuous variables (like Fare).

For starters, you can filter out variables with a large number of values. Choosing a large threshold will have an impact on performance, which boils down to how much CPU and RAM you have at your disposal. Here, we choose to filter out any variable with more than 15 different outcomes.

In [4]:
for k in traindf.keys():
    if len(traindf[k].unique())<=15:
        print(k)
Survived
Pclass
Sex
SibSp
Parch
Embarked
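
As a small sketch of the same filter, the selection can be replayed on the unique-value counts printed above without loading the data. The helper name low_cardinality is hypothetical and only serves this illustration. For this dataset, any threshold from 7 to 88 keeps the same six columns, since the next smallest domain is Age with 89 values.

```python
# Unique-value counts copied from the output printed above.
unique_counts = {
    'PassengerId': 891, 'Survived': 2, 'Pclass': 3, 'Name': 891,
    'Sex': 2, 'Age': 89, 'SibSp': 7, 'Parch': 7,
    'Ticket': 681, 'Fare': 248, 'Cabin': 148, 'Embarked': 4,
}


def low_cardinality(counts, max_values):
    """Return the column names whose domain size does not exceed max_values."""
    return [name for name, size in counts.items() if size <= max_values]


kept = low_cardinality(unique_counts, max_values=15)
print(kept)
```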

This leaves us with 6 variables, not much but still enough to learn a Bayesian network. We will just add one more variable by reducing the cardinality of the Age variable.

In [5]:
testdf=pandas.merge(pandas.read_csv('res/titanic/test.csv'),
                    pandas.read_csv('res/titanic/gender_submission.csv'),
                    on="PassengerId")


def forAge(row):
    try:
        age = float(row['Age'])
        if age < 1:
            #return '[0;1['
            return 'baby'
        elif age < 6:
            #return '[1;6['
            return 'toddler'
        elif age < 12:
            #return '[6;12['
            return 'kid'
        elif age < 21:
            #return '[12;21['
            return 'teen'
        elif age < 80:
            #return '[21;80['
            return 'adult'
        else:
            #return '[80;200]'
            # Note: a missing Age (NaN) fails every comparison above and also lands here.
            return 'old'
    except ValueError:
        return np.nan
    
def forBoolean(row, col):
    try:
        val = int(row[col])
        if val >= 1:
            return "True"
        else:
            return "False"
    except ValueError:
        return "False"
    
def forGender(row):
    if row['Sex'] == "male":
        return "Male"
    else:
        return "Female"
        

testdf
Out[5]:
 | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q | 0
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S | 1
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q | 0
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S | 0
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S | 1
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
413 | 1305 | 3 | Spector, Mr. Woolf | male | NaN | 0 | 0 | A.5. 3236 | 8.0500 | NaN | S | 0
414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C | 1
415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 | NaN | S | 0
416 | 1308 | 3 | Ware, Mr. Frederick | male | NaN | 0 | 0 | 359309 | 8.0500 | NaN | S | 0
417 | 1309 | 3 | Peter, Master. Michael J | male | NaN | 1 | 1 | 2668 | 22.3583 | NaN | C | 0

418 rows × 12 columns
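
The age bins used by forAge can also be written as a table lookup. The sketch below is an alternative formulation, not the notebook's code: the names AGE_BOUNDS, AGE_LABELS and age_label are hypothetical, and it makes one deliberate choice that differs from forAge, namely that a missing or unparsable age maps to None instead of falling through to a label.

```python
import math
from bisect import bisect_right

# Upper bounds (exclusive) of the first five bins, in the same order as forAge.
AGE_BOUNDS = [1, 6, 12, 21, 80]
AGE_LABELS = ['baby', 'toddler', 'kid', 'teen', 'adult', 'old']


def age_label(age):
    """Map a raw age to its bin label, or None when the age is missing."""
    try:
        value = float(age)
    except (TypeError, ValueError):
        return None
    if math.isnan(value):
        # Assumption of this sketch: missing ages stay missing.
        return None
    # bisect_right counts how many bounds are <= value, which is the bin index.
    return AGE_LABELS[bisect_right(AGE_BOUNDS, value)]


for raw in (0.5, 1, 34.5, 80, float('nan')):
    print(raw, '->', age_label(raw))
```

With this version, rows whose age is None can then be dropped or handled explicitly, depending on what the later steps expect.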

When pretreating data, you will want to wrap your changes inside a function. This helps you keep track of your changes and compare them easily.

In [6]:
def pretreat(df):
    if 'Survived' in df.columns:
        df['Survived'] = df.apply(lambda row: forBoolean(row, 'Survived'), axis=1)
    df['Age'] = df.apply(forAge, axis=1)
    df['SibSp'] = df.apply(lambda row: forBoolean(row, 'SibSp'), axis=1)
    df['Parch'] = df.apply(lambda row: forBoolean(row, 'Parch'), axis=1)
    df['Sex'] = df.apply(forGender, axis=1)
    dropped_cols = [col for col in ['PassengerId', 'Name', 'Ticket', 'Fare', 'Cabin'] if col in df.columns]
    df = df.drop(dropped_cols, axis=1)
    df = df.rename(index=str, columns={'Sex': 'Gender', 'SibSp': 'Siblings', 'Parch': 'Parents'})
    df.dropna(inplace=True)
    return df

traindf = pandas.read_csv('res/titanic/train.csv')
testdf  = pandas.merge(pandas.read_csv('res/titanic/test.csv'),
                       pandas.read_csv('res/titanic/gender_submission.csv'),
                       on="PassengerId")

traindf = pretreat(traindf)
testdf = pretreat(testdf)

We will need to save this intermediate learning database, since pyAgrum accepts only files as inputs. As a rule of thumb, save your CSV files using commas as separators and do not quote values when you plan to use them with pyAgrum.

In [7]:
import csv
traindf.to_csv('res/titanic/post_train.csv', index=False)
testdf.to_csv('res/titanic/post_test.csv', index=False)
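
To illustrate the rule of thumb (comma separators, no quoted values), here is a minimal sketch using only Python's standard csv module and an in-memory buffer. The two rows are made-up examples in the column layout produced by pretreat; they are not taken from the dataset. With csv.QUOTE_NONE the writer never adds quotes and raises an error if a value would need them, which makes the constraint explicit.

```python
import csv
import io

# Hypothetical rows in the layout produced by pretreat (illustration only).
columns = ['Survived', 'Pclass', 'Gender', 'Age', 'Siblings', 'Parents', 'Embarked']
rows = [
    {'Survived': 'False', 'Pclass': 3, 'Gender': 'Male', 'Age': 'adult',
     'Siblings': 'True', 'Parents': 'False', 'Embarked': 'S'},
    {'Survived': 'True', 'Pclass': 1, 'Gender': 'Female', 'Age': 'adult',
     'Siblings': 'True', 'Parents': 'False', 'Embarked': 'C'},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter=',',
                        quoting=csv.QUOTE_NONE, lineterminator='\n')
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
```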

Modeling without learning

In some cases, we might not have any data to learn from. In such cases, we can rely on experts to provide the correlations between variables and the conditional probabilities.

It can be simpler to start with a simple topology, leaving room to add more complex correlations as the model is confronted with data. Here, we will use three hypotheses:

  • All variables are conditionally independent of each other given whether a passenger survived or not.
  • Women and children are more likely to survive.
  • The more siblings or parents aboard, the less likely the passenger is to survive.

The first assumption results in the following DAG for our Bayesian network:

In [8]:
# fastBN builds the structure and the labelled variables from a compact string.
bn = gum.fastBN("Age{baby|toddler|kid|teen|adult|old}<-Survived{False|True}->Gender{Female|Male};Siblings{False|True}<-Survived->Parents{False|True}")
print(bn.variable("Survived"))
print(bn.variable("Age"))
print(bn.variable("Gender"))
print(bn.variable("Siblings"))
print(bn.variable("Parents"))

bn
Survived:Labelized({False|True})
Age:Labelized({baby|toddler|kid|teen|adult|old})
Gender:Labelized({Female|Male})
Siblings:Labelized({False|True})
Parents:Labelized({False|True})
Out[8]:
[Figure: DAG of the network, with arcs Survived → Age, Survived → Gender, Survived → Siblings and Survived → Parents]
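
Under the first hypothesis, the joint distribution factorizes along the arcs of this DAG. As a sketch, write S for Survived, A for Age, G for Gender, B for Siblings and R for Parents (these one-letter symbols are introduced here for brevity and are not part of the notebook):

```latex
P(S, A, G, B, R) = P(S)\,P(A \mid S)\,P(G \mid S)\,P(B \mid S)\,P(R \mid S)
```

Each factor on the right-hand side corresponds to one conditional probability table of the network.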

Hypotheses two and three help us define the parameters of this Bayesian network. Remember that we assume that we do not have any data to learn from. So we will use simple statements such as "a woman is 10 times more likely to survive than a man". We can then normalize the values to obtain a proper conditional probability distribution.

This technique may not be the most precise or scientifically sound, but it has the advantage of being easy to use.

In [9]:
bn.cpt('Survived')[:] = [100, 1]
bn.cpt('Survived').normalizeAsCPT()
bn.cpt('Survived')
Out[9]:
Survived
False | True
0.9901 | 0.0099
In [10]:
bn.cpt('Age')[{'Survived':0}] = [ 1, 1, 1, 10, 10, 1]
bn.cpt('Age')[{'Survived':1}] = [ 10, 10, 10, 1, 1, 10]
bn.cpt('Age').normalizeAsCPT()
bn.cpt('Age')
Out[10]:
Age
Survived | baby | toddler | kid | teen | adult | old
False | 0.0417 | 0.0417 | 0.0417 | 0.4167 | 0.4167 | 0.0417
True | 0.2381 | 0.2381 | 0.2381 | 0.0238 | 0.0238 | 0.2381
In [11]:
bn.cpt('Gender')[{'Survived':0}] = [ 1, 1]
bn.cpt('Gender')[{'Survived':1}] = [ 10, 1]
bn.cpt('Gender').normalizeAsCPT()
bn.cpt('Gender')
Out[11]:
Gender
Survived | Female | Male
False | 0.5000 | 0.5000
True | 0.9091 | 0.0909
In [12]:
bn.cpt('Siblings')[{'Survived':0}] = [ 1, 10]
bn.cpt('Siblings')[{'Survived':1}] = [ 10, 1]
bn.cpt('Siblings').normalizeAsCPT()
bn.cpt('Siblings')
Out[12]:
Siblings
Survived | False | True
False | 0.0909 | 0.9091
True | 0.9091 | 0.0909
In [13]:
bn.cpt('Parents')[{'Survived':0}] = [ 1, 10]
bn.cpt('Parents')[{'Survived':1}] = [ 10, 1]
bn.cpt('Parents').normalizeAsCPT()
bn.cpt('Parents')
Out[13]:
Parents
Survived | False | True
False | 0.0909 | 0.9091
True | 0.9091 | 0.0909
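
The tables above are consistent with a simple rule: each row of relative weights is divided by its sum. The sketch below reproduces that arithmetic in plain Python, without calling pyAgrum; normalize is a hypothetical helper written for this illustration, not a pyAgrum function.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


# Relative weights copied from the cells above.
survived = normalize([100, 1])
age_if_survived = normalize([10, 10, 10, 1, 1, 10])
gender_if_survived = normalize([10, 1])

print([round(p, 4) for p in survived])            # [0.9901, 0.0099]
print([round(p, 4) for p in age_if_survived])     # [0.2381, 0.2381, 0.2381, 0.0238, 0.0238, 0.2381]
print([round(p, 4) for p in gender_if_survived])  # [0.9091, 0.0909]
```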

Now we can start using the Bayesian network and check that our hypotheses hold.

In [14]:
gnb.showInference(bn,size="10")
[Figure: the network with the marginal distribution of each variable (Age, Survived, Gender, Siblings, Parents), no evidence entered; inference in 0.35ms]

We can see here that most passengers (99% of them) will not survive and that we have almost as many women (50.4%) as men (49.6%). The majority of passengers are either teenagers or adults. Finally, most passengers had siblings or parents aboard.

Recall that we have not used any data to learn the Bayesian network's parameters and that our expert did not have any knowledge about the passengers aboard the Titanic.
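
These marginals can be checked by hand by summing out Survived, that is P(X) = Σ_s P(Survived=s) · P(X | Survived=s). The sketch below does this in plain Python with the exact fractions behind the rounded tables; the dictionaries and the marginal helper are written for this illustration and do not use pyAgrum.

```python
# Conditional tables copied from the outputs above (rounded there, exact here).
p_survived = {'False': 100 / 101, 'True': 1 / 101}
p_gender = {
    'False': {'Female': 1 / 2, 'Male': 1 / 2},
    'True': {'Female': 10 / 11, 'Male': 1 / 11},
}
p_siblings = {
    'False': {'False': 1 / 11, 'True': 10 / 11},
    'True': {'False': 10 / 11, 'True': 1 / 11},
}


def marginal(child_cpt):
    """Sum out Survived: P(X) = sum over s of P(s) * P(X | s)."""
    labels = next(iter(child_cpt.values())).keys()
    return {
        label: sum(p_survived[s] * child_cpt[s][label] for s in p_survived)
        for label in labels
    }


gender = marginal(p_gender)
siblings = marginal(p_siblings)
print({k: round(v, 3) for k, v in gender.items()})    # {'Female': 0.504, 'Male': 0.496}
print({k: round(v, 3) for k, v in siblings.items()})  # {'False': 0.099, 'True': 0.901}
```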

In [15]:
gnb.showInference(bn,size="10", evs={'Survived':'False'})
gnb.showInference(bn,size="10", evs={'Survived':'True'})
[Figure: posterior distributions given Survived=False; inference in 0.35ms]
[Figure: posterior distributions given Survived=True; inference in 0.27ms]

Here, we can see that our second and third hypotheses hold: when we enter evidence that a passenger survived, that passenger is more likely to be a woman with no siblings or parents aboard. Conversely, if we observe that a passenger did not survive, that passenger is more likely to have siblings or parents aboard, and the probability of being a man rises from 9% to 50%.

In [16]:
gnb.showInference(bn,size="10", evs={'Survived':'True', 'Gender':'Male'})
gnb.showInference(bn,size="10", evs={'Gender':'Male'})
[Figure: posterior distributions given Survived=True and Gender=Male; inference in 0.42ms]
[Figure: posterior distributions given Gender=Male only; inference in 0.29ms]

This validates our first hypothesis: if we know whether a passenger survived, then additional evidence about that passenger does not change our belief about the other variables. Conversely, if we do not know whether a passenger survived, then evidence about the passenger changes our belief about the other variables, including whether he or she survived.
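
The same check can be done numerically by brute-force enumeration on a three-variable slice of the model (Survived, Gender, Age). This is only a sketch: it does not use pyAgrum's inference engines, the tables are re-entered by hand from the weights above, and the names STATES, joint and query are hypothetical.

```python
from itertools import product

STATES = {
    'Survived': ['False', 'True'],
    'Gender': ['Female', 'Male'],
    'Age': ['baby', 'toddler', 'kid', 'teen', 'adult', 'old'],
}
P_SURVIVED = {'False': 100 / 101, 'True': 1 / 101}
P_GENDER = {
    'False': dict(zip(STATES['Gender'], [1 / 2, 1 / 2])),
    'True': dict(zip(STATES['Gender'], [10 / 11, 1 / 11])),
}
P_AGE = {
    'False': dict(zip(STATES['Age'], [w / 24 for w in (1, 1, 1, 10, 10, 1)])),
    'True': dict(zip(STATES['Age'], [w / 42 for w in (10, 10, 10, 1, 1, 10)])),
}


def joint(world):
    """P(Survived, Gender, Age) as the product of the three tables."""
    s = world['Survived']
    return P_SURVIVED[s] * P_GENDER[s][world['Gender']] * P_AGE[s][world['Age']]


def query(target, evidence):
    """Posterior of target given evidence, by summing the joint over all worlds."""
    scores = {state: 0.0 for state in STATES[target]}
    names = list(STATES)
    for values in product(*(STATES[n] for n in names)):
        world = dict(zip(names, values))
        if all(world[k] == v for k, v in evidence.items()):
            scores[world[target]] += joint(world)
    total = sum(scores.values())
    return {state: p / total for state, p in scores.items()}


# Once Survived is known, Gender carries no extra information about Age.
print(query('Age', {'Survived': 'True'}))
print(query('Age', {'Survived': 'True', 'Gender': 'Male'}))

# Without Survived, observing Gender shifts the belief about survival.
print(round(query('Survived', {'Gender': 'Male'})['True'], 4))  # 0.0018
```

Enumeration is exponential in the number of variables, so it only suits toy checks like this one.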

In [17]:
ie=gum.LazyPropagation(bn)

def init_belief(engine):
    # Initialize evidence
    for var