structuralLearning
pyAgrum 0.16.2   
generation: 2019-10-02 10:58  

This pyAgrum notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

In [1]:
%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

import os
In [2]:
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
gum.about()
gnb.configuration()
pyAgrum version 0.16.0.9
(c) Pierre-Henri Wuillemin, Christophe Gonzales, Lionel Torti
    UPMC 2015

    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.  For details, see 'pyAgrum.warranty'.
    
Library     Version
OS          posix [darwin]
Python      3.7.3 (default, Mar 27 2019, 09:23:15) [Clang 10.0.1 (clang-1001.0.46.3)]
IPython     7.8.0
MatPlotLib  3.1.1
Numpy       1.17.2
pyAgrum     0.16.0.9
Sun Sep 15 11:49:07 2019 CEST

Generating the database from a BN

In [3]:
bn=gum.loadBN(os.path.join("res","asia.bif"))
bn
Out[3]:
[Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
In [4]:
gum.generateCSV(bn,os.path.join("out","sample_asia.csv"),500000,True)
out/sample_asia.csv : [ ##################################################### ] 100%
Log2-Likelihood : -1612350.3762816712
Out[4]:
-1612350.3762816712
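
For intuition, generating a database from a BN amounts to forward (ancestral) sampling: each variable is drawn after its parents, using the matching row of its CPT. The sketch below illustrates this in pure Python on a hypothetical two-node network. The names `sample_rows` and `to_csv` and the dictionary layout are assumptions made for the illustration; this is not how gum.generateCSV is implemented.

```python
import csv
import io
import random

# Hypothetical two-node network: visit -> tub, each with labels 0 and 1.
P_VISIT = [0.01, 0.99]                   # P(visit=0), P(visit=1)
P_TUB_GIVEN_VISIT = {0: [0.05, 0.95],    # P(tub | visit=0)
                     1: [0.01, 0.99]}    # P(tub | visit=1)

def sample_rows(n, seed=0):
    """Draw n rows, sampling each variable after its parent."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visit = rng.choices([0, 1], weights=P_VISIT)[0]
        tub = rng.choices([0, 1], weights=P_TUB_GIVEN_VISIT[visit])[0]
        rows.append((visit, tub))
    return rows

def to_csv(rows, header=("visit", "tub")):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

rows = sample_rows(1000)
text = to_csv(rows)
print(text.splitlines()[:3])
```

Fixing the seed makes the sample reproducible, which helps when comparing learning algorithms on the same data.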
In [5]:
import pyAgrum.lib._utils.oslike as oslike
print("===\n  Size of the generated database\n===")
oslike.wc_l(os.path.join("out","sample_asia.csv"))
print("\n===\n  First lines\n===")
oslike.head(os.path.join("out","sample_asia.csv"))
===
  Size of the generated database
===
500000

===
  First lines
===
dyspnoea?,positive_XraY?,tuberculos_or_cancer?,visit_to_Asia?,tuberculosis?,smoking?,lung_cancer?,bronchitis?
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
0,1,1,1,1,0,1,0
0,1,1,1,1,0,1,1
0,1,1,1,1,0,1,0
1,0,1,1,1,1,1,0
0,0,1,1,1,0,1,0
1,1,1,1,1,0,1,0
0,1,1,1,1,0,1,0

In [6]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.names()
Out[6]:
('visit_to_Asia?',
 'tuberculosis?',
 'tuberculos_or_cancer?',
 'positive_XraY?',
 'lung_cancer?',
 'smoking?',
 'bronchitis?',
 'dyspnoea?')
In [7]:
learner.idFromName('visit_to_Asia?') # first row is 0
Out[7]:
0
In [8]:
learner.nameFromId(4)
Out[8]:
'lung_cancer?'

The BNLearner can recognize missing values in databases: pass, as the last argument, the list of strings that represent missing values. Note that the BNLearner cannot yet learn in the presence of missing values, which is why it raises a gum.MissingValueInDatabase exception when it finds any.

In [9]:
# it is possible to add as a last argument a list of the symbols that represent missing values:
# whenever a cell of the database is equal to one of these strings, it is considered as a 
# missing value
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn, ['?', 'N/A'] )
In [10]:
oslike.head(os.path.join("res","asia_missing.csv"))

try:
    learner=gum.BNLearner(os.path.join("res","asia_missing.csv"),bn, ['?', 'N/A'] )
except gum.MissingValueInDatabase:
    print ( "exception raised: there are missing values in the database" )
smoking?,lung_cancer?,bronchitis?,visit_to_Asia?,tuberculosis?,tuberculos_or_cancer?,dyspnoea?,positive_XraY?
0,0,0,1,1,0,0,0
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,0,1,1,1,0,N/A
0,1,0,1,1,1,1,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,0,1
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
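
Before handing a file to the learner, it can help to locate the offending cells. The following is a minimal sketch using only the standard library; the helper `find_missing` is hypothetical (it is not part of pyAgrum), and the sample text reuses the first rows printed above.

```python
import csv
import io

MISSING_SYMBOLS = {"?", "N/A"}

def find_missing(csv_text, missing=MISSING_SYMBOLS):
    """Return (data_row_index, column_name) for each cell equal to a missing symbol."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for i, row in enumerate(reader):
        for name, value in row.items():
            if value in missing:
                hits.append((i, name))
    return hits

sample = """smoking?,lung_cancer?,bronchitis?,visit_to_Asia?,tuberculosis?,tuberculos_or_cancer?,dyspnoea?,positive_XraY?
0,0,0,1,1,0,0,0
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,0,1,1,1,0,N/A
"""
print(find_missing(sample))
```

Row indices are 0-based and do not count the header line.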

Parameters learning from the database

We give $bn$ as a parameter to the learner so that it knows the variables and the order of the labels of each variable. Try removing the argument $bn$ in the first line below to see the difference.

In [11]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables and labels
learner.setInitialDAG(bn.dag())
bn2=learner.learnParameters()
gnb.showBN(bn2)
[Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
In [12]:
from IPython.display import HTML

HTML('<table><tr><td style="text-align:center;"><h3>original BN</h3></td>'+
     '<td style="text-align:center;"><h3>Learned BN</h3></td></tr>'+
     '<tr><td><center>'+
     gnb.getPotential(bn.cpt (bn.idFromName('visit_to_Asia?')))
     +'</center></td><td><center>'+
     gnb.getPotential(bn2.cpt(bn2.idFromName('visit_to_Asia?')))
     +'</center></td></tr><tr><td><center>'+
     gnb.getPotential(bn.cpt (bn.idFromName('tuberculosis?')))
     +'</center></td><td><center>'+
     gnb.getPotential(bn2.cpt(bn2.idFromName('tuberculosis?')))
     +'</center></td></tr></table>')
Out[12]:

                      original BN         Learned BN

visit_to_Asia?        0        1          0        1
                      0.0100   0.9900     0.0100   0.9900

tuberculosis?         0        1          0        1
visit_to_Asia?=0      0.0500   0.9500     0.0449   0.9551
visit_to_Asia?=1      0.0100   0.9900     0.0100   0.9900
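
With a fixed structure, parameter learning essentially reduces to counting: each CPT entry is estimated from the frequency of a child label among the rows that share a parent label. The sketch below shows plain relative frequencies on made-up toy data. The helper `estimate_cpt` is hypothetical, and the BNLearner may additionally apply a prior or smoothing, so this is an illustration of the idea, not of its exact computation.

```python
from collections import Counter

def estimate_cpt(rows, child, parent):
    """Estimate P(child | parent) by relative frequencies.

    rows: list of dicts mapping variable name -> label.
    Returns {parent_label: {child_label: probability}}.
    """
    joint = Counter((r[parent], r[child]) for r in rows)
    parent_totals = Counter(r[parent] for r in rows)
    cpt = {}
    for (p, c), n in joint.items():
        cpt.setdefault(p, {})[c] = n / parent_totals[p]
    return cpt

# Toy data (made up): 4 rows with a=0 and 6 rows with a=1.
rows = ([{"a": 0, "b": 0}] * 1 + [{"a": 0, "b": 1}] * 3 +
        [{"a": 1, "b": 0}] * 3 + [{"a": 1, "b": 1}] * 3)
cpt = estimate_cpt(rows, child="b", parent="a")
print(cpt)
```

Rare parent configurations are estimated from few rows, which is one plausible reason why the learned row for visit_to_Asia?=0 differs more from the original than the row for visit_to_Asia?=1.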

Structural learning a BN from the database

Different learning algorithms

For now, three algorithms are wrapped in pyAgrum. The first is a local search with tabu list (LocalSearchWithTabuList):

In [13]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
gnb.sideBySide(bn2,gnb.getInformation(bn2))
# gum.BruteForceKL is deprecated in pyAgrum>0.12.6: a gum.ExactBNdistance is created instead
kl=gum.BruteForceKL(bn,bn2)
kl.compute()
Learned in 212.456ms
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
[gnb.getInformation view of the same graph; scale from 0.08064048977072921 to 0.9999949113865898]
** pyAgrum.BruteForceKL is deprecated in pyAgrum>0.12.6.
** A pyAgrum.ExactBNdistance has been created.
Out[13]:
{'klPQ': 3.525227924387601e-05,
 'errorPQ': 0,
 'klQP': 3.1608628001091694e-05,
 'errorQP': 128,
 'hellinger': 0.0035474024695537586,
 'bhattacharya': 6.286410033769014e-06,
 'jensen-shannon': 8.85466266265196e-06}
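
The entries klPQ and klQP appear to be the two directions of the Kullback-Leibler divergence between the joint distribution P of the reference BN and the joint distribution Q of the learned one. As a reminder of why the two values differ, here is a minimal sketch on two small hand-made distributions. The function `kl_divergence` is hypothetical, and the base-2 logarithm is an assumption.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The divergence is zero only when both distributions coincide, so values around 1e-05 as above indicate that the learned joint distribution is very close to the reference even though some arcs differ.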

The second is a greedy hill climbing algorithm (with insert, remove and change arc as atomic operations).

In [14]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
gnb.sideBySide(bn2,gnb.getInformation(bn2))
Learned in 231.39ms
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
[gnb.getInformation view of the same graph; scale from 0.08064048977072921 to 0.9999949113865898]

And K2, for those who like it :)

In [15]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useK2([0,1,2,3,4,5,6,7])
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
bn2
Learned in 98.785ms
Out[15]:
[Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]

K2 can be very good if the order is a good one (a topological order of the nodes in the reference), and much worse otherwise, as with the reversed order below.

In [16]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useK2([7,6,5,4,3,2,1,0])
bn2=learner.learnBN()
print("Learned in {0}s".format(learner.currentTime()))
bn2
Learned in 0.192506s
Out[16]:
[Graph: tuberculosis?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, positive_XraY?->tuberculos_or_cancer?, lung_cancer?->tuberculosis?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, smoking?->lung_cancer?, bronchitis?->tuberculos_or_cancer?, bronchitis?->positive_XraY?, bronchitis?->lung_cancer?, bronchitis?->smoking?, dyspnoea?->tuberculos_or_cancer?, dyspnoea?->positive_XraY?, dyspnoea?->lung_cancer?, dyspnoea?->smoking?, dyspnoea?->bronchitis?]
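
When a reference structure is available, a topological order can be computed rather than guessed. The sketch below applies Kahn's algorithm to the arcs of the reference asia network and converts the result to variable ids, using the id order printed by learner.names() earlier. The function `topological_order` is a hypothetical helper written for this illustration.

```python
from collections import deque

# Variable ids, in the order listed by learner.names().
NAMES = ["visit_to_Asia?", "tuberculosis?", "tuberculos_or_cancer?",
         "positive_XraY?", "lung_cancer?", "smoking?", "bronchitis?", "dyspnoea?"]

# Arcs of the reference asia network, as (parent, child) pairs.
ARCS = [("visit_to_Asia?", "tuberculosis?"),
        ("tuberculosis?", "tuberculos_or_cancer?"),
        ("tuberculos_or_cancer?", "positive_XraY?"),
        ("tuberculos_or_cancer?", "dyspnoea?"),
        ("lung_cancer?", "tuberculos_or_cancer?"),
        ("smoking?", "lung_cancer?"),
        ("smoking?", "bronchitis?"),
        ("bronchitis?", "dyspnoea?")]

def topological_order(names, arcs):
    """Kahn's algorithm: repeatedly emit a node with no remaining parent."""
    indegree = {n: 0 for n in names}
    children = {n: [] for n in names}
    for parent, child in arcs:
        indegree[child] += 1
        children[parent].append(child)
    queue = deque(n for n in names if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(names):
        raise ValueError("the graph contains a cycle")
    return order

order = topological_order(NAMES, ARCS)
k2_order = [NAMES.index(n) for n in order]
print(k2_order)
```

A list such as `k2_order` is the kind of argument that learner.useK2 takes in the cells above.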

Following the learning curve

In [17]:
import numpy as np
%matplotlib inline

learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()

# we could prefer a log2-likelihood score
# learner.useScoreLog2Likelihood()
learner.setMaxTime(10)

# representation of the error as a pseudo log (negative values really represent negative epsilons)
@np.vectorize
def pseudolog(x):
    seuil=2.0  # threshold ("seuil" in French): linear below it, log10 above
    y=-x if x<0 else x
        
    if y<seuil:
        res=y*np.log10(seuil)/seuil
    else:
        res=np.log10(y)
        
    return res if x>0 else -res

# in order to control the complexity, we limit the number of parents
learner.setMaxIndegree(3) # no more than 3 parents per node
gnb.animApproximationScheme(learner,
                            scale=pseudolog) # scale by default is np.log10

bn2=learner.learnBN()
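
The pseudolog scale used above can be checked in isolation. This standalone variant (plain `math`, no vectorization, with the threshold exposed as a parameter, which is an addition made for the illustration) shows that the function is odd, linear around zero, and continuous at the threshold.

```python
import math

def pseudolog(x, threshold=2.0):
    """Signed log10 scale that is linear on (-threshold, threshold)."""
    y = abs(x)
    if y < threshold:
        res = y * math.log10(threshold) / threshold   # linear part through 0
    else:
        res = math.log10(y)                           # logarithmic part
    return res if x > 0 else -res

for x in (-100.0, -1.0, 0.0, 1.0, 2.0, 100.0):
    print(x, pseudolog(x))
```

The linear part is scaled so that both branches meet at the threshold, which avoids the singularity of log10 at zero while keeping the sign of the error.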

Customizing the learning algorithms

1. Learning a tree?

In [18]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

learner.setMaxIndegree(1) # no more than 1 parent per node

bntree=learner.learnBN()
bntree
Out[18]:
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, lung_cancer?->bronchitis?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
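
Whether the result is really a tree can be verified from its arc list: every node has at most one parent, there is a single root, and the number of arcs is the number of nodes minus one. A minimal check follows, with the arcs copied from the output above; `max_indegree` is a hypothetical helper.

```python
from collections import Counter

# Arcs of the learned structure, as (parent, child) pairs.
TREE_ARCS = [("tuberculosis?", "visit_to_Asia?"),
             ("tuberculosis?", "tuberculos_or_cancer?"),
             ("tuberculos_or_cancer?", "positive_XraY?"),
             ("tuberculos_or_cancer?", "lung_cancer?"),
             ("lung_cancer?", "bronchitis?"),
             ("bronchitis?", "smoking?"),
             ("bronchitis?", "dyspnoea?")]

def max_indegree(arcs):
    """Largest number of parents of any node in the arc list."""
    parents = Counter(child for _, child in arcs)
    return max(parents.values(), default=0)

nodes = {n for arc in TREE_ARCS for n in arc}
roots = nodes - {child for _, child in TREE_ARCS}
print(max_indegree(TREE_ARCS), sorted(roots), len(nodes), len(TREE_ARCS))
```

Note that limiting the indegree to 1 only guarantees a forest in general; here the learned structure happens to be connected.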

2. With prior structural knowledge

In [19]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

# I know that smoking causes cancer
learner.addMandatoryArc("smoking?","lung_cancer?") # smoking->lung_cancer
# I know that visit to Asia may change the risk of tuberculosis
learner.addMandatoryArc("visit_to_Asia?","tuberculosis?") # visit_to_Asia->tuberculosis

bn2=learner.learnBN()
gnb.showBN(bn2,size="5")
[Graph: visit_to_Asia?->tuberculosis?, tuberculos_or_cancer?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, lung_cancer?->visit_to_Asia?, lung_cancer?->tuberculosis?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, bronchitis?->tuberculos_or_cancer?, bronchitis?->lung_cancer?, bronchitis?->smoking?, bronchitis?->dyspnoea?, dyspnoea?->tuberculos_or_cancer?, dyspnoea?->lung_cancer?, dyspnoea?->smoking?]

3. Comparing BNs

In [20]:
help(gnb.getBNDiff)
Help on function getBNDiff in module pyAgrum.lib.notebook:

getBNDiff(bn1, bn2, size=None)
    get a HTML string representation of a graphical diff between the arcs of _bn1 (reference) with those of _bn2.
    
    * full black line: the arc is common for both
    * full red line: the arc is common but inverted in _bn2
    * dotted black line: the arc is added in _bn2
    * dotted red line: the arc is removed in _bn2
    
    :param BayesNet bn1: referent model for the comparison
    :param BayesNet bn2: bn compared to the referent model
    :param size: size of the rendered graph

In [21]:
gnb.sideBySide(bn,bn2,gnb.getBNDiff(bn,bn2),
              captions=['target','learned BN','graphical diffs between target and learned'])
target: [Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
learned BN: [Graph: visit_to_Asia?->tuberculosis?, tuberculos_or_cancer?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, lung_cancer?->visit_to_Asia?, lung_cancer?->tuberculosis?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, bronchitis?->tuberculos_or_cancer?, bronchitis?->lung_cancer?, bronchitis?->smoking?, bronchitis?->dyspnoea?, dyspnoea?->tuberculos_or_cancer?, dyspnoea?->lung_cancer?, dyspnoea?->smoking?]
graphical diffs between target and learned: [Graph drawn over the arcs of the learned BN]
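
The four categories listed in the getBNDiff docstring can also be computed directly on arc sets. The sketch below does so for the target and the learned BN of this cell, with the arcs copied from the outputs above. The function `diff_arcs` is a hypothetical helper; it only mimics the classification, not the rendering done by gnb.getBNDiff.

```python
V, T, C, X = "visit_to_Asia?", "tuberculosis?", "tuberculos_or_cancer?", "positive_XraY?"
L, S, B, D = "lung_cancer?", "smoking?", "bronchitis?", "dyspnoea?"

# Arcs as (parent, child) pairs.
TARGET = {(V, T), (T, C), (C, X), (C, D), (L, C), (S, L), (S, B), (B, D)}
LEARNED = {(V, T), (C, V), (C, T), (C, X), (L, V), (L, T), (L, C), (S, L),
           (B, C), (B, L), (B, S), (B, D), (D, C), (D, L), (D, S)}

def diff_arcs(reference, learned):
    """Classify arcs as common, inverted, added or removed w.r.t. the reference."""
    common = reference & learned
    # reference arcs that appear with the opposite direction in learned
    inverted = {(a, b) for (a, b) in reference if (b, a) in learned}
    # learned arcs linking two nodes that are not linked in the reference
    added = {(a, b) for (a, b) in learned - reference if (b, a) not in reference}
    # reference arcs whose two nodes are no longer linked in learned
    removed = reference - learned - inverted
    return {"common": common, "inverted": inverted, "added": added, "removed": removed}

diff = diff_arcs(TARGET, LEARNED)
for kind, arcs in diff.items():
    print(kind, len(arcs))
```

Counting arcs per category gives a quick structural summary that complements the distribution-level distances computed in the next cell.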
In [22]:
kl=gum.BruteForceKL(bn,bn2)
kl.compute()
** pyAgrum.BruteForceKL is deprecated in pyAgrum>0.12.6.
** A pyAgrum.ExactBNdistance has been created.
Out[22]:
{'klPQ': 6.548804072726818e-05,
 'errorPQ': 0,
 'klQP': 6.142727293341683e-05,
 'errorQP': 128,
 'hellinger': 0.0047943249921604345,
 'bhattacharya': 1.1487200177259424e-05,
 'jensen-shannon': 1.635404500887141e-05}

4. Changing the scores

By default, a BDeu score is used, but it can be changed.

In [23]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

# ids 0 and 1 are visit_to_Asia? and tuberculosis? (see learner.names()),
# so this forces the arc visit_to_Asia? -> tuberculosis?
learner.addMandatoryArc(0,1)

# we prefer a log2-likelihood score
learner.useScoreLog2Likelihood()

# in order to control the complexity, we limit the number of parents
learner.setMaxIndegree(1) # no more than 1 parent per node

bn2=learner.learnBN()
kl=gum.BruteForceKL(bn,bn2)
gnb.sideBySide(bn2,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
** pyAgrum.BruteForceKL is deprecated in pyAgrum>0.12.6.
** A pyAgrum.ExactBNdistance has been created.
[Graph: visit_to_Asia?->tuberculosis?, tuberculos_or_cancer?->visit_to_Asia?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
klPQ :0.12242972352392655
errorPQ :0
klQP :0.03316328084622611
errorQP :64
hellinger :0.20450308362194394
bhattacharya :0.02113247611612723
jensen-shannon :0.024089043332327004
learned BN
distances
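
The indegree limit matters with a pure likelihood score because the maximized likelihood of a node never decreases when a parent is added, so nothing in the score itself discourages dense graphs. The sketch below illustrates this on made-up toy data; `log2_likelihood` is a hypothetical helper based on plain counts and does not reproduce pyAgrum's scoring code.

```python
import math
from collections import Counter

def log2_likelihood(rows, child, parents=()):
    """Maximized log2-likelihood of `child` given `parents`, from counts."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
    return sum(n * math.log2(n / parent_totals[cfg])
               for (cfg, _), n in joint.items())

# Toy data, made up for the illustration.
rows = [{"x": 0, "y": 0}, {"x": 0, "y": 1}, {"x": 1, "y": 0},
        {"x": 1, "y": 1}, {"x": 1, "y": 1}, {"x": 0, "y": 0}]
without_parent = log2_likelihood(rows, "y")
with_parent = log2_likelihood(rows, "y", parents=("x",))
print(without_parent, with_parent)
```

Scores with a complexity term or a prior, such as BDeu, counteract this tendency, which is a common reason to prefer them when no structural constraint is imposed.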

5. Mixing algorithms

First, we learn a structure with greedy hill climbing (faster?).

In [24]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()
learner.addMandatoryArc(0,1) # ids 0 and 1: visit_to_Asia? -> tuberculosis?
bn2=learner.learnBN()
kl=gum.BruteForceKL(bn,bn2)
gnb.sideBySide(bn2,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
** pyAgrum.BruteForceKL is deprecated in pyAgrum>0.12.6.
** A pyAgrum.ExactBNdistance has been created.
[Graph: visit_to_Asia?->tuberculosis?, visit_to_Asia?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
klPQ :4.176823638104871e-05
errorPQ :0
klQP :3.8362810846541504e-05
errorQP :128
hellinger :0.0038578346940183975
bhattacharya :7.4358300433085875e-06
jensen-shannon :1.0512397920990794e-05
learned BN
distances

Then we refine this structure with a local search with tabu list, starting from the DAG learned above.

In [25]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()

learner.setInitialDAG(bn2.dag())
#learner.setMaxNbDecreasingChanges(2)

bn3=learner.learnBN()
kl=gum.BruteForceKL(bn,bn3)
gnb.sideBySide(bn3,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
** pyAgrum.BruteForceKL is deprecated in pyAgrum>0.12.6.
** A pyAgrum.ExactBNdistance has been created.
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
klPQ :2.8105844592948045e-05
errorPQ :0
klQP :2.4723629244956745e-05
errorQP :128
hellinger :0.003191666394945586
bhattacharya :5.087738265223894e-06
jensen-shannon :7.119081573257654e-06
learned BN
distances

Impact of the size of the database on the learning

In [26]:
!head out/sample_asia.csv
dyspnoea?,positive_XraY?,tuberculos_or_cancer?,visit_to_Asia?,tuberculosis?,smoking?,lung_cancer?,bronchitis?
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
0,1,1,1,1,0,1,0
0,1,1,1,1,0,1,1
0,1,1,1,1,0,1,0
1,0,1,1,1,1,1,0
0,0,1,1,1,0,1,0
1,1,1,1,1,0,1,0
0,1,1,1,1,0,1,0
In [27]:
import IPython.display
rows=3
sizes=[400,500,700,1000,2000,5000,
       10000,50000,75000,
       100000,150000,175000,
       200000,300000,500000]
In [28]:
res="<table>"
nbr=0
l=[]
for i in sizes:
    n=i+1
    oslike.rm(os.path.join("out",'extract_asia.csv'))
    oslike.head(os.path.join("out","sample_asia.csv"),n,os.path.join("out","extract_asia.csv"))
    oslike.wc_l(os.path.join("out","extract_asia.csv"))
    learner=gum.BNLearner(os.path.join("out","extract_asia.csv"),bn) # using bn as template for variables
    learner.useGreedyHillClimbing()
    bn2=learner.learnBN()
    
    kl=gum.ExactBNdistance(bn,bn2)
    r=kl.compute()
    l.append(r['klPQ'])
    
    if nbr % rows == 0:
        res+="<tr>"
    res+="<td><center>size="+str(i)+"</center>"+gnb.getBN(bn2,size="3")+"</td>"
    nbr+=1
    if nbr % rows == 0:
        res+="</tr>"
if nbr % rows!=0:
    res+="</tr>"
res+="</table>"

IPython.display.display(IPython.display.HTML(res))

plot(sizes,l)
print(l[-1])
401
501
701
1001
2001
5001
10001
50001
75001
100001
150001
175001
200001
300001
500000
size=400
[Graph: tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=500
[Graph: tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=700
[Graph: tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->visit_to_Asia?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, smoking?->visit_to_Asia?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=1000
[Graph: tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculosis?, lung_cancer?->smoking?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=2000
[Graph: tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculosis?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
size=5000
[Graph: tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=10000
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
size=50000
[Graph: tuberculosis?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculosis?, lung_cancer?->smoking?, bronchitis?->lung_cancer?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=75000
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
size=100000
[Graph: tuberculosis?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculosis?, lung_cancer?->smoking?, lung_cancer?->bronchitis?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
size=150000
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, bronchitis?->lung_cancer?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
size=175000
[Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, lung_cancer?->bronchitis?, bronchitis?->smoking?, bronchitis?->dyspnoea?]
size=200000
[Graph: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
size=300000
[Graph: tuberculosis?->visit_to_Asia?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculosis?, lung_cancer?->smoking?, bronchitis?->lung_cancer?, bronchitis?->smoking?, dyspnoea?->bronchitis?]
size=500000
[Graph: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?]
3.525227924387601e-05
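
The loop above extracts the first `i+1` lines of the full database, the extra line being the CSV header. This also explains the line counts printed by wc_l: 401 for size 400, and 500000 for the last size because the full file has no more lines than that. A minimal standard-library sketch of such an extraction follows; the function `head` below is a hypothetical stand-in, not the oslike.head implementation, and the CSV text is made up.

```python
import io
from itertools import islice

def head(src, n):
    """Return the first n lines of a text stream as a list of strings."""
    return list(islice(src, n))

# Stand-in for the full database: a header followed by 5 data rows.
csv_text = "a,b\n" + "".join("{0},{1}\n".format(i % 2, 1) for i in range(5))

size = 3
extract = head(io.StringIO(csv_text), size + 1)   # +1 keeps the header line
print(len(extract), extract[0].strip())
```

Because the extracts are prefixes of the same file, each larger sample contains all the smaller ones, so the curves show the effect of adding data rather than of resampling.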
In [29]:
res="<table>"
nbr=0
l=[]
for i in sizes:
    n=i+1
    oslike.rm(os.path.join("out","extract_asia.csv"))
    oslike.head(os.path.join("out","sample_asia.csv"),n,os.path.join("out","extract_asia.csv"))
    oslike.wc_l(os.path.join("out","extract_asia.csv"))
    learner=gum.BNLearner(os.path.join("out","extract_asia.csv"),bn) #using bn as template for variables
    learner.useLocalSearchWithTabuList()
    bn2=learner.learnBN()
    
    kl=gum.ExactBNdistance(bn,bn2)
    r=kl.compute()
    l.append(r['klPQ'])
    
    bn2.setProperty("name","BN({0})".format(i))
    if nbr % rows == 0:
        res+="<tr>"
    res+="<td><center>size="+str(i)+"</center>"+gnb.getBN(bn2,size="3")+"</td>"
    nbr+=1
    if nbr % rows == 0:
        res+="</tr>"
if nbr % rows!=0:
    res+="</tr>"
res+="</table>"

IPython.display.display(IPython.display.HTML(res))

plot(sizes,l)
print(l[-1])
401
501
701
1001
2001
5001
10001
50001
75001
100001
150001
175001
200001
300001
500000
size=400
[graph] arcs: tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, bronchitis?->smoking?, dyspnoea?->bronchitis? (isolated: visit_to_Asia?)
size=500
[graph] arcs: visit_to_Asia?->lung_cancer?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, smoking?->lung_cancer?, smoking?->bronchitis?, bronchitis?->dyspnoea?
size=700
[graph] arcs: visit_to_Asia?->smoking?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, positive_XraY?->tuberculos_or_cancer?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?
size=1000
[graph] arcs: tuberculosis?->lung_cancer?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea? (isolated: visit_to_Asia?)
size=2000
[graph] arcs: tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, positive_XraY?->tuberculos_or_cancer?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea? (isolated: visit_to_Asia?)
size=5000
[graph] arcs: tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, smoking?->tuberculosis?, smoking?->tuberculos_or_cancer?, smoking?->bronchitis?, dyspnoea?->tuberculosis?, dyspnoea?->tuberculos_or_cancer?, dyspnoea?->smoking?, dyspnoea?->bronchitis? (isolated: visit_to_Asia?)
size=10000
[graph] arcs: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, smoking?->lung_cancer?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=50000
[graph] arcs: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculosis?->dyspnoea?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, smoking?->tuberculos_or_cancer?, smoking?->positive_XraY?, smoking?->lung_cancer?, smoking?->bronchitis?, smoking?->dyspnoea?, dyspnoea?->tuberculos_or_cancer?, dyspnoea?->bronchitis?
size=75000
[graph] arcs: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->smoking?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=100000
[graph] arcs: tuberculosis?->visit_to_Asia?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->tuberculosis?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?
size=150000
[graph] arcs: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, smoking?->tuberculos_or_cancer?, smoking?->positive_XraY?, smoking?->lung_cancer?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=175000
[graph] arcs: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, lung_cancer?->smoking?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=200000
[graph] arcs: visit_to_Asia?->tuberculosis?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->tuberculos_or_cancer?, lung_cancer?->positive_XraY?, lung_cancer?->smoking?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=300000
[graph] arcs: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->positive_XraY?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->bronchitis?, tuberculos_or_cancer?->dyspnoea?, smoking?->tuberculos_or_cancer?, smoking?->positive_XraY?, smoking?->lung_cancer?, smoking?->bronchitis?, smoking?->dyspnoea?, bronchitis?->dyspnoea?
size=500000
[graph] arcs: tuberculosis?->visit_to_Asia?, tuberculosis?->tuberculos_or_cancer?, tuberculosis?->lung_cancer?, tuberculos_or_cancer?->positive_XraY?, tuberculos_or_cancer?->lung_cancer?, tuberculos_or_cancer?->dyspnoea?, lung_cancer?->smoking?, smoking?->bronchitis?, bronchitis?->dyspnoea?
3.525227924387601e-05
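
The last value printed is `l[-1]`, the `klPQ` entry returned by `ExactBNdistance.compute()` for the network learned from the largest sample. Assuming `klPQ` denotes the Kullback-Leibler divergence KL(P||Q) of the learned distribution Q from the reference distribution P (this reading of the key name is an assumption, as is the base-2 logarithm used below), the quantity can be sketched in plain Python on small discrete distributions. The helper `kl_divergence` and the toy distributions are hypothetical illustrations and are not part of pyAgrum.

```python
import math

def kl_divergence(p, q, base=2.0):
    """KL(P||Q) for two discrete distributions given as equal-length sequences.

    Illustrative helper only; the logarithm base is an assumption.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi, base)
    return total

# Toy distributions over a binary variable (made-up numbers)
p_ref = [0.5, 0.5]      # plays the role of the reference distribution
q_close = [0.49, 0.51]  # a good approximation of p_ref
q_far = [0.9, 0.1]      # a poor approximation of p_ref

kl_same = kl_divergence(p_ref, p_ref)
kl_close = kl_divergence(p_ref, q_close)
kl_far = kl_divergence(p_ref, q_far)
print(kl_same, kl_close, kl_far)
```

The divergence is zero when the two distributions coincide and grows as Q moves away from P, so a value close to zero, such as the one printed above for the largest sample, indicates that the learned joint distribution is very close to the reference one even when some arcs differ from the original graph.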