structuralLearning
pyAgrum 0.18.0   
generation: 2020-06-11 14:09  

This pyAgrum notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

In [1]:
%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

import os
In [2]:
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
gum.about()
gnb.configuration()
pyAgrum version 0.17.3.9
(c) 2015-2020 Pierre-Henri Wuillemin, Christophe Gonzales, Lionel Torti

    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.  For details, see 'pyAgrum.warranty'.
    
Library      Version
OS           posix [linux]
Python       3.8.3 (default, May 17 2020, 18:15:42) [GCC 10.1.0]
IPython      7.15.0
MatPlotLib   3.2.1
Numpy        1.18.5
pyAgrum      0.17.3.9
Wed Jun 10 12:51:24 2020 CEST

Generating the database from a BN

In [3]:
bn=gum.loadBN(os.path.join("res","asia.bif"))
bn
Out[3]:
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->bronchitis, bronchitis->dyspnoea]
In [4]:
gum.generateCSV(bn,os.path.join("out","sample_asia.csv"),500000,True)
out/sample_asia.csv : [ ##################################################### ] 100%
Log2-Likelihood : -1614425.6006332487
Out[4]:
-1614425.6006332487
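
For intuition, a database can be generated from a BN by forward sampling: each variable is drawn in a topological order, conditionally on the values already drawn for its parents. The following is a minimal pure-Python sketch restricted to two variables, using the CPT values of visit_to_Asia and tuberculosis displayed later in this notebook. The helper names (`draw`, `sample_row`) are hypothetical, and nothing here claims to reproduce how gum.generateCSV is implemented.

```python
import random

# CPTs copied from the asia BN shown in this notebook (labels are 0 and 1)
P_VISIT = [0.01, 0.99]                      # P(visit_to_Asia)
P_TUB = {0: [0.05, 0.95], 1: [0.01, 0.99]}  # P(tuberculosis | visit_to_Asia)

def draw(probs, rng):
    """Draw a label index according to the distribution `probs`."""
    u, acc = rng.random(), 0.0
    for label, p in enumerate(probs):
        acc += p
        if u < acc:
            return label
    return len(probs) - 1  # guard against rounding errors

def sample_row(rng):
    """Sample the parent first, then the child given the parent."""
    visit = draw(P_VISIT, rng)
    tub = draw(P_TUB[visit], rng)
    return visit, tub

rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
rows = [sample_row(rng) for _ in range(20000)]
freq_visit0 = sum(1 for v, _ in rows if v == 0) / len(rows)
print("estimated P(visit_to_Asia=0):", freq_visit0)
```

With enough rows, the empirical frequencies should approach the CPT values, which is what makes such a sample usable as a learning database.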
In [5]:
import pyAgrum.lib._utils.oslike as oslike
print("===\n  Size of the generated database\n===")
oslike.wc_l(os.path.join("out","sample_asia.csv"))
print("\n===\n  First lines\n===")
oslike.head(os.path.join("out","sample_asia.csv"))
===
  Size of the generated database
===
500000

===
  First lines
===
dyspnoea,tuberculos_or_cancer,visit_to_Asia,lung_cancer,tuberculosis,positive_XraY,bronchitis,smoking
1,1,1,1,1,1,1,1
1,1,1,1,1,1,0,1
0,1,1,1,1,1,1,1
0,1,1,1,1,1,1,0
0,1,1,1,1,0,0,0
0,1,1,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1

In [6]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.names()
Out[6]:
('visit_to_Asia',
 'tuberculosis',
 'tuberculos_or_cancer',
 'positive_XraY',
 'lung_cancer',
 'smoking',
 'bronchitis',
 'dyspnoea')
In [7]:
learner.idFromName('visit_to_Asia') # first row is 0
Out[7]:
0
In [8]:
learner.nameFromId(4)
Out[8]:
'lung_cancer'

The BNLearner is capable of recognizing missing values in databases. For this purpose, just indicate as a last argument the list of the strings that represent missing values. Note that, currently, the BNLearner is not yet able to learn in the presence of missing values. This is the reason why, when it discovers that there exist such values, it raises a gum.MissingValueInDatabase exception.

In [9]:
# it is possible to add as a last argument a list of the symbols that represent missing values:
# whenever a cell of the database is equal to one of these strings, it is considered as a 
# missing value
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn, ['?', 'N/A'] )
In [10]:
oslike.head(os.path.join("res","asia_missing.csv"))

try:
    learner=gum.BNLearner(os.path.join("res","asia_missing.csv"),bn, ['?', 'N/A'] )
except gum.MissingValueInDatabase:
    print ( "exception raised: there are missing values in the database" )
smoking,lung_cancer,bronchitis,visit_to_Asia,tuberculosis,tuberculos_or_cancer,dyspnoea,positive_XraY
0,0,0,1,1,0,0,0
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,0,1,1,1,0,N/A
0,1,0,1,1,1,1,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,0,1
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
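
Since the learner raises an exception as soon as it meets a missing value, it can help to locate such cells before building the learner. The sketch below uses only the standard csv module on the first rows of asia_missing.csv as displayed above; `find_missing` is a hypothetical helper, not part of pyAgrum.

```python
import csv
import io

def find_missing(csv_text, missing_symbols=("?", "N/A")):
    """Return (data_row_number, column_name) for every cell holding a missing symbol."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    found = []
    for row_number, row in enumerate(reader, start=1):
        for name, cell in zip(header, row):
            if cell.strip() in missing_symbols:
                found.append((row_number, name))
    return found

# first rows of res/asia_missing.csv, copied from the output above
sample = """smoking,lung_cancer,bronchitis,visit_to_Asia,tuberculosis,tuberculos_or_cancer,dyspnoea,positive_XraY
0,0,0,1,1,0,0,0
1,1,0,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,0,1,1,1,0,N/A
"""
print(find_missing(sample))  # [(4, 'positive_XraY')]
```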

Parameters learning from the database

We pass $bn$ to the learner as a template, so that it uses the same variables and the same order of labels for each variable. Try removing the argument $bn$ in the first line below to see the difference.

In [11]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables and labels
learner.setInitialDAG(bn.dag())
bn2=learner.learnParameters()
gnb.showBN(bn2)
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->bronchitis, bronchitis->dyspnoea]
In [12]:
from IPython.display import HTML

HTML('<table><tr><td style="text-align:center;"><h3>original BN</h3></td>'+
     '<td style="text-align:center;"><h3>Learned BN</h3></td></tr>'+
     '<tr><td><center>'+
     gnb.getPotential(bn.cpt (bn.idFromName('visit_to_Asia')))
     +'</center></td><td><center>'+
     gnb.getPotential(bn2.cpt(bn2.idFromName('visit_to_Asia')))
     +'</center></td></tr><tr><td><center>'+
     gnb.getPotential(bn.cpt (bn.idFromName('tuberculosis')))
     +'</center></td><td><center>'+
     gnb.getPotential(bn2.cpt(bn2.idFromName('tuberculosis')))
     +'</center></td></tr></table>')
Out[12]:

original BN                                 Learned BN

visit_to_Asia                               visit_to_Asia
  0         1                                 0         1
  0.0100    0.9900                            0.0100    0.9900

                tuberculosis                                tuberculosis
visit_to_Asia   0         1                 visit_to_Asia   0         1
0               0.0500    0.9500            0               0.0511    0.9489
1               0.0100    0.9900            1               0.0100    0.9900
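
With a fixed DAG, learning the parameters essentially amounts to counting: each CPT entry is estimated from the frequency of the child's value among the rows that share the same parent values. The sketch below computes this maximum-likelihood estimate on a tiny hand-made dataset (the rows are illustrative only and are not taken from sample_asia.csv). `estimate_cpt` is a hypothetical helper, and the BNLearner may additionally apply a prior, which is ignored here.

```python
from collections import Counter

def estimate_cpt(rows, child, parent):
    """Maximum-likelihood estimate of P(child | parent) from a list of dicts."""
    joint = Counter((r[parent], r[child]) for r in rows)
    marginal = Counter(r[parent] for r in rows)
    return {(p, c): n / marginal[p] for (p, c), n in joint.items()}

# illustrative rows only
rows = (
    [{"visit_to_Asia": 0, "tuberculosis": 0}] * 1
    + [{"visit_to_Asia": 0, "tuberculosis": 1}] * 3
    + [{"visit_to_Asia": 1, "tuberculosis": 0}] * 2
    + [{"visit_to_Asia": 1, "tuberculosis": 1}] * 14
)
cpt = estimate_cpt(rows, child="tuberculosis", parent="visit_to_Asia")
for (p, c), value in sorted(cpt.items()):
    print(f"P(tuberculosis={c} | visit_to_Asia={p}) = {value:.3f}")
```

Each estimated row sums to 1 by construction, and the estimates get closer to the true CPT as the number of rows grows.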

Structural learning a BN from the database

Different learning algorithms

For now, three algorithms are wrapped in pyAgrum. The first one is LocalSearchWithTabuList:

In [13]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
gnb.sideBySide(bn2,gnb.getInformation(bn2))
kl=gum.ExactBNdistance(bn,bn2)
kl.compute()
Learned in 131.611533ms
[BN graph, rendered twice (the learned BN and its information view); arcs: tuberculosis->visit_to_Asia, tuberculosis->lung_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]
0.0809189346011031 0.9999975365316494
Out[13]:
{'klPQ': 1.4970450248466766e-05,
 'errorPQ': 0,
 'klQP': 1.2070605168269539e-05,
 'errorQP': 128,
 'hellinger': 0.002384500922608345,
 'bhattacharya': 2.8372844843325112e-06,
 'jensen-shannon': 3.880114797276931e-06}

The second one is a greedy Hill Climbing algorithm (with inserting, removing and changing an arc as atomic operations):

In [14]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
gnb.sideBySide(bn2,gnb.getInformation(bn2))
Learned in 117.800136ms
[BN graph, rendered twice (the learned BN and its information view); arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]
0.0809189346011031 0.9999975365316491

And K2, for those who like it :)

In [15]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useK2([0,1,2,3,4,5,6,7])
bn2=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
bn2
Learned in 15.164523ms
Out[15]:
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculosis->tuberculos_or_cancer, tuberculosis->lung_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]

K2 can be very good if the order is the right one (a topological order of the nodes in the reference BN).

In [16]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useK2([7,6,5,4,3,2,1,0])
bn2=learner.learnBN()
print("Learned in {0}s".format(learner.currentTime()))
bn2
Learned in 0.018981097s
Out[16]:
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculos_or_cancer->tuberculosis, positive_XraY->tuberculos_or_cancer, lung_cancer->tuberculosis, lung_cancer->tuberculos_or_cancer, lung_cancer->positive_XraY, smoking->lung_cancer, bronchitis->tuberculos_or_cancer, bronchitis->positive_XraY, bronchitis->lung_cancer, bronchitis->smoking, dyspnoea->tuberculos_or_cancer, dyspnoea->positive_XraY, dyspnoea->lung_cancer, dyspnoea->smoking, dyspnoea->bronchitis]
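
Whether an order is a topological order of the reference DAG can be checked before passing it to useK2. The sketch below works on plain lists: the ids are assumed to follow the output of learner.names() printed earlier, the arcs are copied from the reference BN displayed at the top of this notebook, and `is_topological` is a hypothetical helper rather than a pyAgrum function.

```python
# ids follow learner.names() as printed earlier in this notebook
NAMES = ["visit_to_Asia", "tuberculosis", "tuberculos_or_cancer", "positive_XraY",
         "lung_cancer", "smoking", "bronchitis", "dyspnoea"]
ID = {name: i for i, name in enumerate(NAMES)}

# arcs of the reference BN (asia.bif)
REFERENCE_ARCS = [
    ("visit_to_Asia", "tuberculosis"),
    ("tuberculosis", "tuberculos_or_cancer"),
    ("tuberculos_or_cancer", "positive_XraY"),
    ("tuberculos_or_cancer", "dyspnoea"),
    ("lung_cancer", "tuberculos_or_cancer"),
    ("smoking", "lung_cancer"),
    ("smoking", "bronchitis"),
    ("bronchitis", "dyspnoea"),
]

def is_topological(order, arcs):
    """True if every arc goes from an earlier node to a later node in `order`."""
    position = {node_id: rank for rank, node_id in enumerate(order)}
    return all(position[ID[tail]] < position[ID[head]] for tail, head in arcs)

print(is_topological([0, 1, 2, 3, 4, 5, 6, 7], REFERENCE_ARCS))  # False
print(is_topological([7, 6, 5, 4, 3, 2, 1, 0], REFERENCE_ARCS))  # False
print(is_topological([0, 1, 5, 4, 2, 3, 6, 7], REFERENCE_ARCS))  # True
```

Under these assumptions, neither of the two orders used in the cells above is a topological order of the reference DAG (for instance, lung_cancer has id 4 but is a parent of tuberculos_or_cancer, id 2), whereas an order such as [0,1,5,4,2,3,6,7] is.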

Following the learning curve

In [17]:
import numpy as np
%matplotlib inline

learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()

# we could prefer a log2likelihood score
# learner.useScoreLog2Likelihood()
learner.setMaxTime(10)

# representation of the error as a pseudo log (negative values really represent negative epsilon)
@np.vectorize
def pseudolog(x):
    seuil=2.0
    y=-x if x<0 else x
        
    if y<seuil:
        res=y*np.log10(seuil)/seuil
    else:
        res=np.log10(y)
        
    return res if x>0 else -res

# in order to control the complexity, we limit the number of parents
learner.setMaxIndegree(3) # no more than 3 parents per node
gnb.animApproximationScheme(learner,
                            scale=pseudolog) # scale by default is np.log10

bn2=learner.learnBN()
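
The pseudolog scale defined in the cell above is linear below the threshold (seuil) and logarithmic above it, and it is mirrored for negative values so that the sign of the error stays visible. A NumPy-free sketch of the same function, with a few sample values, may make its shape easier to see. The parameter name `threshold` is introduced here for illustration only.

```python
import math

def pseudolog(x, threshold=2.0):
    """Linear below `threshold`, log10 above it, mirrored for negative x."""
    y = abs(x)
    if y < threshold:
        # the slope is chosen so that both branches meet at y == threshold
        res = y * math.log10(threshold) / threshold
    else:
        res = math.log10(y)
    return res if x > 0 else -res

for x in (-100.0, -1.0, 0.5, 2.0, 1000.0):
    print(x, pseudolog(x))
```

The linear part avoids the divergence of log10 near 0, which is why this scale can display errors that oscillate around zero.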

Customizing the learning algorithms

1. Learn a tree ?

In [18]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

learner.setMaxIndegree(1) # no more than 1 parent per node

bntree=learner.learnBN()
bntree
Out[18]:
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, lung_cancer->tuberculos_or_cancer, lung_cancer->bronchitis, bronchitis->smoking, bronchitis->dyspnoea]
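
Limiting the in-degree to 1 means that every node has at most one parent, so the learned structure is a tree or a forest. This can be verified on the arcs of the graph learned above with a few lines of plain Python. `max_indegree` and `roots` are hypothetical helpers, not pyAgrum functions.

```python
from collections import Counter

# arcs of the BN learned above with setMaxIndegree(1)
TREE_ARCS = [
    ("tuberculosis", "visit_to_Asia"),
    ("tuberculos_or_cancer", "tuberculosis"),
    ("tuberculos_or_cancer", "positive_XraY"),
    ("lung_cancer", "tuberculos_or_cancer"),
    ("lung_cancer", "bronchitis"),
    ("bronchitis", "smoking"),
    ("bronchitis", "dyspnoea"),
]

def max_indegree(arcs):
    """Largest number of parents of any node."""
    indegree = Counter(head for _, head in arcs)
    return max(indegree.values(), default=0)

def roots(arcs):
    """Nodes that appear in some arc but never as a child."""
    nodes = {n for arc in arcs for n in arc}
    heads = {head for _, head in arcs}
    return sorted(nodes - heads)

print(max_indegree(TREE_ARCS))  # 1
print(roots(TREE_ARCS))         # ['lung_cancer']
```

With 8 nodes, 7 arcs and a single root, this particular result is a tree rooted at lung_cancer.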

2. with prior structural knowledge

In [19]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

# I know that smoking causes cancer
learner.addMandatoryArc("smoking","lung_cancer") # smoking->lung_cancer
# I know that visit to Asia may change the risk of tuberculosis
learner.addMandatoryArc("visit_to_Asia","tuberculosis") # visit_to_Asia->tuberculosis

bn2=learner.learnBN()
gnb.showBN(bn2,size="5")
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculos_or_cancer->visit_to_Asia, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, lung_cancer->visit_to_Asia, lung_cancer->tuberculosis, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->dyspnoea, bronchitis->tuberculos_or_cancer, bronchitis->lung_cancer, bronchitis->smoking, bronchitis->dyspnoea, dyspnoea->tuberculos_or_cancer, dyspnoea->lung_cancer]

3. comparing BNs

In [20]:
help(gnb.getBNDiff)
Help on function getBNDiff in module pyAgrum.lib.notebook:

getBNDiff(bn1, bn2, size=None)
    get a HTML string representation of a graphical diff between the arcs of _bn1 (reference) with those of _bn2.
    
    * full black line: the arc is common for both
    * full red line: the arc is common but inverted in _bn2
    * dotted black line: the arc is added in _bn2
    * dotted red line: the arc is removed in _bn2
    
    :param BayesNet bn1: referent model for the comparison
    :param BayesNet bn2: bn compared to the referent model
    :param size: size of the rendered graph
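
The four cases listed in this help text can be reproduced on plain arc lists. The sketch below classifies the arcs of the reference BN against those of the BN learned with the two mandatory arcs in the previous section (both lists are copied from the graphs displayed in this notebook). `diff_arcs` is a hypothetical helper and does not claim to be how getBNDiff is implemented.

```python
def diff_arcs(reference, learned):
    """Classify arcs as common, reversed, added or removed (learned vs reference)."""
    reference, learned = set(reference), set(learned)
    return {
        "common": {a for a in reference if a in learned},
        "reversed": {(t, h) for (t, h) in reference if (h, t) in learned},
        "added": {(t, h) for (t, h) in learned
                  if (t, h) not in reference and (h, t) not in reference},
        "removed": {(t, h) for (t, h) in reference
                    if (t, h) not in learned and (h, t) not in learned},
    }

REFERENCE = [
    ("visit_to_Asia", "tuberculosis"), ("tuberculosis", "tuberculos_or_cancer"),
    ("tuberculos_or_cancer", "positive_XraY"), ("tuberculos_or_cancer", "dyspnoea"),
    ("lung_cancer", "tuberculos_or_cancer"), ("smoking", "lung_cancer"),
    ("smoking", "bronchitis"), ("bronchitis", "dyspnoea"),
]
LEARNED = [
    ("visit_to_Asia", "tuberculosis"), ("tuberculos_or_cancer", "visit_to_Asia"),
    ("tuberculos_or_cancer", "tuberculosis"), ("tuberculos_or_cancer", "positive_XraY"),
    ("lung_cancer", "visit_to_Asia"), ("lung_cancer", "tuberculosis"),
    ("lung_cancer", "tuberculos_or_cancer"), ("smoking", "lung_cancer"),
    ("smoking", "dyspnoea"), ("bronchitis", "tuberculos_or_cancer"),
    ("bronchitis", "lung_cancer"), ("bronchitis", "smoking"),
    ("bronchitis", "dyspnoea"), ("dyspnoea", "tuberculos_or_cancer"),
    ("dyspnoea", "lung_cancer"),
]

for kind, arcs in diff_arcs(REFERENCE, LEARNED).items():
    print(kind, len(arcs), sorted(arcs))
```

On these two lists, the sketch finds 5 common arcs, 3 reversed arcs, 7 added arcs and no removed arc.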

In [21]:
gnb.sideBySide(bn,bn2,gnb.getBNDiff(bn,bn2),
              captions=['target','learned BN','graphical diffs between target and learned'])
target
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->bronchitis, bronchitis->dyspnoea]
learned BN
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculos_or_cancer->visit_to_Asia, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, lung_cancer->visit_to_Asia, lung_cancer->tuberculosis, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->dyspnoea, bronchitis->tuberculos_or_cancer, bronchitis->lung_cancer, bronchitis->smoking, bronchitis->dyspnoea, dyspnoea->tuberculos_or_cancer, dyspnoea->lung_cancer]
graphical diffs between target and learned
[diff graph drawn on the arcs of the learned BN]
In [22]:
kl=gum.ExactBNdistance(bn,bn2)
kl.compute()
Out[22]:
{'klPQ': 3.4690680755188096e-05,
 'errorPQ': 0,
 'klQP': 3.249126276660529e-05,
 'errorQP': 128,
 'hellinger': 0.003554671547024016,
 'bhattacharya': 6.312222959743506e-06,
 'jensen-shannon': 8.889049377310033e-06}
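
To give an idea of what these distances measure, here is a small sketch on two discrete distributions, using common textbook definitions. The conventions (logarithm base, normalisation) are assumptions and may differ from those of gum.ExactBNdistance, which compares the full joint distributions of the two BNs. The example vectors are the tuberculosis rows for visit_to_Asia=0 in the original and learned CPTs shown earlier.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p||q) in bits; infinite if q is 0 where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def hellinger(p, q):
    """Hellinger distance, here without the 1/sqrt(2) normalisation."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

def bhattacharyya(p, q):
    """Bhattacharyya distance: minus the log of the overlap coefficient."""
    return -math.log(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.05, 0.95]      # original P(tuberculosis | visit_to_Asia=0)
q = [0.0511, 0.9489]  # learned  P(tuberculosis | visit_to_Asia=0)
print("klPQ", kl(p, q), "klQP", kl(q, p))
print("hellinger", hellinger(p, q))
print("bhattacharyya", bhattacharyya(p, q))
print("jensen-shannon", jensen_shannon(p, q))
```

Note that KL is not symmetric, which is why both klPQ and klQP are reported in the output above.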

3. changing the scores

By default, a BDeu score is used, but it can be changed.

In [23]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()

# I know that visit to Asia may change the risk of tuberculosis
learner.addMandatoryArc(0,1) # ids 0 and 1: visit_to_Asia->tuberculosis

# we prefer a log2likelihood score
learner.useScoreLog2Likelihood()

# in order to control the complexity, we limit the number of parents
learner.setMaxIndegree(1) # no more than 1 parent per node

bn2=learner.learnBN()
kl=gum.ExactBNdistance(bn,bn2)
gnb.sideBySide(bn2,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
learned BN
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculos_or_cancer->visit_to_Asia, tuberculos_or_cancer->positive_XraY, lung_cancer->tuberculos_or_cancer, smoking->bronchitis, bronchitis->lung_cancer, bronchitis->dyspnoea]
distances
klPQ: 0.14943183460342557
errorPQ: 0
klQP: 0.05604213522603974
errorQP: 64
hellinger: 0.22061537092194441
bhattacharya: 0.024636568595054222
jensen-shannon: 0.028334565421555793
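
A plain log-likelihood score never decreases when a parent is added to a node, which is one reason for bounding the number of parents when using it. The sketch below illustrates this on made-up rows with two binary columns (the data are illustrative and are not taken from sample_asia.csv). `log2_likelihood` is a hypothetical helper based on maximum-likelihood estimates, not the score implementation of pyAgrum.

```python
import math
from collections import Counter

def log2_likelihood(rows, child, parents):
    """Log2-likelihood of column `child` given columns `parents`, using ML estimates."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    return sum(n * math.log2(n / parent_counts[cfg]) for (cfg, _), n in joint.items())

# illustrative rows only: column 0 is a candidate parent, column 1 is the child
rows = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 35

score_without_parent = log2_likelihood(rows, child=1, parents=[])
score_with_parent = log2_likelihood(rows, child=1, parents=[0])
print(score_without_parent, score_with_parent)
```

The score with the parent is at least as high as the score without it, so without a complexity limit or penalty, a search driven by this score tends to add arcs.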

4. Mixing algorithms

First we learn a structure with HillClimbing (faster ?)

In [24]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useGreedyHillClimbing()
learner.addMandatoryArc(0,1)
bn2=learner.learnBN()
kl=gum.ExactBNdistance(bn,bn2)
gnb.sideBySide(bn2,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
learned BN
[BN graph; arcs: visit_to_Asia->tuberculosis, visit_to_Asia->tuberculos_or_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculosis, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]
distances
klPQ: 1.8447778690843138e-05
errorPQ: 0
klQP: 1.5951846959955888e-05
errorQP: 128
hellinger: 0.0026350503505781315
bhattacharya: 3.4661093159854726e-06
jensen-shannon: 4.788404850233091e-06

And then we refine with tabuList

In [25]:
learner=gum.BNLearner(os.path.join("out","sample_asia.csv"),bn) #using bn as template for variables
learner.useLocalSearchWithTabuList()

learner.setInitialDAG(bn2.dag())
#learner.setMaxNbDecreasingChanges(2)

bn3=learner.learnBN()
kl=gum.ExactBNdistance(bn,bn3)
gnb.sideBySide(bn3,
               "<br/>".join(["<b>"+k+"</b> :"+str(v) for k,v in kl.compute().items()]),
               captions=["learned BN","distances"])
learned BN
[BN graph; arcs: visit_to_Asia->tuberculosis, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, bronchitis->smoking, bronchitis->dyspnoea]
distances
klPQ: 1.491697021243515e-05
errorPQ: 0
klQP: 1.2009949390267899e-05
errorQP: 128
hellinger: 0.002380893275145871
bhattacharya: 2.8286885284581813e-06
jensen-shannon: 3.86714154596894e-06

Impact of the size of the database for the learning

In [26]:
!head out/sample_asia.csv
dyspnoea,tuberculos_or_cancer,visit_to_Asia,lung_cancer,tuberculosis,positive_XraY,bronchitis,smoking
1,1,1,1,1,1,1,1
1,1,1,1,1,1,0,1
0,1,1,1,1,1,1,1
0,1,1,1,1,1,1,0
0,1,1,1,1,0,0,0
0,1,1,1,1,1,0,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
In [27]:
import IPython.display
rows=3
sizes=[400,500,700,1000,2000,5000,
       10000,50000,75000,
       100000,150000,175000,
       200000,300000,500000]
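
The extraction loop in the next cell keeps the first i + 1 lines of the sample for each size i: one header line plus i data rows. A standard-library sketch of such an extraction is given below. The function `head` is a stand-in written for illustration, assumed to behave like the three-argument oslike.head call used in this notebook, and the file names are temporary placeholders.

```python
import itertools
import os
import tempfile

def head(src, n, dst):
    """Copy the first `n` lines of `src` into `dst`; return the number of lines written."""
    with open(src) as fin, open(dst, "w") as fout:
        lines = list(itertools.islice(fin, n))
        fout.writelines(lines)
    return len(lines)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.csv")
    dst = os.path.join(tmp, "extract.csv")
    with open(src, "w") as f:
        f.write("a,b\n" + "1,0\n" * 10)   # header + 10 data rows
    written = head(src, 3 + 1, dst)       # header + 3 data rows
    with open(dst) as f:
        extract = f.read()

print(written)  # 4
print(extract)
```

If the requested number of lines exceeds the size of the file, only the available lines are copied, which matches the last line count (500000) printed by the loop in this notebook.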
In [28]:
res="<table>"
nbr=0
l=[]
for i in sizes:
    n=i+1
    oslike.rm(os.path.join("out",'extract_asia.csv'))
    oslike.head(os.path.join("out","sample_asia.csv"),n,os.path.join("out","extract_asia.csv"))
    oslike.wc_l(os.path.join("out","extract_asia.csv"))
    learner=gum.BNLearner(os.path.join("out","extract_asia.csv"),bn) # using bn as template for variables
    learner.useGreedyHillClimbing()
    bn2=learner.learnBN()
    
    kl=gum.ExactBNdistance(bn,bn2)
    r=kl.compute()
    l.append(r['klPQ'])
    
    if nbr % rows == 0:
        res+="<tr>"
    res+="<td><center>size="+str(i)+"</center>"+gnb.getBN(bn2,size="3")+"</td>"
    nbr+=1
    if nbr % rows == 0:
        res+="</tr>"
if nbr % rows!=0:
    res+="</tr>"
res+="</table>"

IPython.display.display(IPython.display.HTML(res))

plot(sizes,l)
print(l[-1])
401
501
701
1001
2001
5001
10001
50001
75001
100001
150001
175001
200001
300001
500000
size=400
[BN graph; arcs: tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->visit_to_Asia, bronchitis->smoking, dyspnoea->bronchitis]
size=500
[BN graph; arcs: tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->visit_to_Asia, smoking->bronchitis, bronchitis->dyspnoea]
size=700
[BN graph; arcs: tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, positive_XraY->tuberculos_or_cancer, lung_cancer->tuberculosis, lung_cancer->smoking, smoking->visit_to_Asia, bronchitis->smoking, dyspnoea->bronchitis]
size=1000
[BN graph; arcs: tuberculosis->lung_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, lung_cancer->smoking, smoking->visit_to_Asia, bronchitis->smoking, dyspnoea->bronchitis]
size=2000
[BN graph; arcs: tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculosis, lung_cancer->smoking, lung_cancer->bronchitis, smoking->visit_to_Asia, bronchitis->smoking, bronchitis->dyspnoea]
size=5000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, bronchitis->smoking, dyspnoea->bronchitis]
size=10000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, bronchitis->smoking, dyspnoea->bronchitis]
size=50000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, bronchitis->smoking, bronchitis->dyspnoea]
size=75000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, lung_cancer->bronchitis, bronchitis->smoking, bronchitis->dyspnoea]
size=100000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]
size=150000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->lung_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, positive_XraY->tuberculos_or_cancer, lung_cancer->smoking, bronchitis->tuberculosis, bronchitis->smoking, dyspnoea->bronchitis]
size=175000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculosis, lung_cancer->smoking, lung_cancer->bronchitis, bronchitis->smoking, bronchitis->dyspnoea]
size=200000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, smoking->lung_cancer, smoking->bronchitis, bronchitis->dyspnoea]
size=300000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->dyspnoea, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, lung_cancer->bronchitis, bronchitis->smoking, dyspnoea->bronchitis]
size=500000
[BN graph; arcs: tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea]
1.4916970212424741e-05
In [29]:
res="<table>"
nbr=0
l=[]
for i in sizes:
    n=i+1
    oslike.rm(os.path.join("out","extract_asia.csv"))
    oslike.head(os.path.join("out","sample_asia.csv"),n,os.path.join("out","extract_asia.csv"))
    oslike.wc_l(os.path.join("out","extract_asia.csv"))
    learner=gum.BNLearner(os.path.join("out","extract_asia.csv"),bn) #using bn as template for variables
    learner.useLocalSearchWithTabuList()
    bn2=learner.learnBN()
    
    kl=gum.ExactBNdistance(bn,bn2)
    r=kl.compute()
    l.append(r['klPQ'])
    
    bn2.setProperty("name","BN(%{0})".format(i))
    if nbr % rows == 0:
        res+="<tr>"
    res+="<td><center>size="+str(i)+"</center>"+gnb.getBN(bn2,size="3")+"</td>"
    nbr+=1
    if nbr % rows == 0:
        res+="</tr>"
if nbr % rows!=0:
    res+="</tr>"
res+="</table>"

IPython.display.display(IPython.display.HTML(res))

plot(sizes,l)
print(l[-1])
401
501
701
1001
2001
5001
10001
50001
75001
100001
150001
175001
200001
300001
500000
size=400
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis lung_cancer lung_cancer tuberculosis->lung_cancer tuberculos_or_cancer tuberculos_or_cancer tuberculos_or_cancer->tuberculosis tuberculos_or_cancer->lung_cancer dyspnoea dyspnoea tuberculos_or_cancer->dyspnoea positive_XraY positive_XraY positive_XraY->tuberculos_or_cancer smoking smoking lung_cancer->smoking smoking->visit_to_Asia bronchitis bronchitis smoking->bronchitis dyspnoea->smoking dyspnoea->bronchitis
size=500
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis tuberculos_or_cancer tuberculos_or_cancer tuberculosis->tuberculos_or_cancer positive_XraY positive_XraY tuberculosis->positive_XraY dyspnoea dyspnoea tuberculos_or_cancer->dyspnoea lung_cancer lung_cancer lung_cancer->tuberculos_or_cancer lung_cancer->positive_XraY smoking smoking lung_cancer->smoking smoking->visit_to_Asia bronchitis bronchitis smoking->bronchitis smoking->dyspnoea bronchitis->dyspnoea
size=700
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis lung_cancer lung_cancer tuberculosis->lung_cancer tuberculos_or_cancer tuberculos_or_cancer tuberculos_or_cancer->tuberculosis tuberculos_or_cancer->lung_cancer dyspnoea dyspnoea tuberculos_or_cancer->dyspnoea positive_XraY positive_XraY positive_XraY->tuberculos_or_cancer smoking smoking lung_cancer->smoking smoking->visit_to_Asia bronchitis bronchitis bronchitis->smoking bronchitis->dyspnoea
size=1000
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis lung_cancer lung_cancer tuberculosis->lung_cancer tuberculos_or_cancer tuberculos_or_cancer tuberculos_or_cancer->tuberculosis positive_XraY positive_XraY tuberculos_or_cancer->positive_XraY tuberculos_or_cancer->lung_cancer bronchitis bronchitis tuberculos_or_cancer->bronchitis dyspnoea dyspnoea tuberculos_or_cancer->dyspnoea smoking smoking lung_cancer->smoking smoking->visit_to_Asia bronchitis->smoking dyspnoea->bronchitis
size=2000
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis tuberculos_or_cancer tuberculos_or_cancer tuberculosis->tuberculos_or_cancer positive_XraY positive_XraY tuberculosis->positive_XraY lung_cancer lung_cancer tuberculosis->lung_cancer tuberculos_or_cancer->lung_cancer dyspnoea dyspnoea tuberculos_or_cancer->dyspnoea positive_XraY->tuberculos_or_cancer smoking smoking smoking->visit_to_Asia smoking->tuberculos_or_cancer smoking->positive_XraY bronchitis bronchitis bronchitis->smoking bronchitis->dyspnoea
size=5000
G visit_to_Asia visit_to_Asia tuberculosis tuberculosis tuberculosis->visit_to_Asia tuberculos_or_cancer tuberculos_or_cancer tuberculosis->tuberculos_or_cancer positive_XraY positive_XraY tuberculosis->positive_XraY lung_cancer lung_cancer tuberculosis->lung_cancer tuberculos_or_cancer->lung_cancer bronchitis bronchitis tuberculos_or_cancer->bronchitis positive_XraY->tuberculos_or_cancer smoking smoking smoking->tuberculos_or_cancer smoking->positive_XraY smoking->bronchitis dyspnoea dyspnoea smoking->dyspnoea dyspnoea->tuberculosis dyspnoea->tuberculos_or_cancer dyspnoea->positive_XraY dyspnoea->bronchitis
size=10000
Graph (14 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculosis->lung_cancer, tuberculosis->dyspnoea, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, smoking->tuberculos_or_cancer, smoking->positive_XraY, smoking->bronchitis, dyspnoea->tuberculos_or_cancer, dyspnoea->smoking, dyspnoea->bronchitis
size=50000
Graph (15 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculosis->lung_cancer, tuberculosis->dyspnoea, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, smoking->tuberculos_or_cancer, smoking->positive_XraY, smoking->lung_cancer, smoking->bronchitis, smoking->dyspnoea, dyspnoea->tuberculos_or_cancer, dyspnoea->bronchitis
size=75000
Graph (10 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculos_or_cancer->dyspnoea, lung_cancer->tuberculos_or_cancer, lung_cancer->positive_XraY, lung_cancer->smoking, smoking->bronchitis, smoking->dyspnoea, bronchitis->dyspnoea
size=100000
Graph (9 arcs): tuberculosis->visit_to_Asia, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, positive_XraY->tuberculos_or_cancer, lung_cancer->tuberculosis, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea
size=150000
Graph (14 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculosis->lung_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, smoking->tuberculos_or_cancer, smoking->positive_XraY, smoking->lung_cancer, smoking->bronchitis, smoking->dyspnoea, dyspnoea->bronchitis
size=175000
Graph (9 arcs): tuberculosis->visit_to_Asia, tuberculosis->lung_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, positive_XraY->tuberculos_or_cancer, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea
size=200000
Graph (14 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculosis->lung_cancer, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, tuberculos_or_cancer->dyspnoea, smoking->tuberculos_or_cancer, smoking->positive_XraY, smoking->lung_cancer, smoking->bronchitis, smoking->dyspnoea, dyspnoea->bronchitis
size=300000
Graph (15 arcs): tuberculosis->visit_to_Asia, tuberculosis->tuberculos_or_cancer, tuberculosis->positive_XraY, tuberculosis->lung_cancer, tuberculosis->dyspnoea, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->bronchitis, smoking->tuberculos_or_cancer, smoking->positive_XraY, smoking->lung_cancer, smoking->bronchitis, smoking->dyspnoea, dyspnoea->tuberculos_or_cancer, dyspnoea->bronchitis
size=500000
Graph (9 arcs): tuberculosis->visit_to_Asia, tuberculosis->lung_cancer, tuberculos_or_cancer->tuberculosis, tuberculos_or_cancer->positive_XraY, tuberculos_or_cancer->lung_cancer, tuberculos_or_cancer->dyspnoea, lung_cancer->smoking, smoking->bronchitis, bronchitis->dyspnoea
1.4970450248466766e-05
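The graph listings above are easier to compare with a small helper. What follows is a minimal sketch and not part of the original notebook: it pulls the `a->b` arcs out of a text listing with a regular expression and classifies the arcs of the graph shown for size=500000 against the arcs of the generating network displayed at the start of the notebook. The function names `parse_arcs` and `compare_arcs` are hypothetical, the two arc strings are copied by hand from the notebook outputs, and the sketch deliberately avoids any pyAgrum call.

```python
import re


def parse_arcs(text):
    """Return the set of (tail, head) pairs written as 'tail->head' in text."""
    return set(re.findall(r"(\w+)->(\w+)", text))


def compare_arcs(reference, learned):
    """Classify learned arcs as same, reversed or extra; list missing ones."""
    same = learned & reference
    flipped = {(a, b) for (a, b) in learned - same if (b, a) in reference}
    extra = learned - same - flipped
    missing = {
        (a, b)
        for (a, b) in reference
        if (a, b) not in learned and (b, a) not in learned
    }
    return {"same": same, "reversed": flipped, "extra": extra, "missing": missing}


# Arcs of the generating network (copied from the Out[3] display of asia.bif).
reference = parse_arcs(
    "visit_to_Asia->tuberculosis tuberculosis->tuberculos_or_cancer "
    "tuberculos_or_cancer->positive_XraY tuberculos_or_cancer->dyspnoea "
    "lung_cancer->tuberculos_or_cancer smoking->lung_cancer "
    "smoking->bronchitis bronchitis->dyspnoea"
)

# Arcs of the graph listed under size=500000 (copied from the output above).
learned = parse_arcs(
    "tuberculosis->visit_to_Asia tuberculosis->lung_cancer "
    "tuberculos_or_cancer->tuberculosis tuberculos_or_cancer->positive_XraY "
    "tuberculos_or_cancer->lung_cancer tuberculos_or_cancer->dyspnoea "
    "lung_cancer->smoking smoking->bronchitis bronchitis->dyspnoea"
)

result = compare_arcs(reference, learned)
for kind, arcs in result.items():
    print(kind, len(arcs), sorted(arcs))
```

For these two listings the sketch reports 4 arcs with the same orientation, 4 reversed arcs, 1 extra arc (tuberculosis->lung_cancer) and no missing arc. A reversed arc is not necessarily a learning error, since several orientations can encode the same conditional independences.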