pyAgrum 1.8.1 on Jupyter

Quasi-continuous BN¶

Creative Commons License aGrUM interactive online version
In [1]:
from pylab import *
import matplotlib.pyplot as plt

aGrUM cannot (currently) deal with continuous variables. However, a discrete variable with a large enough domain size can approximate such a variable.
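To give a feel for how good such an approximation can be, here is a minimal standard-library sketch, independent of aGrUM. The helper names (`gaussian_pdf`, `discretize`, `mean_and_variance`) are made up for this illustration. It discretizes a standard normal density on a regular grid over [-3, 3] and prints the mean and variance of the resulting discrete distribution for several grid sizes.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def discretize(pdf, vmin, vmax, size):
    """Evaluate pdf on a regular grid of `size` ticks and normalize to sum to 1."""
    step = (vmax - vmin) / (size - 1)
    ticks = [vmin + i * step for i in range(size)]
    weights = [pdf(t) for t in ticks]
    total = sum(weights)
    return ticks, [w / total for w in weights]

def mean_and_variance(ticks, probs):
    """Mean and variance of the discrete distribution (ticks, probs)."""
    mean = sum(t * p for t, p in zip(ticks, probs))
    var = sum((t - mean) ** 2 * p for t, p in zip(ticks, probs))
    return mean, var

for size in (5, 30, 300):
    ticks, probs = discretize(gaussian_pdf, -3.0, 3.0, size)
    mean, var = mean_and_variance(ticks, probs)
    print(f"{size:4d} states: mean={mean:+.4f}, variance={var:.4f}")
```

The variance stays slightly below 1 because the density is truncated to [-3, 3], not because of the discretization itself.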

In [2]:
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

# Number of states for the quasi-continuous variables. You can change this value,
# but beware of the quadratic behavior of both memory and time complexity
# in this example.
minB,maxB=-3,3
minC,maxC=4,14
NB=300 
In [3]:
bn=gum.BayesNet("Quasi-Continuous")
bn.add(gum.LabelizedVariable("A","A binary variable",2))
bn.add(gum.NumericalDiscreteVariable("B","A range variable",minB,maxB,NB))
bn.addArc("A","B")
print(bn)
gnb.showBN(bn)
BN{nodes: 2, arcs: 1, domainSize: 600, dim: 599, mem: 4Ko 720o}
[Figure: graph of the BN, A → B]
In [4]:
bn.cpt("A")[:]=[0.4, 0.6]
gnb.showProba(bn.cpt("A"))

CPT for quasi-continuous variables (with parents)¶

Using Python (and scipy), it is easy to obtain the pdf of a continuous variable.

In [5]:
# we truncate a pdf, so we need to normalize 
def normalize(rv,vmin,vmax,size):
    pdf=rv.pdf(linspace(vmin,vmax,size))
    return (pdf/sum(pdf))

from scipy.stats import norm,genhyperbolic
p, a, b = 0.5, 1.5, -0.7
bn.cpt("B")[{'A':0}]=normalize(norm(2.41),minB,maxB,NB)
bn.cpt("B")[{'A':1}]=normalize(genhyperbolic(p,a,b),minB,maxB,NB)
gnb.flow.clear()
gnb.flow.add(gnb.getProba(bn.cpt("B").extract({"A":0})),caption="P(B|A=0)")
gnb.flow.add(gnb.getProba(bn.cpt("B").extract({"A":1})),caption="P(B|A=1)")
gnb.flow.display()

P(B|A=0)

P(B|A=1)
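The normalization step matters because restricting a density to [minB, maxB] discards part of its mass. The following standard-library sketch (it does not use scipy; `normal_pdf` and `normal_cdf` are helper names introduced here) measures how much of the unit-variance normal centred at 2.41 actually falls inside [-3, 3], and compares it with the Riemann sum of the raw pdf values on the grid.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

min_b, max_b, nb = -3.0, 3.0, 300  # same grid as variable B
mu = 2.41                          # location of the normal used for P(B|A=0)

# Exact probability mass that falls inside [min_b, max_b].
mass_inside = normal_cdf(max_b, mu) - normal_cdf(min_b, mu)

# Raw pdf values on the grid and their Riemann sum (pdf value times grid step).
step = (max_b - min_b) / (nb - 1)
raw = [normal_pdf(min_b + i * step, mu) for i in range(nb)]
total = sum(raw)
riemann_sum = total * step

# Dividing by the sum turns the raw values into a proper distribution.
normalized = [v / total for v in raw]

print(f"mass inside [{min_b}, {max_b}]: {mass_inside:.4f}")
print(f"Riemann sum of raw pdf values: {riemann_sum:.4f}")
print(f"sum after normalization: {sum(normalized):.4f}")
```

Since a noticeable share of the mass lies to the right of 3, the raw values cannot be used directly as a CPT column; dividing by their sum restores a distribution that sums to 1.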

Quasi-continuous inference (with no evidence)¶

In [6]:
gnb.showPosterior(bn,target="B",evs={})
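With no evidence, the posterior of $B$ is simply its marginal, obtained by summing out $A$. With the CPT of $A$ set to $[0.4, 0.6]$ as in the cell In [4], this gives a mixture of the two columns of $P(B|A)$:

$$P(B) = \sum_{a} P(A=a)\,P(B \mid A=a) = 0.4\,P(B \mid A=0) + 0.6\,P(B \mid A=1)$$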

Quasi-continuous variable with quasi-continuous parent¶

In [7]:
bn.add(gum.NumericalDiscreteVariable("C","Another quasi continuous variable",minC,maxC,NB))
bn.addArc("B","C")
gnb.showBN(bn) # B and C are quasi-continuous
[Figure: graph of the BN, A → B → C]

Even if this BN is quite small (and linear), the domains of nodes $B$ and $C$ are rather large, which makes for a complex model (NBxNB parameters in $P(C|B)$).

In [8]:
print("number of parameters of the bn : {0}".format(bn.dim()))
print("domain size of the bn : 10^{0}".format(bn.log10DomainSize()))
number of parameters of the bn : 90299
domain size of the bn : 10^5.2552725051033065
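The two figures printed above can be reproduced by hand. A CPT for a variable with $n$ states has $n-1$ free parameters for each configuration of its parents, and the domain size is the product of the domain sizes of all the variables. The sketch below uses only the standard library; `free_parameters` is a helper introduced here for illustration, not an aGrUM function.

```python
import math

def free_parameters(domain_size, parent_sizes=()):
    """Free parameters of a CPT: (domain_size - 1) per parent configuration."""
    return (domain_size - 1) * math.prod(parent_sizes)

NB = 300  # number of states of B and of C

dim_a = free_parameters(2)         # P(A):   1
dim_b = free_parameters(NB, [2])   # P(B|A): 2 * 299
dim_c = free_parameters(NB, [NB])  # P(C|B): 300 * 299
total_dim = dim_a + dim_b + dim_c

# The joint domain has 2 * NB * NB configurations.
log10_domain_size = math.log10(2 * NB * NB)

print(f"parameters: {dim_a} + {dim_b} + {dim_c} = {total_dim}")
print(f"domain size: 10^{log10_domain_size:.4f}")
```

Almost all of the 90299 parameters come from $P(C|B)$, which is why the cost grows quadratically with NB.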
In [9]:
from scipy.stats import gamma
# cpt("C") is NB x NB matrix !
l=[]
for i in range(NB):
    k=(i*10.0)/NB
    l.append(normalize(gamma(k+1),minC,maxC,NB))

bn.cpt("C")[:]=l

def showB(n:int):
    gnb.flow.add(gnb.getProba(bn.cpt("C").extract({"B":n})),
                 caption=f"P(C|B={bn.variable('B').label(n)})")
    
gnb.flow.clear()
showB(0)
showB(NB//4)
showB(NB*2//3)
showB(NB-1)
gnb.flow.display()

P(C|B=-3)

P(C|B=-1.495)

P(C|B=1.0134)

P(C|B=3)

Inference in quasi-continuous BN¶

In [10]:
import time

ts = time.time()
ie=gum.LazyPropagation(bn)
ie.makeInference()
q=ie.posterior("C")
te=time.time()
gnb.flow.add(gnb.getPosterior(bn,target="C",evs={}),caption=f"P(C) computed in {te-ts:2.5f} sec for a model with {bn.dim()} parameters")
gnb.flow.display()

P(C) computed in 0.00128 sec for a model with 90299 parameters

Changing prior¶

In [11]:
bn.cpt("A")[:]=[0.9,0.1]
             
gnb.flow.add(gnb.getPosterior(bn,target="C",evs={}),caption="P(C) with P(A)=[0.9,0.1]")
gnb.flow.display()

P(C) with P(A)=[0.9,0.1]

Inference with evidence in quasi-continuous BN¶

We want to compute

  • $P(A | C=9)$
  • $P(B | C=9)$
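Conceptually, such a posterior follows from Bayes' rule, $P(B \mid C=c) \propto P(C=c \mid B)\,P(B)$. The sketch below illustrates the quantity being computed on a toy model with a three-state $B$ and a two-state $C$. All numbers are made up, and the function name `posterior_given_evidence` is hypothetical; this is not how an inference engine is implemented, only what it returns in this simple chain.

```python
def posterior_given_evidence(prior_b, cpt_c_given_b, c_index):
    """Return P(B | C = c_index) from P(B) and the rows P(C | B = b)."""
    # Joint P(B = b, C = c) for the observed value c.
    joint = [p_b * row[c_index] for p_b, row in zip(prior_b, cpt_c_given_b)]
    # P(C = c), the normalizing constant.
    evidence_probability = sum(joint)
    return [value / evidence_probability for value in joint]

# Toy numbers, chosen only for illustration.
prior_b = [0.2, 0.5, 0.3]
cpt_c_given_b = [
    [0.7, 0.3],  # P(C | B=0)
    [0.4, 0.6],  # P(C | B=1)
    [0.1, 0.9],  # P(C | B=2)
]

posterior = posterior_given_evidence(prior_b, cpt_c_given_b, c_index=1)
print([round(p, 4) for p in posterior])
```

Observing $C=1$ shifts the belief toward the states of $B$ under which that observation is most likely.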
In [12]:
ie=gum.LazyPropagation(bn)
ie.setEvidence({'C':bn.variable("C").closestLabel(9)})
ie.makeInference()
plot(linspace(minB,maxB,NB),ie.posterior("B")[:])
title("P( B | C={0})".format(bn.variable("C").closestLabel(9)));
In [13]:
gnb.showPosterior(bn,target="B",evs={"C":bn.variable("C").closestLabel(9)})
In [14]:
gnb.showProba(ie.posterior("A"))

Multiple inference: MAP decision between Gaussian and generalized hyperbolic distributions¶

How does $P(A | C=i)$ behave when $i$ varies? In other words, we perform a MAP decision between the two models ($A=0$ for the Gaussian distribution and $A=1$ for the generalized hyperbolic distribution).
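A MAP decision picks, for each observed value, the state of $A$ with the highest posterior probability. Here is a minimal sketch of that rule on made-up likelihoods $P(C=c \mid A)$ with the prior $[0.1, 0.9]$. The function names and the numbers are assumptions for illustration; in the BN itself these likelihoods are obtained by summing over $B$.

```python
def posterior_a(prior_a, likelihoods):
    """Return P(A | C=c) from P(A) and the likelihoods P(C=c | A=a)."""
    joint = [p * l for p, l in zip(prior_a, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def map_decision(prior_a, likelihoods):
    """Index of the state of A with the highest posterior probability."""
    post = posterior_a(prior_a, likelihoods)
    return max(range(len(post)), key=post.__getitem__)

def first_switch(decisions):
    """Index of the first evidence value where the decision changes, or None."""
    for i in range(1, len(decisions)):
        if decisions[i] != decisions[i - 1]:
            return i
    return None

prior_a = [0.1, 0.9]
# One pair (P(C=c | A=0), P(C=c | A=1)) per evidence value c; made-up numbers.
likelihood_table = [(0.9, 0.05), (0.6, 0.1), (0.3, 0.2), (0.1, 0.4)]

decisions = [map_decision(prior_a, lik) for lik in likelihood_table]
print("MAP decisions:", decisions)
print("decision switches at index:", first_switch(decisions))
```

The index where the decision switches plays the same role as the crossing point of the two posterior curves.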

In [15]:
bn.cpt("A")[:]=[0.1, 0.9]
ie=gum.LazyPropagation(bn)
p0=[]
p1=[]
for i in bn.variable("C").labels():
    ie.setEvidence({'C':i})
    ie.makeInference()    
    p0.append(ie.posterior("A")[0])    
    p1.append(ie.posterior("A")[1])

x=[float(v) for v in bn.variable("C").labels()]
plot(x,p0)
plot(x,p1)
title("P( A | C=i) with prior p(A)=[0.1,0.9]")
legend(["A=0","A=1"],loc='best')
inters=(transpose(p0)<transpose(p1)).argmin()

text(x[inters]-0.2,p0[inters],
     "{0},{1:5.4f}  ".format(x[inters],p0[inters]),
     bbox=dict(facecolor='red', alpha=0.1),ha='right');