import time
from pyAgrum.lib.bn2graph import BN2dot
import numpy as np
import pandas as pd
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.explain as expl
import matplotlib.pyplot as plt
We build a simple graph for the example.
template=gum.fastBN("X1->X2->Y;X3->Z->Y;X0->Z;X1->Z;X2->R[5];Z->R;X1->Y")
data_path = "res/shap/Data_6var_direct_indirect.csv"
#gum.generateSample(template,1000,data_path)
learner = gum.BNLearner(data_path,template)
bn = learner.learnParameters(template.dag())
bn
Given a model, it may be interesting to investigate the conditional independences of the class Y created by this very model.
# This function explores all the conditional independences (CI) between pairs of variables and computes their p-values w.r.t. a CSV file.
expl.independenceListForPairs(bn,data_path)
{('R', 'X0', ('X1', 'Z')): 0.7083382647903902, ('R', 'X1', ('X2', 'Z')): 0.46938486254099493, ('R', 'X3', ('X1', 'Z')): 0.4128522974536623, ('R', 'Y', ('X2', 'Z')): 0.8684231094674686, ('X0', 'X1', ()): 0.723302358657366, ('X0', 'X2', ()): 0.9801394906304377, ('X0', 'X3', ()): 0.7676868597218647, ('X0', 'Y', ('X1', 'Z')): 0.5816487109659612, ('X1', 'X3', ()): 0.5216508257424717, ('X2', 'X3', ()): 0.9837021981131505, ('X2', 'Z', ('X1',)): 0.6638491605436834, ('X3', 'Y', ('X1', 'Z')): 0.8774081450472304}
... with respect to a specific target.
expl.independenceListForPairs(bn,data_path,target="Y")
{('Y', 'R', ('X2', 'Z')): 0.8684231094674686, ('Y', 'X0', ('X1', 'Z')): 0.5816487109659612, ('Y', 'X3', ('X1', 'Z')): 0.8774081450472304}
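The dictionary above can be read entry by entry: each key is a triple (first variable, second variable, conditioning set) and each value is the p-value computed from the CSV file. The sketch below is only an illustration of how such a result might be summarized. It copies the three entries printed above, and it assumes that the p-values come from a test whose null hypothesis is independence, so that a large p-value means the data do not contradict the independence implied by the graph. The threshold `ALPHA` and the helper `summarize` are hypothetical names introduced here, not part of pyAgrum.

```python
# P-values copied from the output above (target "Y").
# Keys are (variable, other variable, conditioning set); values are p-values.
ci_pvalues = {
    ('Y', 'R', ('X2', 'Z')): 0.8684231094674686,
    ('Y', 'X0', ('X1', 'Z')): 0.5816487109659612,
    ('Y', 'X3', ('X1', 'Z')): 0.8774081450472304,
}

ALPHA = 0.05  # assumed significance threshold, not taken from the source


def summarize(pvalues, alpha=ALPHA):
    """Return (statement, p-value, not_rejected) tuples, lowest p-value first."""
    rows = []
    for (a, b, cond), p in sorted(pvalues.items(), key=lambda kv: kv[1]):
        given = ", ".join(cond) if cond else "nothing"
        rows.append((f"{a} indep. {b} given {given}", p, p >= alpha))
    return rows


summary = summarize(ci_pvalues)
for statement, p, not_rejected in summary:
    verdict = "not rejected" if not_rejected else "rejected"
    print(f"{statement:28s} p={p:.3f}  {verdict}")
```

Under that assumption, none of the three independences involving Y is rejected at the 5% level, which is what one would hope for when the data were generated from the same structure.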
print(expl.ShapValues.__doc__)
The ShapValue class implements the calculation of Shap values in Bayesian networks.

The main implementation is based on Conditional Shap values [3]_, but the Interventional calculation method proposed in [2]_ is also present. In addition, a new causal method, based on [1]_, is implemented which is well suited for Bayesian networks.

.. [1] Heskes, T., Sijben, E., Bucur, I., & Claassen, T. (2020). Causal Shapley Values: Exploiting Causal Knowledge. 34th Conference on Neural Information Processing Systems. Vancouver, Canada.

.. [2] Janzing, D., Minorics, L., & Blöbaum, P. (2019). Feature relevance quantification in explainable AI: A causality problem. arXiv: Machine Learning. Retrieved 6 24, 2021, from https://arxiv.org/abs/1910.13413

.. [3] Lundberg, S. M., & Su-In, L. (2017). A Unified Approach to Interpreting Model. 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA.
To create a ShapValues object, it is necessary to specify a target and to provide a Bayesian network whose parameters are known; these parameters are used later by the different calculation methods.
gumshap = expl.ShapValues(bn, 'Y')
A dataset (as a pandas.DataFrame) must be provided so that the Bayesian network can learn its parameters and then predict.
The method conditional computes the conditional Shap values using the Bayesian network. It returns two graphs and a dictionary. The first graph shows the distribution of the Shap values for each variable; the second ranks the variables by their importance.
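To make the notion of a Shap value concrete, here is a minimal sketch of the general Shapley formula, computed by brute force over all coalitions of features. It is not pyAgrum's implementation: the function `shapley_values` and the value function `toy_value` are hypothetical and chosen only so the result can be checked by hand. In the conditional setting, the value of a coalition would instead be an expectation of the model's prediction given the observed features of that coalition, which is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical value function: two additive terms plus one interaction."""
    v = 0.0
    if "X1" in coalition:
        v += 1.0
    if "X2" in coalition:
        v += 2.0
    if "X1" in coalition and "Z" in coalition:
        v += 3.0  # interaction, shared equally between X1 and Z
    return v


phi = shapley_values(["X1", "X2", "Z"], toy_value)
print(phi)
# The values sum to toy_value(all features) - toy_value(empty set).
print(sum(phi.values()))
```

In this toy case X1 receives its own contribution plus half of the interaction, Z receives the other half, and X2 receives exactly its additive term. The enumeration grows exponentially with the number of features, so it is only meant for small illustrations.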
train = pd.read_csv(data_path).sample(frac=1.)
t_start = time.time()
resultat = gumshap.conditional(train, plot=True,plot_importance=True,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
Run Time : 6.814778804779053 sec
t_start = time.time()
resultat = gumshap.conditional(train, plot=False,plot_importance=True,percentage=False)
print(f'Run Time : {time.time()-t_start} sec')
Run Time : 6.583592414855957 sec
result = gumshap.conditional(train, plot=True,plot_importance=False,percentage=False)
# result is a Dict[str,float] of the different Shapley values for all nodes.