complexity Thirty years of credal networks: Speciﬁcation, algorithms and International Journal of Approximate Reasoning

(1)

Contents lists available atScienceDirect

International Journal of Approximate Reasoning

www.elsevier.com/locate/ijar

Thirty years of credal networks: Speciﬁcation, algorithms and complexity

Denis Deratani Mauá

^a^,∗

, Fabio Gagliardi Cozman

^b

aInstituteofMathematicsandStatistics,UniversidadedeSãoPaulo,Brazil bEscolaPolitécnica,UniversidadedeSãoPaulo,Brazil

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Received20December2018

Receivedinrevisedform10August2020 Accepted12August2020

Availableonline21August2020

Keywords:

Impreciseprobabilities Probabilisticgraphicalmodels

Credal networks generalize Bayesian networks to allow for imprecision in probability values.Thispaperreviewsthemainresultsoncredalnetworksunderstrongindependence, as therehasbeensigniﬁcantprogressintheliteratureduringthelastdecadeorso.We focusoncomputationalaspects,summarizingthemainalgorithmsandcomplexityresults forinferenceanddecisionmaking.Weaddressthequestion“Whatisreallyknownabout strongextensionsofcredalnetworks?”bylookingattheoreticalresultsandbypresenting ashortsummaryofrealapplications.

©²⁰²⁰ÊlsevierÎnc.Âll^rights^reserved.

1. Introduction

The now unclassiﬁed United StatesNationalIntelligence Estimate report number 29-51concluded that “an attack on Yugoslaviain1951shouldbeconsideredaseriouspossibility”.Whenaskedtoquantifytheexpression“seriouspossibility”, theauthorsofthereportprovidedoddsthatrangedfrom20/80to80/20infavorofanattack[1].Translatingtoprobabilities, theexpertsassessedthat

0

.

2

≤ P(

attack

) ≤

0

.

8

.

This is not an isolated example: imprecision in probability values pervades practical life. Often thisimprecision appears whenmodelingscenarioswithmanyrandomvariablesthatinteractinanon-trivialway.

Accordingtothereport,thepossibilityofanattackwas largelydependentonadecisiontoinvadethecountry.Thelatter eventwasinﬂuencedby broadSovietobjectives,butalsobyapossibleregimechangeinYugoslavia,perhapsduetoTito’s assassination,orduetoa coup/revolt. Adecisionto invadeshould causeamilitarybuild-up andan increase inpropaganda against supposed Sovietenemies.This intricate setof relationsis capturedby the directed acyclicgraph(DAG) inFig. 1, whereeachnodeisa(binary)randomvariable.¹ Ifwecouldassociateasharpconditionalprobabilityvaluewitheachvalue ofa variable giveneach value of those variablesthat pointto it inthe graph,we would obtain aBayesian network [2].

Thatis, we would havea “probabilisticgraphical model” that could be used to calculatethe probability of anypossible scenarioinvolvingthevariablesinthemodel(e.g.,theprobabilityofan invasiongiventheobservationofmilitarybuild-up

*

Correspondingauthor.

E-mailaddress:denis.maua@usp.br(D.D. Mauá).

1 Notethateachoneoftheseevents(attack,propaganda,etc)canbeassociatedwitharandomvariablethatyields1 whentheeventhappensand0 otherwise;tosimplifymatters,weoftendonotdistinguishbetweenaneventandthecorrespondingrandomvariablethatindicatesitshappening.

https://doi.org/10.1016/j.ijar.2020.08.009

0888-613X/©²⁰²⁰ÊlsevierÎnc.Âll^rights^reserved.

(2)

assassination coup/revolt

regime change decision to invade

build-up propaganda

attack

invasion

Fig. 1.A simpliﬁed graphical representation for the reasoning in the 29-51 report.

and propaganda) [3,4]. Ifwe insteadcannot associatea single jointprobability distributionwith the graph,we obtain a credalnetwork.

In short,credalnetworksare directedacyclicgraphswhose nodesrepresentrandom variablesassociatedwith“impre- cise/indeterminate” probabilistic assessments, and whose arcs (or their absence) represent irrelevance or independence.

Credal networks keep the graphicalappealof Bayesian networks[2] while allowing for a moreﬂexible quantiﬁcation of values. Thisability torepresent andreasonwithimprecision inprobability values allows credalnetworksto reach more conservativeandrobustconclusionsthanBayesiannetworks.

Eventhoughitisdifficulttopinpointwhencredalnetworksandtheirstrongextensionsfirstappearedintheliterature, itseemsfairtosaythat by1990some authorshadalreadytouchedontheir basicfeatures.Theliterature evolvedsteadily until 2005, whenthe second authorpublished areview of graph-basedtechniquesthat dealwithimprecise probabilities [5].Sincethen,researchinthefieldhasledtoaconsiderablechangeinourunderstandingofcredalnetworks.First,arich toolboxofalgorithmshasbeenputtogether,containing(fast)approximatetechniquesand(slow)exactmethods.Second,the complexity ofvariousinferenceshasbeenstudiedindepth,sowecanidentifyfeaturesthatguaranteetractableinference.

Third,thereisnowasigniﬁcantnumberofrealapplicationsofcredalnetworksinvariousdomainswhereitisadvantageous toencode imprecisioninprobability values.Finally,thelast tenyearshavewitnessedprogressalsoonelicitationofcredal networksandondecisionmakingwithcredalnetworks.

Severalofthosemorerecentresultshavebeendiscussedinanexcellenttutorialoncredalnetworkspublishedin2014 [6].Alsoofnoteisapreviousrelevanttutorialfocusedontheelicitation ofcredalnetworks[7].Thepresentreviewshould be valuable tothenovice whohasread some ofthatintroductory material, andwants morescholarly detail;thisreview should alsobe usefulto theresearcherwho is interestedinan up-to-datetechnical introductiontocredal networks. We willexaminethemainresultsonthetheoryofcredalnetworks,includingbasic(butnon-trivial)resultsabouttheircharac- terization,algorithmicstrategiesforproducinginferencesandcomputationalcomplexityresults.Inshort,wetrytoprovide some guidanceregardingthefairquestion:“Aftersomethirtyyearsofresearchoncredalnetworks,whatexactlyisknown abouttheirspeciﬁcation,algorithmsandinferentialcomplexity?”.

Therearemanywaystointerpretindependencerelationsinacredalnetwork;inthispaperwefocusonwhatisknown asstrongindependenceanditsrelatedconceptofstrongextension.Aswediscusslater, thereareotherwell-knownexten- sionsforcredalnetworksintheliterature[8],butstrongextensionseemstobethemostpopularone.

The paper isorganized as follows.Section 2 summarizesthe literature up until2005, presentingan abridgedversion ofthesurveywe havementioned [5].Section3introducesbasicterminologyandnotation, whileSection 4presentssome basicresultsoncredalnetworks.InSection5,weprovideafairsampleofthealgorithmsformarginalinferenceanddecision making,whileinSection6wepresentresultsoncomputationalcomplexity ofthosetasks.Section7containsanoverview of some recentapplications of credal networks. We ﬁnish in Section 8 with commentson recent developments andan optimisticviewaboutthefutureofcredalnetworks.

2. Credalnetworksupto2004:asummary

Severalearlyexpertsystemsandknowledgerepresentationformalismsweredevelopedamidstintensedebateconcerning the bestwaytohandleuncertainty[9–11].Bayesiannetworksappearedby themiddleeighties[2] andhavesinceproved thatprobabilisticmodelingcanbeusedsuccessfullyinmanycircumstances.AssoonasBayesiannetworksstartedtoattract interest,researchersfelttheneedtogeneralizethemsoastoallowtherepresentationofimprecisioninprobabilityvalues.

Onemotivationforthiswas thediﬃculty inelicitingthemanyprobabilitiesrequiredbyaBayesiannetwork.Anothermo- tivation was theanalysisofrobustness to perturbationsin Bayesiannetworks(aparallel lineofresearch hasinvestigated sensitivitytochangesinparameters[12,13]).Probablyanevenstrongermotivation,amongstresearchersinterestedinarti- ﬁcialintelligence,wasthedesiretostudymanylanguagesinwhichtorepresentuncertainty,fromDempsterShafertheory topossibilitytheory,manyofwhichcouldbereframedusingintervalprobabilities,intervalexpectations,orcredalsets.

(3)

TheearliestappearanceofBayesiannetworkswithset-valuedparametersseemstobebyLamataandMoral[14].There was then considerabledebate on the meaningof independenceforspecial kindsof assessments,such asinterval probabilities,Choquetcapacities andqualitative probabilities.Theearly effortsthatadoptedset-valuedassessmentstook strong extensionsasthesolesemanticsofanygivencredalnetwork.

Bymidnineties,manyconsequencesoftheMarkovcondition andinference algorithms hadbeendeveloped forstrong extensions.Hereinferencereferstoa computationoftightprobability boundsforavalue ofsome variablegivenvaluesof someothervariables.PerhapsthemostrepresentativereferenceofthatperiodistheworkofCanoetal.[15].Atthatpoint there was great emphasison inference algorithms that operatedby passing messages around nodes,thus mimickingthe behaviorofinferencealgorithmsforBayesiannetworks.Algorithmssuchasthe2-Updating (2U),forpolytreeswithbinary variables[16],orTessem’spropagation,whereintervalarithmeticsisexploited [17],representedthestate-of-art.References thatreﬂectthosedevelopments,andthatincludeothergraph-theoreticalstructures,canbefoundinprevioussurveypapers [5,18].

It became clear by mid nineties that, instead of focusing on special cases such as interval probabilities, one should adoptgeneralcredal setsfrom theoutset. Giventhat the classof closedconvexcredal setsis closedunder elementwise marginalization andconditioning undermild conditions [19], andthat mostaxiomatizations dealing with imprecisionin probability valuesgenerateconvexcredalsets[20], mostoftheliterature just assumedclosureandconvexitythroughout.

Bymidninetiesit alsobecameclearthat, insteadoffocusing onasingle concept ofindependence,one mustaccept that severalconcepts ofindependence makesense when credalsets are present; this understanding was championed by the inﬂuentialworkofWalley[21].Theideathatcredalnetworksshouldbebroadenoughtoencompassseveraldeﬁnitionsof independencewasthenproposedbytheendofthenineties[22].

Computationalexperienceduringtheninetiesledtotheunderstandingthat,insteadoftryingtoadaptBayesiannetwork messagepassing algorithms to inference withstrongextensions,the mostproﬁtablepath was tolook atinference asan optimizationproblem.Eventhoughanumberofsuccessfulmessagepassingtechniqueshaveemerged,themainalgorithms developedfromthelateninetiesonhavebeeninspiredbyoptimizationtechniques,asdescribedinSection5.

3. Background:Bayesiannetworksandcredalsets

Giventhat credal networksare extensions ofBayesian network that replaceconditional probability distributions with credalsets,inthissectionwereviewbasicconceptsforBayesiannetworksandcredalsets;thissectionalsohelpsinﬁxing notationandterminologyusedlater.

3.1. Bayesiannetworks

InaDAG,anode X isaparentofanode Y (conversely,Y isachildof X)ifthereisanarcfrom X toY.Wedenotethe parentsofanode X byPa(X)anditschildrenbyCh(X).Thein-degreeofanodeisthecardinalityofPa(X).Theancestorsof anodeareitsparents,theparentsofitsparents,andsoon,recursively.Similarly,thedescendantsofanodeareitschildren, the childrenof itschildren, andso on,recursively; thenondescendantsare all nodesthat are notdescendants exceptthe nodeitself.

Formally,aBayesiannetworkisapaircontainingaDAGwhosenodesarerandomvariablesinasetX= {^X1,. . . ,Xn}^,^and aprobability distributionP(X)overX.TheDAGandthedistributionsatisfy theMarkovcondition: forevery node/variable X,the nondescendantnonparents of X areindependent of X giventheparents of X. Toillustratethe Markovcondition, consideragainFig.1.Thestructureofthegraphimpliesanumberofindependencerelations.Forexample,thegraphimplies thatTito’sassassinationandcoup/revoltareindependent.Ifsuchindependencerelationisdeemedunreasonable,thenedges should be added accordingly.Another independencerelation extractedfrom thegraph is: given regimechange,then the decisiontoinvadeisindependentofTito’s assassination.Thiscapturestheessence oftheMarkovcondition:onceweknow theparentsofavariable,anythingaboutotherancestorsisirrelevanttothevariable.

Inthispaper,weconsideronlyrandomvariableswithﬁnitesupport.Whenthisisthecase,theMarkovconditionimplies aparticularfactorizationofthejointprobabilitydistribution:

P(

X₁

=

^x1

, . . . ,

X_n

=

^xn

) =

n i=¹

P(

X_i

=

^xi

|

^Pa

(

X_i

) = π

i

),

(1) wheneverallconditionalprobabilitiesontherightarewell-deﬁned.Intheequationabove

π

i istheprojectionof{^x1,. . . ,xn} overPa(X_i),andifPa(X)isempty,thenweuseP(X_i=^xi)inlieuofP(X_i=^xi|^Pa(X_i)=

π

i).

Because“local” conditionalprobabilities P(X=^x|^Pa(X)=

π

)suﬃceto specifyanyBayesiannetwork, oftenaBayesian networkisspeciﬁedthroughaDAGandasetofprobabilityassessmentsP(X=^x|^Pa(X)=

π

)foreach X,xand

π

.Thisleads toadrasticreductioninthenumberofparametersonemustspecifyandrepresentwithrespecttothejointdistribution.

The moralgraph of a DAG is obtained by connecting nodeswith a common child and then dropping arc directions.

One celebratedfeatureofBayesian networksis thatone can detectmany independencerelationsimplied by theMarkov condition usingd-separation [2]. The latter isa graph-theoretical criterion that we apply asfollows to a given Bayesian network. Suppose we havethree sets ofvariables X, Y, Z. Delete all nodesthat are not ancestors to atleast one ofthe

(4)

1

assassination coup/revolt regimechange

2 regimechange

decisiontoinvade 4

attack decisiontoinvade

build-up 3 decisiontoinvade

propaganda

5

attack build-up invasion 6 attack

Fig. 2.Rooted tree decomposition for the DAG in Fig.1.

variablesinthosesets,andobtainthemoralgraphoftheresultingDAG.SetsXandYared-separatedbyZifandonlyifin theresultingundirectedgraphanypathbetweenXandYisblockedbyZ(i.e.,itcontainsanodeinZ) [23].Theimportant consequenceofthisd-separationisthatXandYarestochasticallyindependentgivenZ.Becaused-separationcanbequickly computed [24],itisoftenemployedinpracticetodiscardunnecessaryvariablesbeforeanyinferenceiscarriedout.

The complexity ofdrawing inferenceswithBayesiannetworks istightlycoupled withthetopology ofits DAG,andto graph-theoreticalconceptssuchastreewidth.

A polytreeis aDAG that contains no undirectedcycles,that is,the subjacentundirected graphwe obtainby ignoring thearcdirectionshasnocycles.Adirectedtreeisapolytreewitheachnodehavingatmostoneparent.Polytreesarealso calledsinglyconnecteddirectedgraphs.ADAGisloopyormultiplyconnectedifitisnotapolytree.TheDAGinFig.1isloopy;

thesubDAGoverthenodesassassination,coup/revoltandregimechangeisapolytreethatisnotatree.

Acycleinanundirectedgraphcontainsachordifthereisanedgewhichconnectstwonodesofthecycleandisnotin thatcycle.Anundirectedgraphischordalifandonlyifeverycycleoflengthfourormorehasachord.Everygraphcanbe turnedintoachordalgraphbyinsertingedges,aprocesscalledchordalization.

The treewidthof achordal graphis thesize ofthelargest cliqueminus one, thatis, themaximumsize ofa complete subgraph minus one (a complete graphis a graph such that each pairof nodes isconnected). The treewidth ofa non- chordal graphis theminimumtreewidth over allchordalizations ofit. Asits namesuggests, thetreewidth measures the resemblanceofanundirectedgraphtoatree.Thetreewidthofadirectedgraphisthetreewidthofitsmoralgraph.Inthe caseofpolytrees,thetreewidthisgivenbythemaximumin-degreeofanode(sothetreewidthofdirectedtreesisone).All knownalgorithms fordrawing inferenceinBayesiannetworkshavea worst-caserunningtime thatisatleastexponential inthenetworktreewidth.

A tree-decompositionof an undirectedgraph G=(V,E) isa pair (C,T),where C is a collectionof subsetsCi⊆^V ôf nodes ofG, andT is an undirectedtreewhose nodesare identifiedwith elements ofC, andthat satisfiesthe following properties: (1) Forevery edge (u,v) in E there isa set Ci∈^C ^with û,v∈^Ci; (2)For every node v in V the subgraph obtainedbydeletingnodes Ci not containing v isa(connected)tree.The widthofatreedecompositionisthemaximum cardinalityofasetC_i minusone.Theminimumwidthoveralltreedecompositionsofagraphisthegraph’streewidth.A tree-decompositioncanberootedatacertainnodeR∈^C ^by^directingêdgesâway^from^R;^thisâllowsûs^to^refer^to^parents andchildren of thenodesin T (notethat a rooted T isa directedtree,thus every node hasatmostone parent).Fig. 2 showsatree-decompositionfortheDAGinFig.1rootedatR= {âttack}^.

3.2. Credalsets

It is often diﬃcult to specify sharp probability values for events dueto insuﬃciency of resources (time, cost, data) or lack of consensus. An alternative is to specify a set ofprobability distributions that encodes the incompleteness and indeterminacyinthebeliefsofthemodelbuilder,andarguablyleadstomoreconservativeandhencerobustconclusions.

We start with some needed concepts of imprecisely specified probability models and their usual notation. A set of probabilitydistributionsforavariableX iscalledacredalsetandisdenotedbyK(X)[19].Similarly,K(X1,. . . ,Xn)denotes a set of jointdistributions forvariables X₁,. . . ,X_n.The convexhull ofa set Kis denoted by coK. Afinitely generated setistheconvexhullofafinitesetofpoints (henceitisclosedandconvex).Asetofconditionalprobabilitydistributions foravariable X givenevent A iscalledaconditionalcredalsetanddenotedby K(X|Â).Thelower probabilityP_K(_X₎(A)is equaltoinfP(A),wheretheinfimumistakenoverallpossibleprobabilityvaluesforevent A inducedbythedistributions inK(X);similarlywe havethelowerconditionalprobabilityP_K(_X_|_B₎(A|^B).Wealsohavetheupperprobability PK(X)(A) that issupP(A)andsimilarlyupperconditional probabilities.Intheoperators above,weusuallyomitthedependenceon the credal set when its choice is clearfrom context and write, for example,P(A) andP(A|^B). Bymarginalizing every distribution in the credal set K(X₁,. . . ,X_n), we obtain a marginal credal set over a subset of the variables. Whenever P(Y=^y)>0,weobtainaconditionalcredalsetK(X|^Y=^y)byapplyingBayesruletoeachdistributioninK(X,Y)such that P(Y=^y)>0.Thisprescriptionforconditioningisoftencalledregularconditioning[25].Onecanfindprescriptionsin the literature that handle conditioningevents differentlywhen they mayhave zeroprobability [26–28]. One particularly

(5)

popularalternativestrategy istotakeK(X|^Y=^y)to bevacuous² wheneverP(Y=^y)=^0,ândôtherwise^to âpply^Bayes ruleelementwiseasbefore;becausethisstrategyisobtainedwithWalley’snaturalextension [21],werefertoitasnatural conditioning.³ Inour definitions andresults we adopt regular conditioning unless explicitlyindicated, andwe implicitly assumethattheupperprobabilityoftheconditioningeventispositive.

Onebasicresultisthat,ifacredalsetisconvex,bothitselementwisemarginalizationanditselementwiseconditioning leadtoconvexsets[19].

One can specifya credal set by listing constraintson probability values.Ifa credal set is closedand convex, then it canbe describedbylisting itsextremepoints(andtakingtheconvexhull).Constraintsandextremepointsareconnected by duality:one can transformone representation intothe other usingwell known algorithms [29]. However, the size of therepresentationmayexplodethroughthistransformation:asetof(evenlinear)constraintsmaygenerateanexponential numberofextremepointsandvice-versa.Somealgorithms discussedlater assumethat theinputcredalsetsare givenas setsofconstraints,whileotheralgorithmsassumetheinputisalistofextremepoints;thesearemathematicallyequivalent, butnotcomputationallyinterchangeable.Mattersaredramaticallysimplerifwedealwithasinglebinaryvariable X:inthis case,aclosedconvexcredalsetK(X)isfullycapturedbyasingleclosedinterval[P(X=1),P(X=1)].

Thereareseveralconceptsofindependencethatapply tosetsofprobabilitydistributions[28,30].Aconceptuallysimple conceptiscompleteindependence:variablesX andY arecompletelyindependentwithrespecttoacredalsetK(X,Y)ifthey are stochastically independent withrespect to every jointdistribution in K(X,Y).This deﬁnitionextends to conditional independenceinthetrivialway:inessence,itrequireselementwisestochasticindependence.

If X andY arecompletelyindependentwithrespecttosomeK(X,Y),theningeneralK(X,Y)cannotbeaconvexset [28]. Complete independenceseems to requireone toaccommodate non-convexcredalsets. As manyprincipled waysto justifycredalsetspostulatethattheyoughttobeconvex [19,31],therehasbeensteadyinterestin“convexifying”complete independence,amovethatyieldsstrongindependence.

Variables X and Y are stronglyindependent withrespect to a credal set K(X,Y) if the latter can be written asthe convexhullof acredal setforwhich X and Y are completely independent.When K(X,Y)is closedand convex,strong independenceisequivalenttoassumingthat X andY arestochasticallyindependentwithrespecttoeachextremepointof K(X,Y).

Anotherpopularconceptisepistemicirrelevance[21]:Y isepistemicallyirrelevantto X givenZ whentheconvexhullof K(X|^Y=^y,Z=^z)isequaltotheconvexhullofK(X|^Z=^z)foreverypossible(y,z).Epistemicirrelevanceisasymmetric:

X may be epistemicallyirrelevant to Y while theopposite fails[21].Epistemicindependence isa symmetrized version of epistemicirrelevance: XandY areepistemicallyindependentwhenX isepistemicallyirrelevanttoYandY isepistemically irrelevantto X.

YetanotherconceptofindependenceisduetoKuznetsov,ageneralizationoftheusualfactorizationofexpectationsthat isimpliedby stochasticindependence[32].LittleisknownaboutKuznetsovindependence,andthefew existinginference algorithmsthatcanhandleitrelyonoptimizationtolookforboundsonprobabilities[33].

4. Credalnetworksandtheirstrongextensions

AcredalnetworkconsistsofapaircontainingaDAGwhereeachnodeisavariable,andacredalsetforthosevariables, such that,foreverynode/variable X,thenondescendant nonparentsof X are “independent”of X giventheparents of X. ObviouslytheMarkovconditionhasadifferenteffectdependingontheconceptofindependencethatisadopted.

Suppose for instancethat “independence”in the Markov condition means “epistemic irrelevance”. As epistemic irrelevance isnot symmetric,the directionof edges in the credalnetwork then gets denser content. During thelast decade arichtheory andapowerfulsetofalgorithms havebeendevelopedaroundextensionsofcredalnetworksthat satisfythe Markovconditionwithepistemicirrelevance—wedonotdwellonthattopicasarecentin-depthreviewpaperhascovered allnecessaryterritory[8].AsforepistemicindependenceandKuznetsovindependence,littleisknowninconnectionwith credalnetworks(eventhoughcredalnetworksunderepistemicindependencehavereceivedsomeattention[34]).

IntheremainderofthispaperweadopttheMarkovconditionwithstrongindependence;thatis,eachvariableisstrongly independentofitsnondescendantsgivenitsparents.

4.1. Specifyingcredalnetworks

Themostcommonwaytospecifysetsofprobabilitiesassociatedwithcredalnetworksistomimicthestrategyusedfor Bayesiannetworks.Thatis,tospecifyaclosedconvexcredalsetK(X|^Pa(X)=

π

)foreach variable X,foreachvalue

π

of parentsPa(X).Asaverysimpleexample,wemighthavethreebinaryvariables X,Y andZ,thesimpleDAG

X Y Z

2 Acredalsetisvacuousifitcontainsalldistributionsofthatvariable.

3 Yetanotherstrategyistoreplaceusualprobabilitymeasuresbyfullconditionalprobabilitiesandtheirvariants[27],whereconditionalprobabilityis deﬁnedevenwhentheconditioningeventhasprobabilityzero—thisisamovewedonotconsiderinthispaper.

(6)

andprobabilisticconstraints

P (

X

=

¹

) ∈ [

¹

/

10

,

1

/

3

] , P (

Y

=

¹

) =

⁴

/

5

,

(2)

P (

Z

=

¹

|

^Y

=

⁰

) ∈ [

²

/

5

,

3

/

5

] , P(

Z

=

¹

|

^Y

=

¹

) ∈ [

⁷

/

10

,

9

/

10

] .

(3) Recall thatan intervalsuﬃcesto specifythecredalsetforabinary variable.Ingeneralamorecomplexlanguage maybe neededtospecifycredalsetsforvariableswithmorethantwovalues(Section4.4).

WhenacredalnetworkisspeciﬁedthroughaDAGandasetofcredalsetsK(X|^Pa(X)=

π

),oneforeachvariable Xand eachvalue

π

ofparentsof X,wesaythenetworkisseparatelyspeciﬁed.

Adifferentspeciﬁcationstrategyistoexplicitlydescribea setoffunctionsP(X|^Pa(X))foreachvariable X (note:here P(X|^Pa(X)) is a functionboth of X andofPa(X)). We then saythat the credal setfor X is extensivelyspeciﬁed, orjust extensive forshort.Forinstance,onemightreplaceassessmentsinExpression(3) bytwofunctionsP₁(Z|^Y)andP₂(Z|^Y), takingtheirconvexhullasthesetofpossiblefunctionsP(Z|^Y).

Afewkindsofnon-separatelyspeciﬁedcredalnetworkshavebeenexploredintheliterature.Oneexampleisthe(large) familyofqualitativeprobabilisticnetworks(QPNs)[35].AQPNisusuallyinterpretedasapartiallyspeciﬁedBayesiannetwork, where in lieu of the conditional probability values one writesdown qualitative constraints amongst these probabilities.

Thegoalistosimplifyelicitationefforts.Forinstance,thequalitative constraintreferredtoaspositiveadditivesynergyona variable X withparents Y andZ imposes

P(

X

=

1

|

Y

=

1

,

Z

=

1

) + P (

X

=

1

|

Y

=

0

,

Z

=

0

) ≥

P(

X

=

¹

|

^Y

=

⁰

,

Z

=

¹

) + P(

X

=

¹

|

^Y

=

¹

,

Z

=

⁰

),

meaning,roughly,thatthe“concerted”inﬂuenceof X and Z on X isgreaterthanthesumoftheir “discordant”inﬂuence.

Notethatconstraintsareimposedacrossvaluesofconditioningvariablesinanon-separatelyspeciﬁedscheme.Manyother constraints havebeenproposed toenlargetheexpressivity ofQPNs [36,37] andtoapply themtopractical scenarios [38].

Another exampleofnon-separatelyspecified credalnetworkscan befound inthe Bayesianlogic ofAndersen andHooker [39],whereadirectedgraphcanbeassociatedwithprobabilitiesovereventsdefinedwithaflexiblelogicallanguage.⁴

Whatever speciﬁcation strategy one adopts, one ends up witha set of constraints over probabilities. The largest set of jointprobability distributions that satisﬁes thoseconstraints and the Markovcondition, where “independence” means

“strong independence”, isthe strongextension ofthecredal network.Ofcourse one mightconsidersets ofjoint distributions that aresmaller thanthestrong extension;thestrongextension corresponds toaminimum commitmentgiventhe assessments.

4.2. Thestrongextensionforclosedconvexlocalcredalsets

Suppose that one has a separately speciﬁed credal network where each variable X is associated with a closed and convexconditionalcredalsetK(X|^Pa(X)=

π

),foreachconﬁguration

π

oftheparents.Thisisthemostcommonsituation discussedintheliterature.

Deﬁnethecompleteextensionofthenetwork,denotedbyK_C(X₁,. . . ,X_n),tobethefollowing(possiblynon-convex)set:

P : P(

X1

=

^x1

, . . . ,

Xn

=

^xn

) =

n i=¹

P(

X_i

=

^xi

|

^Pa

(

X_i

) = π

_i

) ,

with

P(

Xi

|

Pa

(

Xi

) = π

i

) ∈ K(

Xi

|

Pa

(

Xi

) = π

i

)

,

(4)

whereeach

π

iistheprojectionof{^x1,. . . ,xn}^over^Pa(Xi).

The set K_C(X₁,. . . ,X_n) andthe input DAG satisfy the Markov condition withrespect to complete independence:for every X,thenondescendantsof X arecompletelyindependentof X givenPa(X).

Wehavethat:

Theorem1.Theconvexhullofthecompleteextensionofacredalnetworkisthestrongextensionofthecredalnetwork:itisthelargest credalsetthatsatisﬁestheMarkovconditionwithrespecttostrongindependence.Thisisalsotruewhenwereplaceeachcredalset K(X_i|^Pa(X_i)=

π

i)inExpression(4)bythesetofitsextremepoints.

Theproofsofalltheoremsareintheappendix.

4 Non-separatelyspeciﬁedconstraintsalsosurfaceinconnectionwithconservativeupdatingonprobabilisticgraphicalmodels,wheremissingdatamay leadtocredalsets[40–42];Antonuccietal.[6] presentaninterestingdiscussionofthisphenomenon.

(7)

Hence every extreme point ofthe strongextension ofa separately speciﬁed credal networkwith closedconvex local credalsets isbuiltby acombinationof “local”extremepoints, andthusit behaveslike aBayesian network.Importantly, d-separationintheDAGimpliesstrongindependence;inthissensestrongextensionsmimicthesemanticsofindependence commonlyassociatedwithBayesiannetworks.⁵Interestingly:

Theorem2.Anycombinationofextremepointsfromthelocalcredalsetsisanextremepointofthestrongextension.

Thus the strong extension can actually be built directly from the extreme points of local credal sets, that is, coK_C(X1,. . . ,Xn)isequivalentlyproducedifwereplaceeachlocalcredalsetK(Xi|^Pa(Xi)=

π

i)inExpression (4) bytheset ofitsextremepoints,denotedby extK(X_i|^Pa(X_i)=

π

i).Forinstance,take thethree-variablenetworkwithassessmentsin Expressions(2) and(3).Thestrongextensionistheconvexhullofeightdistributions,generatedbymultiplyingtheextreme pointsofK(X),K(Z|^Y=⁰)andK(Z|^Y=¹),andthedistributionP(Y=¹).

4.3. Awordonextensivelocalcredalsets

Expression(4) focusesonseparatelyspeciﬁedcredalnetworks;ifinsteadwehaveanextensivespeciﬁcation,thestrong extension istheconvexhullofalldistributions P(X)=

Xi∈^XP(Xi|^Pa(Xi))builtsothat eachfunctionP(Xi|^Pa(Xi))isan extremepointoftheextensivespeciﬁcationrelatedto Xi.

Notethatifwehaveaseparatelyspeciﬁedcredalnetworkwecanalwaysbuildasetofconditionalprobabilityfunctions P(Xi|^Pa(Xi))bycombiningallextremepointsofeachK(Xi|^Pa(Xi)=

π

)acrossall

π

.Oncewedothat,thestrongextension oftheselatterextensivelyspecifiedcredalsetswillbeexactlythestrongextensionoftheoriginalseparatelyspecifiedcredal network. Sucha transformationis howevernotharmless fromacomputational point ofview: theextensive specification relatedtoavariable X_imaybeexponentiallylargerthantheseparatelyspecifiedcredalsetsforX_i.

4.4. Speciﬁcationlanguagesforcredalnetworks

There has been relatively little work on speciﬁcation languages for credal networks. The contrast with research in Bayesian networksis striking. Since theappearance of theﬁrst inference engines forBayesian networks, there hasbeen active work inlanguages that encode probability distributions. Oneexample isthe BUGS package [47],where a declara- tive languageisusedto describethestructureandtheprobability valuesofanyBayesiannetwork.There areother visual languagessuchasDAPER[48],textual languagessuchasBLOG[49],andlanguagesthatexplorelogicprogramming[50,51]

amongotherproposals.Almostnoneofthisactivitycanbefoundinconnectionwithcredalnetworks,despitethefactthat thereisarguablymorestructuretobeexplored:possiblewaystolistextremepointsandexploitregularitiesamongstthem;

possibleways toencode qualitative andquantitative constraints,conceivably by recyclingconstructsfromconstraintpro- gramming.Existingpackagesforcredalnetworksusually adoptsimple strategiesforspeciﬁcationwhere eachlocalcredal set isrepresentedas aset oflinearinequalities, orwhereeach extremepoint is adistribution represented asatable of probabilityvalues.

Ithasrecentlybeennotedthatcredalsetscanbe speciﬁedusingprobabilisticlogicprogramming[52],butthisinsight hasnotbeenusedtospecifycredalnetworks.AnotherrelativelyrecenteffortmixesBooleanoperatorsandinterval-valued probabilitiesinspeciﬁcationssuchas [53]:

regime change

=

assassination

∨ (

coup/revolt

∧ ¬

^inhibitor

), P(

inhibitor

) ∈ [

⁰

.

9

,

1

] ,

wheretheﬁrstsentencereads“regimechangeobtainsifandonlyifthereisanassassinationoracoup/revolt(providedthe latteris notinhibited)” andthe second sentencereads“probabilityofinhibitor isno smallerthan 0.9”.Inthisparticular exampleoneobtainsaNoisy-ORgatewithinterval-valuedinhibitoryprobabilities.

Thereseemstobesignificantroomforrefinedandextendedspecificationstrategiesinthefuture.

5. Algorithmsformarginalinferenceanddecisionmaking

The mostcommoncomputation appliedto credalnetworksis marginalinference,wherewe must compute boundsfor themarginalprobabilityofavalueofaqueryvariableconditionalonsomeevidenceonsomeothervariables.Forinstance, inthecredalnetworkrepresentingapossibleattackonYugoslaviain1951,onemightbeinterestedintheprobabilityofan attackafterobservinganincreaseinpropaganda.

Credalnetworksarealsousedtosupportdecisionmaking;theretheproblemistoselectactionsthatare endorsedby someappropriatecriteria—maximality,E-admissibility,minimality.

Inthissection wereviewthemosteffectiveknownalgorithms formarginal inferenceanddecisionmakingwithcredal networks.

5 Otherconceptsofindependence,such asepistemic independence,do notnecessarilyinteractwellwith d-separation [43],and alternatives tod- separationhavebeendesignedforthem [44–46].

(8)

5.1. Marginalinference

Given a credalnetwork over variables X= {^X1,. . . ,X_n}^, ân êventôfînterest ^Z=^z ând ^some êvidence ^Y=^y, ^we âre interestedinobtaining:

P (

Z

=

z

|

Y

=

y

)

and

P(

Z

=

z

|

Y

=

y

) ,

(5) where {^Z} ând ^Y âre âssumed ^to ^be ^disjoint ^subsets ôf ^X ^such ^that P(Y=^y)>0, and we take the lower and upper probabilitieswithrespecttotheregularconditioningofthestrongextensionofthenetwork.

The following result is fundamental as it shows that marginal inference can be accomplished by operating only on distributionsthatsatisfytheMarkovproperty:

Theorem3.Thelower/upperprobabilityinExpression(5)areobtained,respectively,by inf

/

sup

x

_n

i=¹

P (

Xi

=

^xi

|

^Pa

(

Xi

) = π )

z

x

_n

i=1

P (

X_i

=

^xi

|

^Pa

(

X_i

) = π ) ,

(6)

where thesuminthenumeratorandtheinnersuminthedenominatorareoverthevaluesofthesetof marginalizedvariables X =^X\({^Z}∪^Y),theoutersuminthedenominatorisoverthevaluesofthequeryvariable Z ,andtheoptimizationisoverthe distributionsfromthecompleteextensionsuchthatP(Y=^y)>0.

The mostpopulartype ofspeciﬁcationforcredalnetworksemploysﬁnitelygeneratedlocalcredalsets;forthesecases wecanfurtherreducethesearchspaceinExpression(6)⁶:

Theorem4.Supposeacredalnetworkisseparatelyspeciﬁedwithclosedconvexlocalcredalsets.ThenthesolutiontoEquation(6)is attainedatsomeextremepointPofthestrongextensionsuchthatP(Y=^y)>0;henceitisattainedatsomecombinationofextreme pointsofthelocalcredalsetsK(X|^Pa(X)=

π

).

Solving theoptimizationinEquation(6) presentstwochallenges.First,thelarge(exponential)numberofcombinations oflocalextremepoints,assumingonehasalreadyobtainedthese.Second,thelarge(exponential)numbersoftermsinthe suminthedenominationandnumerators.Thereareessentiallytwoapproachestotheproblem.

One approachis tocasttheproblemasa combinatorialsearch overtheextremedistributions ofthe localcredalsets.

Thisapproachassumesavertex-basedrepresentationofcredalsets.Typicallythesesearchschemesreproduce,atsomelevel, themessage-passingschemesthatsucceedwithBayesiannetworks,withtheconditionthatnowmessageshavetobe set- valued(perhapssetsofrealnumbers,perhapssetsoffunctions).Messagepassing schemesreceivedattentioninparticular during thenineties.Amongtheinsightsdevelopedinthatinvestigation,wemustmentionthe2Ualgorithm,acleverexact algorithmthatsolvesaparticularclassofinferenceproblemsinpolynomialtime.Manyotherideasincombinatorialsearch havebeenexploredsincethen.Notableexamplesofthisapproacharetheinteger-programmingformulationofdeCampos andCozman[54],thealgebraicsystemofMauáetal.[55],thelocalsearchmethodsofCanoetal.[56].

The other approach is to cast the problem as a continuous optimization, and to adapt known techniques from the numerical optimizationliterature forthe resulting(diﬃcult) optimizationproblem. Notable methodsin thiscategory are themultilinearprogrammingreformulationofdeCamposandCozman[57],andthelinearprogrammingapproximationof Antonuccietal.[58].Animportantfeatureofthisapproachistheabilitytonaturallyhandleboththeconstraint-basedand vertex-basedrepresentationsofcredalsets.

We next presentsomeofthe mostgeneralandeffectiveapproachesto marginalinference, emphasizingtheonesthat havedemonstratedleading performanceintheliterature.Notethatwe assumeseparately specifiednetworkswithfinitely generatedlocalcredalsets(thatis,eachsetistheconvexhulloffinitelymanydistributions),unlessexplicitlyindicated.

5.1.1. Message-passing:basicideasandthe2Ualgorithm

One (intractable)simpleideaistogenerateall possibleextremepoints ofthestrongextension bycombiningthelocal extremepoints, andforeachone runstandard messagepassingalgorithms ofBayesiannetwork. Aslightlymoreeﬃcient approachistoexploittheredundanciesamongthesesharedBayesiannetworkinferences, andtopropagatemessagesthat insteadofcontainingasinglefunction,containsetsoffunctionsthatcorrespondto(someof)theextremepoints.

We havealreadymentionedsomeimportantlandmarks withinthisapproachinSection 2.Forinstance,thewholeex- haustivemessagepassingschemeiscapturedbythealgorithmofCanoetal.[15],whileapproximateinferenceisobtained byTessem’salgorithm,whichpropagatesonlyboundsforeachset[17].

There is a very distinguished casewhere message propagationcan be performedboth eﬃciently andaccurately: this happens when the DAG is a polytreeand all variables are binary. The well known 2U algorithmexploits thispocket of

6 TheproofofTheorem4,presentedintheappendix,alsoshowsthatregularconditioningonaﬁnitelygeneratedcredalsetproducesaﬁnitelygenerated conditionalcredalset;ifonestartswiththelatterfact,thenTheorem4isadirectcorollary.

(9)

tractability [16];its main ideas canbe explainedwithrelative ease.⁷ Abriefdescription ofPearl’spropagationalgorithm forinference withpolytreesisneededfirst[2].SaywehaveaBayesiannetworkwhoseunderlyinggraphisapolytreeand someevidenceY=^y.^To^facilitate^the^notation,âssume^that^theêvidence^setsônly^the^valuesôf^leavesôf^the^graph^withâ singleparent.Thiscanbe alwaysmadetrue:if X isavariablesetbyevidencethatdoesnotsatisfy thoseconditions,then insertafresh“dummy”node X withonly X asparentandsuchthatP(X =¹|^X=^x)=^{1 and}P(X =¹|^X=^x)=^0,^where xistheoriginalvaluesetforX bytheevidence.Thensetevidence X =^{1 (and}^remove ^X ^from^theêvidence^set).Ône^can checkthatanymarginalinferencehasitsvalueunchangedbythistransformation.

InPearl’salgorithmonefocusesononevariable X atatime, andpassesmessagesto itsneighborsasfollows.Assume that X hasparents U1,. . . ,Um andchildrenY1,. . . ,Yn,andnoevidenceissetforX.

• ^From ^X ^to^each^child^Yj,sendthemessage:

f_X_→_Y_j

(

x

) =

k=^j

f_Y_k_→_X

(

x

)

π

P(

X

=

^x

|

^Pa

(

X

) = π )

m i=¹

f_U_i_→_X

(

u_i

) ,

where

π

rangesoverthevaluesofPa(X)andu_i istheprojectionof

π

onU_i∈^Pa(X).

• ^From ^X ^to^each^parent^Ui,sendthemessage:

fX→^Ui

(

ui

) =

x

m j=1

fY_j→^X

(

x

)

π∼^ui

P(

X

=

x

|

Pa

(

X

) = π )

k=i

fU_k→^X

(

uk

) ,

where

π

∼^uiaretheconﬁgurationswhoseprojectiononUiisui,andukistheprojection

π

onUk.

Intheexpressionsabove,theproductsareidenticaltooneifthecorrespondingrangeisempty(e.g.ifX isarootthenthere arenomessages fUi→^X).Soforexample,if X isaleafwithaparentU andnoevidenceissetforX,then fX→^U=^1.

Considernowthecaseofanode X whichhasvalue xfixedbytheevidence.Recallthatweassumedthatevidencewas setonlyforleafnodeswithasingleparent.Then X sendsmessage fX→Û(u)=P(X=^x|Û=û)toitssingleparentU.

Since the graph is a polytree, it is always possible to ﬁnda node that can send a message to a neighbor based on themessagesit hasreceived.Forexample,if X isa leafwitha singleparent U thenits message f_X_→_U doesnotdepend on any other message. Another example is the case of a root node X witha single child Y, which can send message

fX→^Y(x)=P(X=^x).

Afterallmessagesaresent,foreachnode X inthenetworkwehave:

P(

X

=

^x

|

^Y

=

^y

) ∝

m j=1

fY_j→^X

(

x

)

π

P(

X

=

^x

|

^Pa

(

X

) = π )

n i=1

fU_i→^X

(

ui

) .

The message-passingalgorithm above canbe extended forcredalnetworksby considering thewhole set ofmessages f_A_→_B thatgothroughanyedge A→^B ^as^we^vary^thedistributionsinthecompleteextension.Thatfactbyitselfdoesnot rendertheproblemeasy.Akeyinsightbehindthe2Ualgorithmisthatthesesetsofmessagescanbecomputedeﬃciently withaverysimilarmessagepassingschemeforthecaseofbinaryvariables.The2Ualgorithmoperatesasfollows.

• ^From ^X ^to^each^child^Yj,sendthemessage:

fX→Yj

(

x

) =

1

+ (

1

− μ (

x

)) μ (

x

)

1

k=^jf

Yk→X

(

x

)

−1

,

where

μ (

x

) =

^min

π

P (

X

=

^x

|

^Pa

(

X

) = π )

m i=¹

f_U_i_→_X

(

u_i

) ,

withtheminimizationbeingperformedovertheextremepointsoftheintervals[^f_U

i→X(u_i),f_U_i_→_X(u_i)]^.^Upper^bounds areobtainedbyreplacingminimization/lowerboundswithmaximizations/upperboundsintheequationsabove.

• ^From ^X ^to^each^parent^Ui,sendthemessage:

fX→^Ui

(

u_i

) =

^min

λ(

x

) ρ (

x

|

ui

) + λ(¯

x

)(

1

− ρ (

x

|

ui

)) λ(

x

) ρ (

x

|¯

^ui

) + λ(¯

^x

)(

1

− ρ (

x

|¯

^ui

)) ,

7 Thefactthatthe2Ualgorithmcanbederivedasamessagepassingalgorithmhasbeen mentionedbefore[59],buttherelevantdetailshavenot appearedintheliterature.

(10)

where

ρ (

x

|

^ui

) =

π∼ui

P (

X

=

^x

|

^Pa

(

X

) = π )

k=i

f_U_k_→_X

(

u_k

)

andtheminimizationsareperformedover

λ(

x

) ∈

⎧ ⎨

⎩

n j=¹

fY_k→X

(

x

),

n j=¹

f_Y_k_→_X

(¯

x

)

⎫ ⎬

⎭ , P (

X

=

^x

|

^Pa

(

X

) = π ) ∈

^ext

K(

X

|

^Pa

(

X

) = π ),

f_U_k_→_X

(

u_k

) ∈

fU_k→X

(

u_k

),

1

−

^f_U

k→X

(

u

¯

_k

)

.

AsinPearl’salgorithm,theproductsintheaboveexpressionsarereplacedwithvalue1whentheirrangesareempty.

If X isaleafnodewithevidence X=^x,^then^it^sends^message fX→U

(

u

) =

^min

P∈extK(X|U=u)

P (

X

=

^x

|

^U

=

^u

) P (

X

=

^x

|

^U

= ¯

^u

)

toitssingleparentU.Thisisequivalenttosettingλ(x)=^{1 and}λ(¯x)=^{0 in}^theêquation^forâ^node ^X ^with^noêvidence.

AttheendonecancomputeexactboundsonP(X=^x|^Y=^y)=(1+(1/

μ

(x)−¹)λ(¯^x)/λ(x))⁻¹ foreachvariable X inthe networkbyoptimizingovervariables

μ

(x)andλ(x)asintheequationsabove.

The expressions above assume that probabilities are always positive. Regular conditioning with zero probabilities is obtained by extendingthe domainofthe messages toinclude ∞^, ând^by êxtending^sums ând^products accordingly(e.g.

using1/0= ∞^,¹/∞=^0,¹+ ∞= ∞^).^This^can^be^done^without^hurtingcomputationalperformance[16].

The 2U algorithm has inspired a few other approximate methods. The Loopy 2U (L2U) algorithm focuses on credal networks withbinary variables wherethe underlying graphmayfail to be a polytree:in thiscaseone can always keep iteratingtheinterval-valuedmessages,hopingtoreachconvergence[60].Suchaschemecorrespondstobeliefpropagation for Bayesian networks, wherePearl’s propagation isrun until messages converge oruntil some prescribed time limit is reached,anideathatmayseemfar-fetchedbutthatoftenobtainsreasonableaccuracy[61].Thisapproachhasbeenfurther extendedintheGeneralizedL2U(GL2U)forcredalnetworksthatmaycontainnon-binaryvariables;theideaistoturnnon- binary variables intoconnectedclustersofbinary variables thatare then processedby L2U[62].A differentpossibilityis to“cut”edges soastoproduceapproximatecredalnetworkswhoseunderlyinggraphsarepolytreesthatcanbeprocessed by2U;theIPEalgorithmexploresthispath[60].Theseapproximateschemesresembleinseveralwaysvariationalmethods, andindeedsomepreliminaryeffortshavebeenmadetoapplyvariationalmethodstocredalnetworks[60].

5.2. Message-passing:algebraicvariableelimination

As we discuss later in Section 6,the 2U algorithm is a truly isolated island of tractability for inference with strong extensions. To handleother situations one may resortfor instance tostochastic search techniques.Cano andMoral [63]

employed geneticprogrammingtoﬁndapproximatesolutions.Cano etal.[64] implementedalocalsearchwithsimulated annealing in order to avoid poor local optima. These approximate methods were later used in conjunction with outer approximationalgorithmstodevelopabranch-and-boundalgorithm [56].

A moresophisticatedapproach,basedonthemanipulation ofsetsofpairs offunctions,was developedbyMauáetal.

[55].Theapproachisabletoproducebothexactandapproximatesolutions,andhasbeenshowntooutperformtheinteger programmingformulation(describedlater)ona largecollectionofsyntheticproblems.Wefocuson thisalgorithm inthe remainderofthissubsection;beforedescribingit,wemustreviewtheoperationsbehindvariableeliminationinthecontext ofBayesiannetworks.

Inshort,variableeliminationextendsPearl’smessagepassingalgorithmtonetworksofarbitrarytopology.⁸Solet(C,T) beatree-decompositionfortheDAGofaBayesiannetworkandR⊂^C ^be^such^that ^Z∈^R^.Âssume^with^no^lossôf^generality that R= {^Z}^(otherwise^we^canâdd â^new^node^to ^T ^satisfying^such^propertyând^connect ît^to ^R ând^to^noôther^node).

Root T at R,orientingthearcsaway from R.Thevariableeliminationprocedure inAlgorithm1computesP(Z=^z|^Y=^y) foragivenrootedtreedecomposition(C,T)andsetofconditionalprobabilitymassfunctions0= {^p(X|^Pa(X)):^X∈^X}^.⁹ Thefunctionsδy(Y)inthealgorithmdenotetheKroneckerdeltafunctionthatreturnsoneatY=^y ând^zeroêlsewhere.^The algorithm startsby creatingandinserting thesefunctionsinto0.Thissomewhat uncommonstepisonlya convenience, asit allows ustodispense withtheneed ofeitherfixing thevaluesof evidencesinthe initialdistributions in 0, orto treat evidence variables separately.The second “for loop” ofthe algorithm is theelimination step:at each iteration, the

8 Thevariableeliminationwepresentdiffersmarginallyfromotheralgorithmssuchasbucketelimination [65],junctiontreepropagation [66],Shenoy- Shafferalgorithm [67],andthecollectalgorithm [68].

9 Notethatp(X|^Pa(X))representsafunctionoverthevaluesofXandPa(X).