Probabilistic Logic with Independence

(1)

Probabilistic Logic with Independence

Fabio Gagliardi Cozman

^a

, Cassio Polpo de Campos

^b

, Jos´e Carlos Ferreira da Rocha

^c

aEscola Politécnica, Univ. de São Paulo, São Paulo, SP, Brazil

bEscola de Artes, Ciências e Humanidades, USP Leste Univ. de São Paulo, São Paulo, SP, Brazil

cUniversidade Estadual de Ponta Grossa, Ponta Grossa, PR, Brazil

Abstract

This paper investigates probabilistic logics endowed with independence relations.

We review propositional probabilistic languages without independence, and then consider graph-theoretic representations for propositional probabilistic logic with independence; complexity is analyzed, algorithms are derived, and examples are discussed. Finally, we examine a restricted first order probabilistic logic that gener- alizes relational Bayesian networks.

Key words: Probabilistic Logic, Graph-theoretic Models, Sets of Probability Distributions, Linear and Multilinear Programming

1 Introduction

In this paper we discuss formalisms that combine probabilistic and logical reasoning. This promising mixture has old roots and has been rediscovered a few times. Pioneering efforts by Boole [3] and by de Finetti [19] were later generalized [28] and then surfaced in the artificial intelligence literature [49].

First order logic in particular have received contributions from several fields [33], ranging from philosophy of science to knowledge representation [2,29,49].

We can divide current probabilistic logics in two groups. In one group, probability assessments are rather flexible and independence relations have a sub-

Email addresses: [email protected](Fabio Gagliardi Cozman),

[email protected](Cassio Polpo de Campos), [email protected](Jos´e Carlos Ferreira da Rocha).

(2)

sidiary status. In the second group, probabilistic assessments are required to specify a unique distribution and independence is an essential concept. In this paper we explore probabilistic logics where independence is a central character, and where assessments are not tied to uniqueness concerns.

We start with a brief discussion of standard propositional probabilistic logics. We add independence relations to such languages, propose graph-theoretic representations inspired by Bayesian networks, derive new complexity results and present algorithms and examples (Sections 3, 4 and 5). The second language we consider is based on relational Bayesian networks [36]. Such networks combine fragments of first order logic and graphs to obtain a reasonably flexible language. We relax this language by allowing more general universally quantified logical constraints and probabilistic assessments (Section 6).

This paper progresses through a hopefully cogent sequence of languages. Even though we are most interested in languages with enough power to represent practical problems, we start from the most unassuming one, propositional probabilistic logic without independence, and move slowly to more expressive languages. This gradually accelerating pace is deliberate, as we hope to present convincing arguments for our basic “design” choices.

2 Satisfiability, inferential vacuousness, and independence

Consider first propositional probabilistic logic (an excellent historical and technical review is given by Hansen and Jaumard [30]). We have propositions and the usual Boolean operators (conjunction, disjunction, etc). A truth assign- mentis a vector assigning either true or false to each proposition (assignments are often called possible worlds [49]). Satisfaction of a formula φ by a truth assignment ω is denoted by ω|=φ.

Probabilities are defined over truth assignments, and P(φ) is understood as

P

ω:ω|=φP(w) [24,30]. Our problem is to find whether or not there is a probability distribution satisfying a set of m assessments

P(φi∧ϕi)≥αiP(ϕi), with φi and ϕi formulas in n propositions. (1)

If there is such a distribution, the assessments are satisfiable. Obtaining the minimum/maximum value for a probability P(φ), subject to assessments, is an inference. Note that constraint (1) is similar to P(φi|ϕi) ≥ αi; however, the former is valid regardless of whether P(ϕi) = 0 or not, while the latter is usually left undefined when P(ϕi) = 0. We avoid issues of zero probability in

(3)

essence by assuming that all assessments are unconditional.¹

Example 1 Consider the following hypothetical facts, inspired by Jaeger’s example [35]. (Note: in all examples, assessments are expressed through conditional probabilities, even though they should be understood as Expression (1);

we also use material implications to express constraints later assumed to be in conjunctive normal form.) We have:

AntarcticBird →Bird, FlyingBird →Bird, Penguin →Bird, FlyingBird →Flies, Penguin → ¬Flies, P(FlyingBird|Bird) = 0.95, P(AntarcticBird|Bird) = 0.01, P(Bird)≥0.2, P(FlyingBird∨Penguin|AntarcticBird)≥0.2, P(Flies|Bird)≥0.8.

Using fractional linear programming, we obtain P(Penguin|¬AntarcticBird) ∈ [0.000,0.050] and P(FlyingBird|Bird∧ ¬AntarcticBird)∈[0.949,0.960].

The best exact algorithms for satisfiability are based on the revised simplex algorithm; problems with (n, m) up to (200,1000) have been solved exactly, and many approximations have been proposed [30,31,38]. One might hope that phase transitions would be present in probabilistic satisfiability, much as they are present in SAT problems [43], so that hard problems would be concentrated in narrow regions. However, even a simple experiment reveals that phase transitions, if they exist at all, do not follow here the pattern observed in SAT. Consider Figure 1. The graphs indicate time (wall-clock) to solve random probabilistic satisfiability problems. A problem was generated by fixing the number of propositions n, the ratio r = m/n(m is the number of assessments), and the number of literals per clause k. Each literal in each clause is positive or negative with probability 1/2 [43], and each clause was associated with a randomly selected probability in [0,1]. For each combination ofn,r andk, 30 problems were generated and a revised simplex (with column generation based on auxiliary 0/1 programs) was applied [30]. The rightmost graph deals with problems that are known to be satisfiable (the logical part is satisfiable and the probabilistic assessments were generated from a distribution over propositions). Overall, the pattern of all graphs in Figure 1 suggests that the larger the problem, the harder it is to check its satisfiability.

Computational complexity is not the only reason for concern in probabilistic logic. Another difficulty isinferential vacuousness— the concern that, due to the flexibility of the language, it is too easy to obtain vacuous inferences. A simple example of inferential vacuousness is as follows. Suppose eventsAandB have no logical relation andP(A) = 1/2,P(B) = 1/2. Then bounds onP(A∧B) are rather vacuous: P(A∧B) ∈ [0,1/2]. This imprecision vanishes if A and B are independent; then P(A∧B) = 1/4. In fact, standard statistical models

1 A more elegant strategy would be to take conditional probability as a primitive notion, even allowing conditioning on events of zero probability. Such a theory has been advocated by de Finetti [19], Renyi [56], Popper [54], and many other researchers [5,26,32,55,63]; the definition of conditional independence has been the object of significant recent progress [13,59,60].

(4)

0 5 r 10 15 0

0,5 1 1,5

t

n=40, k=3 n=40, k=4 n=40, k=5

0 5 r 10 15

0 0,5 1 1,5 2 2,5 3

t

n=50, k=3 n=50, k=4 n=50, k=5

0 5 10r 15 20

0 50 100 150

t

n=18, k=3

Fig. 1. Left and middle: time (seconds) to check satisfiability of random problems.

Right: time (seconds) to check satisfiability of satisfiable random problems (note that these problems required more effort).

typically avoid inferential vacuousness by assuming independence relations. By representing independence relations in probabilistic logic, we should likewise obtain compact and modular descriptions.

Hence consider a three-place relation that denotes independence of φ and ξ givenϕ, where φ, ξ andϕare propositional formulas. The relation is denoted by (φ⊥⊥ξ|ϕ). We again form 2ⁿ truth assignments for the n propositions of interest, and take probability measures over this set of truth assignments. An assessment P(φ) ≥ α is interpreted as before, and a relation (φ⊥⊥ξ|ϕ) now asserts independence among sets of truth assignments. Independence ofφand ξ given ϕmeans

P(φ∧ξ|ϕ) =P(φ|ϕ)P(ξ|ϕ) whenever P(ϕ)>0 (2)

for every distribution satisfying assessments.² This nonlinear constraint is equivalent to the following constraint on probabilities of truth assignments:



 X

ω|=φ∧ξ∧ϕ

P(ω)



×



 X

ω|=ϕ

P(ω)



=



 X

ω|=φ∧ϕ

P(ω)



×



 X

ω|=ξ∧ϕ

P(ω)



. (3)

To compute a tight upper bound on P(θ) for a propositional formula θ, we must find max^P_ω|_=θP(ω) subject to assessments and independence relations.

This formulation can easily deal with conditional inferences; that is, with the calculation of maxP(θ|ϑ) subject to assessments and independence relations.

Suppose first that minP(ϑ) > 0. The calculation of maxP(θ|ϑ) requires the minimization of a ratio of linear expressions. Introduce a functionq such that

2 There are other concepts of independence proposed in the literature when more than a distribution satisfies assessments [6,9,12,17,44,61,62]. We could also pursue a different definition, by taking independence to mean P(φ|ξ ∧ϕ) = P(φ|ϕ) and P(ξ|φ∧ϕ) = P(ξ|ϕ) even when P(ξ ∧ϕ) = 0 or P(φ∧ϕ) = 0 [5,59]; as indicated in the previous footnote, in this paper we do not further investigate this arguably more elegant strategy.

(5)

q(ω) =tP(ω) wheretP(ϑ) = 1; to simplify notation, we use q(φ) =^P_ω|_=φq(ω) for any formula φ. We have to find maxtP(θ∧ϑ), or rather

maxq q(θ∧ϑ), subject to: (4)

(1) m linear constraints q(φi∧ϕi)−αiq(ϕi)≥0, (Expression (1));

(2) r multilinear constraints on q(ω) (dividing Expression (3) by t);

(3) constraints q(ω)≥0 for all ω and q(ϑ) = 1;

(4) constraint^Pωq(ω) =t.

This procedure is a direct application of the Charnes-Cooper transformation [8,30]. For the procedure to work, we must first verify (through nonlinear programming) that minP(ϑ)>0. In this paper we leave minP(θ|ϑ) undefined whenever minP(ϑ) is equal to zero.³

Program (4) takes us to nonconvex optimization; more precisely, to problems that can be expressed asmultilinear programs. We have employed Sherali and Adams’ algorithm for multilinear programming in our tests [58], because this algorithm iterates over a sequence of linear programs, and can thus benefit from column generation techniques in large problems [30]. (We should mention a different strategy proposed by Andersen and Hooker [1]: instead of dealing directly with a multilinear program, they build a linear program on 2ⁿ free variables and use Bender’s decomposition to generate cutting planes.)

Example 2 Consider the constraints and assessments in Example 1, and suppose that AntarcticBird and FlyingBird are independent (ability to fly does not depend on origin). Then P(Penguin|¬AntarcticBird) ∈ [0.000,0.050] and P(FlyingBird|Bird∧ ¬AntarcticBird)∈[0.950,0.958].

Independence relations yield powerful constraints that reduce inferential vacuousness. However, “unstructured” independence relations lead to difficult problems (proofs of alllemmas and theorems are in Appendix A):

Theorem 3 (Decision version of Program (4)) Given a Program (4) and a rational number r, deciding whether maxq(θ∧ϑ)≥r is NP^PP-Hard.

3 A different strategy is to compute the minimum among the probability measures such that P(ϑ)>0; that is, min_P_:P(ϑ)>0P(θ|ϑ) (this solution is similar to Walley’s regular extension[62]). Program (4) does yield such a value if run without any checks onP(ϑ). Yet another solution is to allow conditioning on events of zero probability, as suggested in previous footnotes.

(6)

3 Graph-theoretic representations: PPL networks

As noted in the previous section, the flexibility of propositional probabilistic logic comes at a price in computational complexity. Besides, a language that is too unstructured may in fact overwhelm its users. In this section we explore ways to attach “structure” to assessments using graph-theoretic tools.

Consider a set X = {X1, . . . , Xn} of binary variables, each representing a proposition; ˆXi denotes the corresponding literal. A directed acyclic graph G is associated with X: each variable is a node in G. If edge X → Y belongs to G, then X is a parent of Y; parents of Y are denoted by pa(Y). Nodes that can be reached fromX through directed edges are descendantsof X. We assume the graph to be endowed with the following Markov condition, taken literally from the theory of Bayesian networks [7,51,45]: Xi is conditionally independent from its nondescendants nonparents given its parents. This leads to the unique factorization P(X) = ^Q_iP(Xi|pa(Xi)) for every distribution satisfying the Markov condition.

For the purposes of probabilistic logic, it makes sense to separate the graph and its Markov condition from the probabilistic assessments and logical constraints. The directed acyclic graph simply indicates independence relations amongst variables; assessments need not follow the pattern P(Xi|pa(Xi)).

However, the following is required. Every assessment is either a logical formula φj that is asserted to be true, or an assessment P(φj∧ϕj)∈[αjP(ϕj), βjP(ϕj)], whereφj andϕj are propositional formulasin conjunctive normal form (CNF) containing only variables in the graph.

To understand the advantage of this graph-theoretic framework, consider the size of the multilinear program that must be solved for satisfiability and inference. While “unstructured” probabilistic logic requires manipulation of probabilities for 2ⁿ truth assignments, a graph-theoretic model must only deal with the probabilities in the decomposition ^Q_iP(Xi|pa(Xi)).

Example 4 Consider Example 1 and take the graph Gin Figure 2 (one might adopt a different graph; a possible alternative is also shown in Figure 2).

This graph and its Markov condition imply independence relations (due to factorization of the joint distribution); for instance, FlyingBird and Penguin are independent given any conjunction of literals AntarcticBirdˆ and Bird. Mul-ˆ tilinear programming, subject to these independence assumptions and previous assessments, produces P(FlyingBird|Bird∧ ¬AntarcticBird)∈[0.949,0.960]

and P(Penguin|¬AntarcticBird) = 0. Note: the conditional independence of FlyingBirdandPenguinstated in the network may seem at first to clash with the logical dependence between these propositions (we must have ¬(FlyingBird∧ Penguin)). But the network is satisfiable: eitherP(FlyingBird|AntarcticBird) = 0

(7)

FB

AB

P

F

B

?-

@@R

FB

AB

P

F

B

?

-

@@R

)

FB

AB

P

F

B

@@

Fig. 2. From left to right: graph G in Example 4; an alternative graph; constraint network for the first graph (G). Nodes indicate only capital letters of variable names.

or P(Penguin|AntarcticBird) = 0 or both. If these conclusions seem inade- quate, then an edge must be included between FlyingBird and Penguin, thus removing multilinear constraints. Similarly, if we had assessments such that P(FlyingBird|AntarcticBird)>0 and P(Penguin|AntarcticBird)>0, the network would be unsatisfiable and would have to be modified. So a PPL network is not a finished collection of truths; it is a tool that lets assessments and constraints be critically evaluated through inference.

In short, we have the following model, where assessments follow the pattern of Expression (1).

Definition 5 A Propositional Probabilistic Logic (PPL) network consists of a triple (G,L,A), where G is a directed acyclic graph with n nodes, each one identified with a variable; L is a list of formulas; A is a list of assessments αjP(ϕj) ≤P(φj ∧ϕj)≤βjP(ϕj), where φj and ϕj are formulas; all formulas are in CNF containing variables in G.

We will always assume the Markov condition to be imposed on a PPL network. As for the logical database L, it may be profitably specified through a constraint network, as in Dechter and Mateescu’s mixed networks[23]. Figure 2 shows the constraint network for Example 4. An interesting observation is that if a constraint network actually forms astructured databasein Darwiche’s sense [14], then logical constraints in the graph are such that d-separation implies “entailment separation” (satisfiability can be checked in a sub-graph after discarding nodes that are d-separated). We leave for future work the promising study of PPL networks where the graphG encodes both strong independences and Darwiche’s “entailment independence.”

In a PPL network assessments may be (i) unsatisfiable; (ii) satisfiable by a single distribution; (iii) satisfiable by a set of distributions. In the remainder of this section we discuss a few formalisms that stay closer to a Bayesian perspective in that the only possible case for them is (ii).

The first formalism we discuss is Dechter and Mateescu’s mixed networks, as a solid representative of sizeable literature [4,22,40,51]. A mixed network consists of a Bayesian network over X and a list of logical constraints whose

(8)

set of valid truth assignments is denoted by Γ. The network encodes:

P(X=x) =





 Q

iP(Xi =xi|pa(X_i) =πi∧x∈Γ) ifx∈Γ,

0 otherwise. (5)

Now, a mixed network is not a tool that combines logical and probabilistic data with equal status; what we have is a Bayesian network and an elaborate description of a conditioning event Γ. Suppose that, in an attempt to turn mixed networks into a real mix of logic and probability, we replace conditioning on Γ by the constraint that P(Γ) = 1. Then either P(Γ) is already one in the Bayesian network or not; in the first case the constraint network is useless, and in the second case the mixed network is unsatisfiable. As soon as a complete Bayesian network is specified, there is little beyond (5) that can be done; althouth conditioning on a complex event Γ can always occur in a PPL network, a PPL network allows stronger statements of logical certainty.

Perhaps what should be stressed is this: an inference conditional on event Γ is notthe same as an inference under the logical constraint that Γ obtains.

Example 6 Consider two variables X and Y, a directed acyclic graph X → Y, and the constraint X ∨ ¬Y. Suppose we have P(X = 1) ≤ 1/2, P(Y = 1|X = 1)≥1/2, P(Y = 1|X = 0)≤1/2. Then maxP(Y = 1|X∨ ¬Y) = 2/3.

However, maxP(Y = 1) = 1/2 when subject to probabilistic assessments and to X ∨ ¬Y being true. (Note: if we had a Bayesian network X → Y with P(X = 1) = P(Y = 1|X = 1) =P(Y = 1|X = 0) = 1/2, then P(X ∨ ¬Y) = 3/4, inconsistent with logical constraint X∨ ¬Y.)

Different formalisms would obtain if we were to use undirected graphs in probabilistic logic. Suppose G is an undirected graph subject to the usual Markov condition (separation in the graph implies independence) [7]. This condition fails to guarantee factorization of joint distributions when zero/one probabilities are present. Alas, this is exactly the case of interest in probabilistic logic.

Thus we would again need to deal with 2ⁿ truth assignments, without guar- antees of factorization. Nothing would be gained if we adopted chain graphs [7]; the same failure of factorization may happen in the presence of logical constraints.⁴

One possibility would be to adopt undirected graphs and to postulate a factorization, rather than to obtain the decomposition from a Markov condition.

This is the strategy in Richardson and Domingos’Markov logic, where a weight wi is attached to each formula φi and the joint distribution is proportional to exp(^Pwiφi) [57]. This distribution is assumed even when infinite weights are

4 We have entertained the use of chain graphs in probabilistic logic [11]; however, we have concluded this not to be feasible, and reverted to a previous proposal [16].

(9)

used to create “hard” constraints, a move that may itself break the justifica- tion for factorization. Markov logic is also somewhat surprising in that one can produce precise probabilities from any number of assessments. Such extraor- dinary reasoning differs from the usual scheme in probabilistic logic, where an inference should informs us about all that can be said with the given data, and no more. Even though Markov logic is an ingenious device, we clearly follow a different route with PPL networks.

4 The complexity of PPL networks

In this section we present results on the complexity of PPL networks. We will use in this section results from the theory of strong extensions of credal networks [10]. A credal network consists of a directed acyclic graph where each node is associated with a variableXi and with constraints on conditional distributions; these constraints define sets of probability measures (such sets are referred to as credal sets). The usual Markov condition then creates the network’sstrong extension: a set of Bayesian networks sharing the same graph.

We also need a more formal statement of our decision problem. Given a PPL network, an inference computes the lower (or upper) probability P(φ|ϕ) (or P(φ|ϕ)) for formula φ given the formula ϕ. The decision problem is:

Definition 7 Given a PPL network N = (G,L,A), PPL-inference is the problem of deciding whether there is a probability distribution d (for the node set of G) satisfying all constraints inL and A.

Let N = (G,L,A) be a PPL network. The auxiliary Boolean credal network N⁰ = (G⁰,K⁰) ofN is produced as follows:

• G⁰ ⊇ G and has a new node Cij for each non-unitary clause appearing in a formulaφj of A(both for conditioned and conditioning formulas). EachCij

is connected with new arcs from the variables it contains.

• Constraints of A are inserted in K⁰ as follows: A ∈ A is inserted into K⁰ iff A defines an interval of probabilities for a single variable of G⁰ given a conjunction of its parents, that is, A ≡ αjP(ϕj) ≤ P(φj ∧ϕj) ≤ βjP(ϕj), whereφj is a single literal ˆX and ϕj is a conjunction ofX’s parents in G.

• CPTs encoding the truth tables of new Cij clauses are inserted into K⁰ associating these nodes and their parents.

An auxiliary network can be formed in polynomial time: it keeps the nodes of the original PPL networkN and adds a node per non-unitary clause inA. As clauses are disjunctions of literals, their truth tables are easy to encode (zero only when all literals are negated and oneelsewhere). An important quantity is thetreewidth of the auxiliary network, because constraints in A may imply

(10)

new relations between variables that are expressed by new Cij nodes in the auxiliary network (thetreewidth of a network is thetreewidth of its underlying graph). If each clause appearing inAis restricted to a variable and its parents, then all constraints in A may be encoded in the local credal sets K⁰ of the auxiliary network, and the treewidth of the PPL network is the same as the treewidth of its auxiliary network.

Lemma 8 The treewidth of a PPL networkN = (G,L,A)is equal to the treewidth of its auxiliary network N⁰ = (G⁰,K⁰) if all clauses in all constraints of A are restricted to variables of N and their parents in G.

We classify a PPL network either as bounded treewidth ormulti-connected:

Definition 9 A PPL networkN has bounded treewidth (BTW) if its auxiliary network has treewidth smaller thanO(f(S)), where f(S)is a logarithmic function onS, the input size needed to specify N; otherwise N is multi-connected.

Logical constraints and assessments can be tested in time similar to Bayesian network inference:

Lemma 10 Given a BTW (respectively multi-connected) PPL network N = (G,L,A) and a constraint φj ∈ L, deciding whether a given probability distribution d (for the node set of G) satisfies φj is in P (respectively in PP).

Lemma 11 Given a BTW (respectively multi-connected) PPL network N = (G,L,A) and a assessment Aj ∈ A, deciding whether a given probability distribution d (for the node set of G) satisfies Aj is in P (respectively in PP).

Now, our main theorem:

Theorem 12 PPL-inference is NP-Complete for BTW PPL networks and NPPP-Complete for multi-connected PPL networks.

The following consequence of our results deserves attention. Note that complexity of inferences depends on the treewidth of the auxiliary network (proof of Theorem 12), and this treewidth does not depend on the number/size of constraints in L (by construction). Thus, purely logical constraints do not increase complexity of inferences, regardless of the treewidth of their corresponding constraint network. However, note that this observation is valid only for the logical constraints inL, not for probabilistic assessments involving formulas in A (as they may increase the treewidth of the auxiliary network).

Another observation is this. The CNF-SAT problem (satisfiability restricted to CNF formulas) is already NP-Complete; thus if we include probabilistic assessments into a CNF-SAT problem using a BTW credal network, our results

(11)

show that complexity does not increase.

5 A framework for inferences in PPL networks

One can obtain lower/upper probabilities in a PPL network by multilinear programming. A “naive” solution is to write down linear and multilinear constraints for the 2ⁿ truth assignments for variables in the network. In this section we propose a better strategy to compute inferences, where we exploit the factorization ^Q_iP(Xi|pa(Xi)) expressed by a PPL network. Instead of fo- cusing on a single algorithm, we present a framework that can be combined with, and benefit from, any Bayesian network inference algorithm.

The basic idea is to run a Bayesian network inference algorithm to produce a compact description of a multilinear program — a program where the probability values P(Xi|pa(Xi)) in the auxiliary network appear as optimization variables. This idea is borrowed from the literature of credal networks [15].

At first, the auxiliary network of the given PPL network is created. This network is in fact a Boolean credal network that encodes the dependence structure of the original model. A multilinear program is generated where the optimization variables are probabilitiesP(Xi|pa(Xi)) of the auxiliary network.

The challenge is how to handle probabilities over formulas. To do so, each constraint of L is treated as a query in N⁰. These queries are in fact joint queries inN⁰ (see proof of Lemma 10 for details). Each assessment ofAeither was inserted into N⁰ (during its construction) as a constraint in local credal sets, or it is also treated as a query in N⁰. Again, these queries are joint queries in N⁰, because each Aj ∈ A deals with clauses that have become nodes in the construction of the auxiliary network (see proof of Lemma 11 for details). At this stage, we have a Boolean credal network N⁰ and a set of queries. The only but important difference from this setting to a simple belief updating in a credal network is that we must simultaneously satisfy all queries.

But we still have a multilinear program: we have only joint queries, and each such joint query is a multilinear function ^P_X\X_Q^Q_iP(Xi|pa(Xi)) over the optimization variables and restricted to some rational interval. We can choose among several algorithms for Bayesian network inference to symbolically factor the summation into smaller multilinear constraints.

A particularly simple algorithm is variable elimination [21,64]. The purpose of variable elimination is to efficiently compute a summation of products such as ^P_X^Q_iP(Xi|pa(Xi)). In our setting, we must write down symbolically the factorization obtained by variable elimination. Every time a summation can be interchanged with a multiplication, we introduce new optimization variables to simplify the expression (similarly to the algorithm in [15]). Applying

(12)

this idea several times, the initial multilinear function is transformed into a set of smaller multilinear constraints. The drawback is that we introduce new artificial optimization variables that will be part of the final multilinear programming problem.

Thus, for each query, we run a symbolic version of our preferred algorithm for inference, which outputs a set of multilinear constraints. A multilinear program is obtained by taking all multilinear constraints plus the constraints already defined in N⁰ by the credal sets. Using this multilinear program, we can maximize/minimize P(φ|ϕ) subject to all assessments and constraints.

This method can be used with a variety of inference algorithms for Bayesian networks; in particular, it can be used with algorithms that exploit the presence of determinism [4,23]. Another obvious path is to explore approximation algorithms, because exact techniques are certainly limited in the size of solv- able problems [34]. We leave such an investigations for the future.

6 Moving to first order: Relational networks

In this section we move beyond propositional languages, briefly looking into elements of first order logic. To fix terminology: in first order logic aformulais a combination ofconstants,relations,functions, Boolean operators,quantifiers

∃and∀, andlogical variables(not to be confused with the “random” variables Xi we associate with nodes in networks). A sentence is a formula without free logical variables. The semantics is established using a domain (a set of individuals). An interpretation assigns individuals to constants, relations and functions in the domain respectively to constants, relations and functions in the vocabulary. More details can be found in several textbooks [46].

Nilsson’s influential work on first order probabilistic logic [49] took assessments similar to Expression (1) with φ and ϕ first order sentences. Nilsson’s logic interprets probabilities as measures over sets of interpretations. More general languages [2,29] attach probabilities also to domains, and allow quantifiers to appear “outside” of probabilistic assessments. Several logics [27,42,50] and applications (for example, in “probabilistic” logic programming [39,41,47]) have been studied in the last twenty years. Most such work, typically referred to as “first order probabilistic logic,” shares some characteristics. First, it allows general assessments (interval-valued, set-valued, qualitative ones). Second, it does not exploit independence relations, even when independence can be expressed in the language. Inferential vacuousness is a concern, and complexity is typically very high.

A different strategy has received attention recently. Here graph-theoretic mod-

(13)

els, such as Bayesian networks or Markov networks, are enhanced with elements of first order logic. Usually the language is relational, the domain is finite, and assumptions of unique names are made to guarantee finiteness of propositionalized models. Often the resulting languages are called “probabilistic relational models” [25,36,48,52,57]. We obtain models that are compact and efficient; that guarantee satisfiability and existence of a unique probability distribution; and that deal explicitly with independence. The drawback is that languages are restricted in the assessments they allow, and logical constraints must be enforced by zeros and ones in distributions.

We wish to focus on probabilistic logics with a balanced mix of point/interval/

set-valued assessments and logical constraints,andwith efficient graph-theoretic representations based on independence relations. We cannot hope to address, in this section, first order probabilistic logics in their full generality; instead we present a restricted language with some attractive features. The basic idea is to extend PPL networks to relational settings using concepts in Jaeger’s relational Bayesian networks [36,37].

Relational Bayesian networks assume a vocabulary S of relations and a finite domain where each individual has an exclusive name. Each node of a directed acyclic graph is associated with a relation. A relationris then associated with a probability expression Fr that produces the probability of r(v) for any v in a domain D. The probability formula Fr may depend on other relations, the parents ofr in the graph. If a free variable appears in the parents ofrbut not inr, then probabilities must be fused in some way. For example, suppose that relation r(v) has parent relations(u, v); then the probability p(r(v)|{s(u, v)}) must be specified for arbitrary sets of individuals u. Relational Bayesian networks employ combination functions, denoted by M{F₁, . . . , Fn|u;c(u, v)}, to combine Fi for all u that satisfy the equality constraint c(u, v). Examples of such functions are Noisy-OR, Max, Min, and Mean. A relational Bayesian network is perhaps best viewed as a first order “template” for Bayesian networks:

given a domain, a relational Bayesian network yields a “propositionalized”

Bayesian network that can be used for inference [37].

All of this can be directly turned into a powerful (but restricted) first order probabilistic logic, using ideas already tested in PPL networks.

Definition 13 A Restricted Relational Probabilistic Logic (RRPL) network consists of a tuple (D,S,G,L,A), where D is a finite set of individuals, S is a vocabulary with constants and relations, G is a directed acyclic graph with n nodes, each one identified with a relation ri in S; L is a list of sentences in S, restricted to conjunctions of disjunctions of relations, preceded only by universal quantifiers;A is a list of assessmentsαjP(ϕj)≤P(φj∧ϕj)≤βjP(ϕj), where φj and ϕj are formulas without quantifiers in S such that all logical variables are bound to universal quantifies “outside” of inequalities.

(14)

FB(x)

AB(x)

P(x)

F(x)

B(x)

SS(x, y)

? -

@@R

FB(T)

AB(T)

P(T)

F(T)

B(T)

SS(T, O)

? -

@@R

FB(O)

AB(O)

P(O)

F(O)

B(O)

? -

@@R

HHHj

Fig. 3. Left: graph G in Example 14. Right: propositionalized PPL network in Ex- ample 14. Nodes indicate only capital letters of relations and individuals.

We assume throughout the corresponding Markov condition, directly from the theory of relational Bayesian networks [36]: an instantiation of a relation is independent of instantiations of its nondescendants nonparents given instantiations of its parents. Note the restriction on the form of first order sentences:

basically, they must be universal sentences in conjunctive normal form. It is possible to relax the restrictions on assessments by allowing expressions of the generality considered by Jaeger [36]; we have simplified the definition due to lack of space.

Similarly to relational Bayesian networks, a RRPL network is a template for PPL networks: given a finite domain, we can produce a PPL network for inferences.

Example 14 Consider Example 4. The example is perhaps more accurately viewed as an ontology expressing facts about individuals in some domain. Sup- pose a RRPL network is defined by the graph in Figure 3, and sentences and assessments: ∀x : AntarcticBird(x) →Bird(x), ∀x: FlyingBird(x) →Bird(x),

∀x: Penguin(x) →Bird(x), ∀x: FlyingBird(x) →Flies(x), ∀x: Penguin(x) →

AntarcticBird(y)) ≥ 0.7. This last assessment ties together individuals in the domain (note: the assessment is not required to “follow” the direction of edges; we could have assessed a different expression for the probability of SameSpecies). Now suppose we have two individuals, Tweety and Opus. The propositionalized PPL network for this domain is presented in Figure 3. Using the framework in Section 5 with the variable elimination algorithm, we obtain P(SameSpecies(Tweety,Opus)|Penguin(Tweety)∧AntarcticBird(Opus))∈ [0.7,1.0]. Note that this probability is “inherited” from the conditioning on AntarcticBird(x)∧AntarcticBird(y)in the last assessment (asP(¬AntarcticBird(x)∧

Penguin(x)) = 0 for all x).

In Example 14 we have point-valued and interval-valued assessments; we could also have qualitative and ordinal assessments. More importantly in the context of RRPL networks, we need not adopt any artificially “standardized” way to combine expressions (such as the Noisy-OR function).

Example 15 Consider assessments, constraints, and graphG in Example 14.

(15)

Suppose another relation, Sick, is introduced to represent illnesses spreading amongst members of the same species. Suppose the node for Sick is added to graph G as a child of SameSpecies and Flies, and the following probability expression is available: ∀x, y : P(Sick(x)|Flies(y)∧SameSpecies(x, y)) = 0.9.

This assessment can be taken literally: as yranges over the domain, we obtain a number of separate constraints and there is no need to combine them through Noisy-OR — there is no obstacle to us using Noisy-OR or any other function if we judge that to be reasonable, but a combination is not mandated by the language.

We have discussed inference in RRPL networks through propositionalization;

the relevant complexity results and algorithms are inherited from the theory of PPL networks, presented in previous sections. An alternative and most promising idea is to conduct lifted inference; that is, to do calculations, to the extend that it is possible, with parameterized probabilities and first order formulas [20,53]. Several algorithms for lifted inference in relational Bayesian networks may be subject to the framework described in Section 5: lifted inference is run “symbolically” (perhaps a few times), thus generating a multilinear program that is subsequently solved. We leave for the future an investigation of the practical feasibility of this idea.

7 Conclusion

Probabilistic logics must accommodate independence relations, both to reduce inferential vacuousness and to structure assessments. We have introduced graph-theoretic models for probabilistic logics with independence, using elements of Bayesian network theory; we then investigated their complexity and presented inference algorithms. We have tried to stay close to one of the main tenets of probabilistic logic: constraints and assessments hold equal status, and verifying their consistency is an important task.

The networks we propose in this paper try to balance the flexibility of tradi- tional probabilistic logic and the efficiency of probabilistic relational models.

In fact, while the former is “too loose” and leads to computational challenges and inferential vacuousness, the latter are “too strict” in what they assume and demand. Our proposals accept that some structure is necessary in prac- tice, thus adopting graph-theoretic tools; however our proposals do not assume as many independence relations and structural conditions as needed to obtain a single probability distribution. In fact, once we move to sets of distributions, we can also handle many other types of assessments, such as qualitative, interval-valued, set-valued, and ordinal ones [16].

Two final points should be emphasized. First, our results show that complexity

(16)

of inferences grows basically as the treewidth of assessments in A, and not with the treewidth of a constraint network representing L. The second point to emphasize is the unifying character of credal networks in the framework we have built.

We have explicitly left several topics for future work: an investigation of phase transitions, a proper account of conditioning on events of zero probability, a study of structured databases, a study of approximate algorithms and algorithms that exploit logical constraints, a derivation of algorithms that handle first order sentences efficiently. It is perhaps unrewarding to list all possible future work on probabilistic logic with independence — one might investigate more expressive languages, applications, connections to other fields. Clearly this section closes just the beginning of a longer journey.

Acknowledgements

The work has received partial support from CNPq (grant 3000183/98-4) and FAPESP (grant 04/09568-0). We acknowledge partial support from HP Brazil R&D. We thank Barbara Vantaggi for advice on the topic of zero probabilities, and Thomas Lukasiewicz for important technical help and for encouraging us to produce the paper. We thank Paulo S´ergio Souza Andrade for his excellent implementation of inference algorithms used in Section 2.

A Proofs

Proof of Theorem 3. We reduce (in polynomial time) a belief updating problem on a Boolean credal network N to this problem. The problem is:

given a Boolean credal networkN specified by the graphG, the set of variables X (associated to nodes ofG) and local credal sets K(Xi|pa(X_i)), for each Xi

and each combination of its parents, a query ˆXq (withXq ∈X) and a rational number r, is there a probability distribution d for X satisfying N such that P( ˆXq)≥r? Note that the credal sets are simple intervalsαi ≤P(Xi|pa(Xi))≤ βi, because the network is Boolean. We proceed with the reduction as follows:

• Rename eachXi ∈X as a formula φi.

• Specify each local credal set ofN as probabilistic logic assessmentsP(ϕi)αi ≤ P(φi∧ϕi)≤βiP(ϕi), where ϕi is^V_pa(φ_i₎, for each combination of pa(φi).

• Insert independence statements (φi⊥⊥φj|φk) involving any two nodesφiand φj that are separated byφk in the network. The number of such statements is polynomial, as the network at most encodes independences between all φi and φj givenφk, that is, less thanO(|X|³) statements (this would be the

(17)

case only if the network has no arcs and all nodes are independent to each other; usually the number of statements is smaller).

• Define the query θ as a formula associated to ˆXq and ϑ as a tautology.

Note that deciding whether maxq(θ ∧ ϑ) ≥ r is equivalent to evaluating whether there is a distributiondonN such thatPd( ˆXq)≥ r, because the same independence statements and probability constraints apply to both models (variables were just renamed). As the Boolean credal network belief updating problem described here is NP^PP-Complete [18], this theorem follows.2 Proof of Lemma 8. Suppose G has treewidth k from elimination order X1, . . . , Xn. Take the moral graph of G⁰ and a variable elimination order for it that begins with the variables C1j, C2j, . . . , Cmj (related to all clauses appearing on A) followed by X1, . . . , Xn. After eliminating all Cij, the moral graph ofG⁰ becomes exactly the moral graph ofG, because the parents of each Cij were already connected in the moral graph of G (we have supposed that each clause involves only a variable and its parents). Thus the width of this elimination order in G⁰ is equal to the width of X1, . . . , Xn inG, that is, k.2 Proof of Lemma 10. First, construct the auxiliary Boolean credal network N⁰ from N. Deciding whether φj is satisfied by d is exactly the question whether dleads toPd(φj) = 1, withφj =^V_iCij andCij a clause. Because formulas are in CNF, eachCijis a disjunction of literalsCij =∨kXˆijk, where ˆXijk

is a literal in N⁰. Thus, P(Cij) = 1 ⇐⇒ P(¬Cij) = 0 ⇐⇒ P(^V_k¬Xˆijk) = 0.

Note that this last assertion is just a joint query inN⁰. For a fixed probability distribution d, N⁰ is a Bayesian network (a single conditional distribution is selected from each credal set). So we just need to evaluate whether d leads to Pd(^Vk¬Xˆijk) = 0 for all clauses of φj. Thus, for a BTW PPL network, each such evaluation takes polynomial time inS(as aBTW Bayesian network belief updating is in P), and because of polynomial number of clauses, the total amount of time is polynomial in S too. For a multi-connected network, each evaluation is in PP (the complexity class for general belief updating in Bayesian networks). BecausePPis closed under truth-table reductions, we can make all the PP evaluations “in parallel” and answer yes if and only if each one answered yes. This completes the proof. 2

Proof of Lemma 11. Here we proceed similarly to Lemma 10. First, construct the auxiliary Boolean credal network N⁰ fromN. Deciding whether Aj

is satisfied by d is the same as deciding whether αiPd(ϕj) ≤ Pd(φj ∧ϕj) ≤ βjPd(ϕj), where φj = ^V_iCij, with Cij clause of φj and ϕj =^V_iDij, with Dij

clause of ϕj. By construction of auxiliary networks, clauses Cij and Dij are already represented by nodes inN⁰ encoding their truth tables. Thus, we just need to evaluatep1 =Pd(^V_iCij∧^V_iDij) andp2 =Pd(^V_iDij), and test whether p₁is in the interval [αjp₂, βjp₂]. Note that evaluation ofp₁andp₂ are just joint queries inN⁰. For a fixed probability distribution d,N⁰ is a Bayesian network.

(18)

Thus, for a BTW PPL network, both evaluations take polynomial time in S and total time is polynomial. For amulti-connectednetwork, each evaluation is inPP, and we need two adaptive evaluations. Because PP equals PP^[log^n] (that is,O(logn) adaptive queries to PP is still in PP), the lemma follows. 2

Proof of Theorem 12. Consider first BTWnetworks. Pertinence is achieved by Lemmas 10 and 11: given a probability distributiond, verifying satisfaction of each constraint/assessment in L and A takes polynomial time in the size of input. We show hardness with a polynomial time reduction from CNF-SAT: given a CNF Boolean formula ξ with clauses C₁, . . . , Cm in the variables X, is there an instantiation of X that satisfies ξ?Construct a PPL network N = (G,L,A) with no arcs in G and nodes X, empty A and L containing only the formulaξ. It is clear that there is a probability distributiondthat satisfies the PPL-inference in N if and only if there is an instantiation that satisfies the CNF-SATproblem, because we verify in thePPL-inference whetherP(¬Cj) =

Q

iP(¬Xˆij) = 0 for all 1≤j ≤ m, where ˆXij are the literals appearing in Cj. Note that the topology of G used here is even simpler than a polytree (there are no arcs and every node is independent from each other). So, even in this case the PPL-inference is NP-Complete.

Now consider multi-connected networks. Again, pertinence is achieved by Lem- mas 10 and 11. Given a probability distributiond, deciding whether each constraint inLand each assessment inA is satisfied is inPP. BecausePP is closed under truth-table reductions, we can make all the PP evaluations “in parallel” and answer yes if and only if each one answered yes. PPL-inference is NPPP

-Hard because a PPL network is an obvious generalization of a Boolean credal network. The belief updating problem in a Boolean credal network is to decide whether P( ˆXq)≥r, where ˆXq is the query and r is a rational number.

To polynomially reduce this problem to a PPL-inference, we just have to encode the local credal sets of the credal network as assessments in A (this is trivial, as assessments in A are more general than local credal sets) and insert an additional assessment specifying that rP(ϕ) ≤ P(φ∧ ϕ) ≤ P(ϕ), withφ = ˆXq and ϕas a tautology. Now the theorem is immediate. A solution to the PPL-inference solves belief updating in a Boolean credal network. 2

References

[1] K. A. Andersen and J. N. Hooker. Bayesian logic. Decision Support Systems, 11:191–210, 1994.

[2] F. Bacchus. Representing and Reasoning with Probabilistic Knowledge: A Logical Approach. MIT Press, Cambridge, 1990.

[3] G. Boole. The Laws of Thought. Dover edition, 1958.

(19)

[4] M. Chavira and A. Darwiche. Compiling Bayesian networks with local structure.

InProc. of the 19th Int. Joint Conf. on Artificial Intelligence, 2005.

[5] G. Coletti and R. Scozzafava. Probabilistic logic in a coherent setting. Trends in logic, 15. Kluwer, Dordrecht, 2002.

[6] I. Couso, S. Moral, and P. Walley. A survey of concepts of independence for imprecise probabilities. Risk, Decision and Policy, 5:165–181, 2000.

[7] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter.Probabilistic Networks and Expert Systems. Springer-Verlag, New York, 1999.

[8] F. G. Cozman. Calculation of posterior bounds given convex sets of prior probability measures and likelihood functions. Journal of Computational and Graphical Statistics, 8(4):824–838, 1999.

[9] F. G. Cozman. Constructing sets of probability measures through Kuznetsov’s independence condition. In Proc. of the Second Int. Symp. on Imprecise Probabilities and Their Applications, pages 104–111, Ithaca, New York, 2001.

[10] F. G. Cozman. Graphical models for imprecise probabilities. Int. Journal of Approximate Reasoning, 39(2-3):167–184, 2005.

[11] F. G. Cozman, C. P. de Campos, and J. C. Ferreira da Rocha. Probabilistic logic with strong independence. In Brazilian Symp. on Artificial Intelligence, 2006.

[12] F. G. Cozman and P. Walley. Graphoid properties of epistemic irrelevance and independence. Annals of Mathematics and Artificial Intelligence, 45:173–195, 2005.

[13] L. Crisma. The notion of stochastic independence in the theory of coherent probability. Quad. Dip. Mat. Appl. Sci. Ec. St. Att. B. de Finetti, 8/91, 1999.

[14] A. Darwiche. A logical notion of conditional independence: Properties and application. Artificial Intelligence, 97(1-2):45–82, 1997.

[15] C. P. de Campos and F. G. Cozman. Inference in credal networks using multilinear programming. In E. Onaindia and S. Staab, editors, Proc. of the Second Starting AI Researchers’ Symp. (STAIRS), pages 50–61, Amsterdam, The Netherlands, 2004. IOS Press.

[16] C. P. de Campos and F. G. Cozman. Belief updating and learning in semi- qualitative probabilistic networks. In F. Bacchus and T. Jaakkola, editors,Conf.

on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005.

[17] C. P. de Campos and F. G. Cozman. Computing lower and upper expectations under epistemic independence. InFourth Int. Symp. on Imprecise Probabilities and Their Applications (ISIPTA), pages 78–87, Dulles, Virginia, 2005. Brightdoc.

[18] C. P. de Campos and F. G. Cozman. The inferential complexity of Bayesian and credal networks. In Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 1313–1318, Edinburgh, United Kingdom, 2005.

(20)

[19] B. de Finetti. Theory of probability, vol. 1-2. Wiley, New York, 1974.

[20] R. de S. Braz, E. Amir, and D. Roth. MPE and partial inversion in lifted probabilistic variable elimination. In National Conf. on Artificial Intelligence (AAAI), 2006.

[21] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41–85, 1999.

[22] R. Dechter and D. Larkin. Hybrid processing of beliefs and constraints. In Conf. on Uncertainty in Artificial Intelligence, 2001.

[23] R. Dechter and R. Mateescu. Mixtures of deterministic-probabilistic networks and their AND/OR search space. In M. Chickering and J. Halpern, editors, Conf. on Uncertainty in Artificial Intelligence, pages 120–129. AUAI Press, 2004.

[24] G. Georgakopoulos, D. Kavvadias, and C. H. Papadimitriou. Probabilistic satisfiability. Journal of Complexity, 4:1–11, 1988.

[25] L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of relational structure. InInt. Conf. on Machine Learning, pages 170–177, 2001.

[26] A. Gilio and R. Scozzafava. Conditional events in probability assessment and revision. IEEE Trans. on Systems, Man and Cybernetics, 24(12):1741–1746, 1994.

[27] R. Giugno and T. Lukasiewicz. P-SHOQ(D): A probabilistic extension of SHOQ(D) for probabilistic ontologies in the semantic web. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Proc. of the 8th European Conf. on Logics in Artificial Intelligence (JELIA), volume 2424, pages 86–97, Cosenza, Italy, September 2002. Lecture Notes in Artificial Intelligence, Springer.

[28] T. Hailperin. Best possible inequalities for the probability of a logical function of events. American Mathematical Monthly, 72:343–359, 1965.

[29] J. Y. Halpern. Reasoning about uncertainty. MIT Press, Cambridge, Massachusetts, 2003.

[30] P. Hansen and B. Jaumard. Probabilistic satisfiability. Report G-96-31, Les Cahiers du GERAD, ´Ecole Polytechique de Montr´eal, 1996.

[31] P. Hansen and S. Perron. Merging the local and global approaches to probabilistic satisfiability. Report, Cahier du GERAD G-2004-48, 2004.

[32] S. Holzer. On coherence and conditional prevision. Boll. Un. Mat. Ital. Serie VI, 1:441–460, 1985.

[33] C. Howson and P. Urbach. Scientific reasoning: the Bayesian approach. Open Court Publishing Company, Chicago, Illinois, 1993.

[34] J. S. Ide and F. G. Cozman. IPE and L2U: Approximate algorithms for credal networks. InSecond Starting AI Researcher Symposium (STAIRS), pages 118–

127. IOS Press, 2004.

(21)

[35] M. Jaeger. Probabilistic reasoning in terminological logics. In Principles of Knowledge Representation (KR), pages 461–472, 1994.

[36] M. Jaeger. Relational Bayesian networks. In D. Geiger and P. P. Shenoy, editors, Conf. on Uncertainty in Artificial Intelligence, pages 266–273, San Francisco, California, 1997. Morgan Kaufmann.

[37] M. Jaeger. Complex probabilistic modeling with recursive relational Bayesian networks. Annals of Mathematics and Artificial Intelligence, 32:179–220, 2001.

[38] D. Jovanović, N. Mladenović, and Z. Ognjanović. Variable neighborhood search for the probabilistic satisfiability problem. In Proc. of the 6th Metaheuristics Int. Conf. (MIC 2005), Vienna, Austria, 2005.

[39] L. V. S. Lakshmanan and F. Sadri. Probabilistic deductive databases. InSymp.

on Logic Programming, pages 254–268, 1994.

[40] D. Larkin and R. Dechter. Bayesian inference in the presence of determinism.

InAI and Statistics (AI-STAT), 2003.

[41] T. Lukasiewicz. Probabilistic logic programming. In European Conf. on Artificial Intelligence, pages 388–392, 1998.

[42] T. Lukasiewicz. Probabilistic logic programming with conditional constraints.

ACM Trans. on Computational Logic, 2(3):289–339, 2001.

[43] D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions of SAT problems. In Proc. of the Tenth National Conf. on Artificial Intelligence (AAAI), pages 459–465, San Jose, California, July 1992.

[44] S. Moral. Epistemic irrelevance on sets of desirable gambles. Annals of Mathematics and Artificial Intelligence, 45(1-2):197–214, October 2005.

[45] R. E. Neapolitan. Learning Bayesian Networks. Prentice Hall, 2003.

[46] A. Nerode and R. A. Shore. Logic for Applications (2nd ed.). Springer-Verlag, New York, 1997.

[47] R. Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and Computation, 101(2):150–201, 1992.

[48] L. Ngo and P. Haddawy. Answering queries from context-sensitive probabilistic knowledge bases. Theoretical Computer Science, 171(1–2):147–177, 1997.

[49] N. J. Nilsson. Probabilistic logic. Artificial Intelligence, 28:71–87, 1986.

[50] Z. Ognjanovi´c. Discrete linear-time probabilistic logics: Completeness, decidability and complexity. Journal of Logic Computation, 16(2):257–285, 2006.

[51] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California, 1988.

[52] D. Poole. Probabilistic Horn abduction and Bayesian networks. Artificial Intelligence, 64:81–129, 1993.

(22)

[53] D. Poole. First-order probabilistic inference. In Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 985–991, 2003.

[54] K. R. Popper. The logic of scientific discovery. Hutchinson, London, 1975.

[55] E. Regazzini. Finitely additive conditional probability. Rend. Sem. Mat. Fis., 55:69–89, 1985.

[56] A. Renyi. On a new axiomatic theory of probability. Acta Math. Acad. Sci.

Hungarian, 6:285–335, 1955.

[57] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.

[58] H.D. Sherali and W.P. Adams. A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Kluwer Academic Publishers, 1999.

[59] B. Vantaggi. Graphical models for conditional independence structures. In Second Int. Symp. on Imprecise Probabilities and Their Applications, pages 332–341. Shaker, 2001.

[60] B. Vantaggi. Graphical representation of asymmetric graphoid structures. In Third Int. Symp. on Imprecise Probabilities and Their Applications, pages 560–

574. Carleton Scientific, 2003.

[61] P. Vicig. Epistemic independence for imprecise probabilities. Int. Journal of Approximate Reasoning, 24:235–250, 2000.

[62] P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.

[63] P. Walley, R. Pelessoni, and P. Vicig. Direct algorithms for checking consistency and making inferences from conditional probability assessments. Journal of Statistical Planning and Inference, 126(1):119–151, 2004.

[64] N. L. Zhang and D. Poole. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, pages 301–328, 1996.