Learning Probabilistic Sentential Decision Diagrams by Sampling

(1)

Learning Probabilistic Sentential Decision Diagrams by Sampling

KDMiLe 2020

Renato Geh¹ Denis Mauá¹ Alessandro Antonucci²

1Institute of Mathematics and Statistics, University of São Paulo, Brazil {renatolg,ddm}@ime.usp.br

2Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Switzerland [email protected]

(2)

Background

(3)

Probabilistic Sentential Decision Diagrams (PSDDs)

vtree

B A D C

D : θ

D

C ¬B

B A ¬B ⊥ B ¬A D C ¬D ⊥

0.35 0.4 0.25

1.0 0.0 0.0 1.0 0.3 0.7

Kisa et al. 2014

(4)

vtree

B A D C

D : θ

D

C ¬B

B A ¬B ⊥ B ¬A D C ¬D ⊥

OR gate

AND gate

0.35 0.4 0.25

1.0 0.0 0.0 1.0 0.3 0.7

Leaves are eitherliterals,constants orBernoulli distributions.

(5)

vtree

B A D C

Elements...

D : θ

D

C ¬B

B A ¬B ⊥ B ¬A D C ¬D ⊥

0.35 0.4 0.25

1.0 0.0 0.0 1.0 0.3 0.7

(A∧B)

| {z }

prime

∧(D∨ ¬D)

| {z }

sub

(6)

vtree

B A D C

Partition...

D : θ

D

C ¬B

B A ¬B ⊥ B ¬A D C ¬D ⊥

0.35 0.4 0.25

1.0 0.0 0.0 1.0 0.3 0.7

(A∧B)

| {z }

e1

∨((¬A∧B)∧C)

| {z }

e2

∨((C∧D)∧ ¬B)

| {z }

e3

(7)

Related Works

(8)

LearnPSDD

Given. A vtree and a circuit.

Idea. Learn a PSDD as an expansion of initial circuit.

How?

1. Recursively apply small changes;

2. Evaluate score;

3. Greedily accept changes.

Score(S,S⁰|D) = logp(S⁰|D)−logp(S|D)

|S⁰| − |S|

What are these “small” changes?

Liang, Bekker, and G. V. d. Broeck 2017

(9)

Splitan element...

α

β γ

A ¬A

Split onA

α

β∧A

β∧A γ

A ¬A

(10)

Clonea partition...

α

Clone

α α

(11)

Strudel

Given. A Chow-Liu Tree (CLT).

Idea. Learn a PSDD and its vtree.

How?

1. Learn a CLT.

2. Extract a vtree from CLT.

3. Compile CLT into circuit.

4. “Grow” circuit with Split.

Dang, Vergari, and G. v. d. Broeck 2020

(12)

Dang, Vergari, and G. v. d. Broeck 2020

(13)

Learning by Sampling

(14)

Monte-Carlo Structure Learning

Previous attemptsgrewor compiledexisting models.

Instead, we want tobuild a circuit solely from knowledge.

Our approach. Start off with a logic formula:

φ(A,B,C,D,E)=(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

Recursively decomposeφ top-down through subsequent partitions.

(15)

vtree

A B C D E

How to decompose formula into elements...

(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

A∧B∧C D∧ ¬E

A∧B∧ ¬C

⊥

A∧ ¬B

⊥

¬A∧B∧C E

¬A∧B∧ ¬C D∧ ¬E

A∧ ¬B

>

...while ensuring primes are exhaustive and mutually exclusive?

(16)

Given a prime ordering, such asO= (B, C, A), return a set of primes.

B

¬B

BC

B¬C

¬BC

¬B¬C

BCA BC¬A

¬BCA

¬BC¬A B¬CA B¬C¬A

¬B¬CA

¬B¬C¬A

(17)

B

¬B

BC

B¬C

¬BC

¬B¬C

BCA BC¬A

¬BCA

¬B¬CA

¬B¬C¬A

(18)

B

¬B

BC

B¬C

¬BC

¬B¬C

BCA BC¬A

¬BCA

¬B¬CA

¬B¬C¬A

(19)

B

¬B

BC

B¬C

¬BC

¬B¬C

BCA BC¬A

¬BCA

¬B¬CA

¬B¬C¬A

(20)

B

¬B

BC

B¬C

¬BC

¬B¬C

BCA BC¬A

¬BCA

¬B¬CA

¬B¬C¬A

This gives an exponential number of elements!

(21)

Instead, let’s “forget” a variable when it does not “matter”, i.e.

when

φ|_x=φ|_¬x.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

(22)

when

φ|_x=φ|_¬x.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

(23)

when

φ|_x=φ|_¬x.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

(24)

when

φ|_x=φ|_¬x.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

(25)

when

φ|_x=φ|_¬x.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

We avoid computing an exponential number of primes!

(26)

In other words...

Given.

I A random prime ordering O={X₁, . . . , X_n};

I A formula φ.

Generate a set of primes{p_i}ⁿ_i=1, and then subs s_i=φ|_p_i. But what ifφ≡ >or ∀X_i∈O,Xi 6∈φ?

(27)

Then we have even more freedom! Stochastically marginalize with some probabilityp.

B

¬B

B

¬BC

¬B¬C

BA

B¬A

¬BCA

¬BC¬A

¬B¬CA

¬B¬C¬A

We can marginalize any variable we want, since any operation onφ withX_i is idempotent.

(28)

Then we have even more freedom! Stochastically marginalize with some probabilityp.

C

¬C

CA

C¬A

¬CA

¬C¬A

We can marginalize any variable we want, since any operation onφ withX_i is idempotent.

(29)

(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

A∧B∧C D∧ ¬E

A∧B∧ ¬C

⊥

A∧ ¬B

⊥

¬A∧B∧C E

¬A∧B∧ ¬C D∧ ¬E

A∧ ¬B

>

(30)

(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

A∧B∧C D∧ ¬E

¬A∧B∧C E

¬A∧B∧ ¬C D∧ ¬E

A∧ ¬B

>

(31)

Compression

e₁ e₂ e₃

α β γ β ... ...

Compress

e₁ e₂

α∨γ β ... ... Compress elements(p₁, s) and(p₂, s)by replacing them with (p1∨p2, s).

Compressiongenerates different distributions consistent with formula.

(32)

(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

A∧B∧C D∧ ¬E

¬A∧B∧ ¬C D∧ ¬E

¬A∧B∧C E

A∧ ¬B

>

(33)

(B∧C∧((A∧D∧¬E)∨(¬A∧E)))∨(¬A∧((¬C∧D∧¬E)∨¬B))

(¬A∧B∧ ¬C)∨(¬A∧B∧ ¬C) D∧ ¬E

¬A∧B∧C E

A∧ ¬B

>

(34)

If there arencompressable elements...

1 2 · · · n

α1 β α2 β αn β ... ...

We can generatePn k=1

n k

different circuits.

(35)

1 2 · · · n

α1 β α2 β αn β ... ...

Uniformly sample a circuit from Pascal Triangle’sk-th row.

(36)

1 2 · · · n

α₁ β α₂ β α_n β ... ...

1 2

α1∨α2∨. . .∨αn

β ... ...

But how to efficiently compute potentially complex disjunctions?

(37)

Binary Decision Diagrams (BDDs)

A

B B

C

⊥ >

Operation Description Complexity Reduce canonical form ofφ O(n·logn) Apply φ1⊕φ2 O(n1·n2) Restrict φ|x O(n·logn) Forget φ|x∨φ|¬x O(n²)

φ(A, B, C) = (A∨ ¬B)∧(¬B∨C)

(38)

Results

(39)

Likelihood

Data #vars Worst Best Average LSPN-Opt LSPN-CV CLT CNet 1 10 -449.89 -413.51 -415.60 -422.37 -444.62 -456.49 -585.23 2 15 -745.20 -693.51 -695.82 -695.09 -739.49 -803.41 -1070.29 3 20 -1024.01 -969.51 -971.80 -959.03 -1003.93 -1075.31 -1855.05 4 25 -1268.82 -1208.22 -1210.27 -1185.28 -1254.47 -1290.94 -2033.58 5 30 -1548.92 -1440.27 -1442.58 -1441.90 -1543.35 -1535.54 -2048.16 6 100 -5169.14 -4995.53 -4997.83 -4958.06 -5232.04 -5712.73 -10326.73 7 100 -5329.17 -5153.51 -5155.82 -4900.65 -5206.54 -5710.64 -9948.88 8 14 -472.15 -423.32 -425.62 -490.21 -506.62 -486.06 -601.31

Data Logic formula

1 φ1= (X1∧X3)∨(X4∧ ¬X2)∨(X5∧ ¬X10)

2 φ2= (X1∧X3)∨(X4∧ ¬X2)∨(X5∧ ¬X10)∨(X12∧ ¬X13∧X15∧ ¬X14) 3 φ3= (X1∧X3)∨(X4∧ ¬X2)∨(X5∧ ¬X10)∨(X12∧ ¬X13∧X15∧ ¬X14) 4 φ4= (X1∧X3)∨(X4∧ ¬X2)∨(X5∧ ¬X10)∨(X12∧ ¬X13∧X15∧ ¬X14) 5 φ5= (X2∨X30)∧(¬X15∨ ¬X10)∧(¬X25∨X5∨X15∨ ¬X1)∧(X1∨X15∨ ¬X30)

6 φ6= (X10∨X30)∧(¬X1∨X5)∧(¬X10∨X14∨X23)∧(X2∨ ¬X27∨X35)∧(X98∨ ¬X78∨ ¬X27∨X8) 7 φ7=>

8 φ8=d0∨d1∨d2∨d3∨d4∨d5∨d6∨d7∨d8∨d9

(40)

Thank You!

Questions?

(41)

References

(42)

References I

Dang, Meihua, Antonio Vergari, and Guy van den Broeck (2020).

“Strudel: Learning Structured-Decomposable Probabilistic Circuits”.In: Proceedings of the Tenth International Conference on Probabilistic Graphical Models.

Kisa, Doga et al. (2014).“Probabilistic Sentential Decision Diagrams”.In: Knowledge Representation and Reasoning Conference. URL: https://www.aaai.org/ocs/index.php/

KR/KR14/paper/view/8005.

Liang, Yitao, Jessa Bekker, and Guy Van den Broeck (2017).

“Learning the Structure of Probabilistic Sentential Decision Diagrams”.In: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017. Ed. by Gal Elidan, Kristian Kersting, and Alexander T. Ihler. AUAI Press. URL:

http://auai.org/uai2017/proceedings/papers/291.pdf.