Tópicos Especiais em
Modelagem e Análise -
Aprendizado por Máquina
CPS863
Daniel, Edmundo, Rosa Terceiro trimestre de 2012
UFRJ - COPPE
Programa de Engenharia de Sistemas e Computação
A Bayesian network B is a Directed Acyclic Graph It represents a joint probability distribution over a set of random variables
The network is defined by a pair B=(G,Θ), where G
is the DAG whose nodes X1,X2, · · · ,Xn represent
the direct dependences between these variables If the variable represented by a node is observed, the node is an evidence node, otherwise it is a
hidden node
The graph G encodes independence assumptions,
by which each variable Xi is independent of its
nondescendants given its parents in G.
Θ is the set of parameters of the network.
This set contains the parameters:
Bayesian Networks Definition
θ
xΘ is the set of parameters of the network. This set contains the parameters:
B defines a unique joint probability distribution over V:
Bayesian Networks Definition
θ
xi/πi
=
P(x
i/
π
i) (
for each realization x
iof X
i)
P( X
1,X
2,... , X
n)=
∏
i=1 nP( X
i/
π
i)=
∏
i=1 nθ
X i/πiIndependence: Conditional Independence:
Conditional Independence
X
A⊥
X
B⇔
p( x
A, x
B)=
p(x
A)
p( x
B)
X
A⊥
X
B/
X
C⇔
p( x
A, x
B/
x
C)=
p( x
A/
x
C)
p(x
B/
x
C)
⇔
p(x
A/
x
B, x
C)=
p(x
A/
x
C)
BN is used to compute marginal distributions of one or more query nodes
E is the set of observed variables (the evidence nodes) X is the set of unobserved variables whose values we are interested in estimating (query nodes)
W are the random variables that are neither query nor evidence nodes
Inference via BN
P( X / E=e)=
P( X , e)
P(e)
=
∑
wP( X , e , w)
∑
xP(x , e)
BN Inference Example
P(C=T / A=T )=P(C=T , A=T ) P( A=T )
What is the probability of uncomfortable chair given the observation that the person suffers from backache ? P(C=T , A=T )= ∑ S ,W , B∈[T , F ] P(C=T ) P(S) P(W /C=T )P(B /S , C=T ) P( A=T /B) P( A=T )= ∑ S ,W , B ,C ∈[T , F ] P(C) P(S) P(W /C) P(B/ S ,C ) P( A=T /B) The n umbe r of te rms in the s um wi ll grow expo nentia lly with t he nu mber of hid den n odes
Exact inference is an NP-hard problem
Some algorithms to restricted classes networks (ex: message passing)
Approximate inference methods (ex: Markov Chain Monte Carlo)
BN is unknown and we want to learn it from the data
Given training data and prior information (e.g., expert knowledge, casual relationships), estimate the graph topology (network structure) and the
parameters of the JPD in the BN.
St (system state at time t) = a,b,c
BN x HMM
Independence properties in probabilistic graphical models can be exploited to reduce the computation cost of the query process
If we know that a set of nodes X is independent of a set of nodes Y given E, then in the presence of E, variables in X cannot influence the beliefs about Y
The independence assumptions satisfied by a Bayesian network can be identified using a graphical test called
d-separation
Four general cases to analyze whether knowing an evidence (variable E) about a variable X can change the beliefs about a variable Y
Represents an indirect causal effect, where an ancestor X of Y could pass influence via E
The interpretation is “the past is independent of the future”
D-Separation: First case
X can only influence Y in the presence of E if E is not observed
Evidence E “blocks” influence of X over Y
D-Separation: First case example
If we know the gene Susan received from her mother, the gene passed from Susan's
grandmother no longer influence her blood type
Blood type of Susan
Gene Susan received from her mother (Allen) Gene Allen received from her mother (Lily)
Lily Allen
Symmetrical case of the first case: we want to know whether evidence above a descendant
may affect an indirect ancestor
D-Separation: Second case
It is an indirect evidence effect Evidence E “blocks” influence of Y over X
D-Separation: Second case example
Once the gene Susan received from her mother Allen is known, the blood type of Susan is not
able to affect the beliefs on the gene Allen received from her mother Lily
Lily Allen
Susan Blood type of Susan
Gene Susan received from her mother (Allen) Gene Allen received from her mother (Lily)
The node E is a common cause to nodes X and Y
D-Separation: Third case
Evidence E “blocks” influence of X over Y
D-Separation: Third case example
Shoe size and amount of gray hair of a person are highly dependent
Given age, they are independent, since age provides all the information that “shoe size” can provide to infer “amount of gray hair”
Shoe size Amount of gray hair
Common effect trail
Three previous cases: X can influence Y via E if and only if E is not observed
This case: X can influence Y via E if and only if E is observed
D-Separation: Fourth case
If the common effect variable is not observed, knowing about a parent variable cannot affect the expectation about other parents
D-Separation: Fourth case example
If we don't know the evidence “late for lunch”, knowing about “missed the bus” cannot affect the expectation about “traveling on vacations”
If we know “late for lunch”, it increases the probability of “missed the bus”
Late for lunch
Missed the bus Traveling on vacations
Block definition
Definition 1 Block: Let X and Y be random
variables in the graph of a Bayesian network. We say an undirected path between X and Y is
blocked by a (set of ) variable E if E is in such a path and influence of X cannot reach Y and
change the beliefs about it because of evidence (or lack of it) of E.
Active Trail definition
Definition 2 Active trail: Given an undirected path X1 ... Xn in the graph component of a
Bayesian network, there is an active trail from Xi to X
n given a subset of the observed variables E, if
whenever we have a v-structure , then Xi or one of its descendants are in E;
in all other cases no other node along the trail is in E.
V-structure
Whenever we have a v-structure, then F or one of its descendants are in E
Late for lunch
Missed the bus Traveling on vacations
F' Missed the meeting F
Active Trail Example
A B C D E F G H I Not observed : B, D, E, G Observed : C and (F or I)Are A and H independent or there is an active trail between them ?
v-structure
Active Trail Example
A B C D E F G H IObserve at least one of : B, D, E, G Not observe: C or (F and I)
A and H are independent in the following case:
v-structure
D-separation definition
Denition 3 d-separation: X and Y are d-separated
by a set of evidence variables E if there is no active trail between any node and given E, i.e, every undirected path from X to Y is blocked.
The d-separation test guarantees that X and Y are independent given E if every path between X and Y is blocked by E and therefore influence cannot flow from X through E to affect the beliefs about Y.
if and only if E d-separates X from Y in the graph G.