Hierarchical Models
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
R.A. Fisher's ticks
A simple example: We want to know (for some reason) the average number of ticks on sheep. We round up 60 sheep and count ticks on each one. Does a Poisson
distribution fit the data?
P(λ | y) ∝ ∏_{i=1}^{60} P(y_i | λ) P(λ)

y_i: Data
λ: Parameter
A single mean governs the pattern.
Exercise: Write out a simple Bayesian model for estimating the average number of ticks per sheep.
How would you estimate that mean?
P(λ | y) ∝ ∏_{i=1}^{n} P(y_i | λ) P(λ)

∝ ∏_{i=1}^{n} Poisson(y_i | λ) gamma(λ | 0.001, 0.001)
model{
  # prior
  lambda ~ dgamma(0.001, 0.001)
  # likelihood
  for(i in 1:60){
    y[i] ~ dpois(lambda)
  }
}
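Because the gamma prior is conjugate to the Poisson likelihood, this particular posterior is also available in closed form, which gives a handy check on the MCMC output. A minimal sketch in Python (the tick counts are simulated here, with an assumed true mean of 4; not data from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(lam=4.0, size=60)   # simulated tick counts on 60 sheep

# gamma(0.001, 0.001) prior + Poisson likelihood -> gamma posterior
a_post = 0.001 + y.sum()            # posterior shape
b_post = 0.001 + len(y)             # posterior rate
post_mean = a_post / b_post         # posterior mean of lambda

# with a nearly flat prior, the posterior mean is close to the sample mean
print(post_mean, y.mean())
```

The JAGS chain for lambda should center on this closed-form posterior mean.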
y_i: Data
λ_i: Parameter
shape, rate: Hyperparameters

Each sheep has its own mean, λ_i (a.k.a. a random effect).
Hierarchical models have quantities that appear on both sides of the "|".
The shape and rate in the hyperpriors must be numbers.
P(λ, shape, rate | y) ∝ ∏_{i=1}^{n} P(y_i | λ_i) P(λ_i | shape, rate) P(shape) P(rate)

∝ ∏_{i=1}^{n} Poisson(y_i | λ_i) gamma(λ_i | shape, rate) gamma(shape) gamma(rate)
Hierarchical Model
model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  for(i in 1:60){
    lambda[i] ~ dgamma(a, b)
    y[i] ~ dpois(lambda[i])
  }
} # end of model
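A quick way to see what the hierarchical model buys us: marginally, gamma-mixed Poisson counts are overdispersed (negative binomial), with variance a/b + a/b² rather than the Poisson's variance = mean. A simulation sketch in Python, with made-up hyperparameters a and b:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 4.0, 1.0                     # hypothetical hyperparameters (shape, rate)
n = 100_000                         # many sheep, to see the marginal behavior

lam = rng.gamma(shape=a, scale=1/b, size=n)   # each sheep gets its own mean
y = rng.poisson(lam)                          # counts given that mean

# marginally y is negative binomial: mean a/b = 4, variance a/b + a/b**2 = 8
print(y.mean(), y.var())
```

The variance clearly exceeding the mean is the signature of the individual differences the random effects are meant to capture.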
P(a, b, λ | y) ∝ ∏_{i=1}^{60} P(y_i | λ_i) P(λ_i | a, b) P(a) P(b)

y_i: Data
λ_i: Parameter
a, b: Hyperparameters
Hidden Processes in Ecology
• The things we are able to observe are not usually what we seek to understand.
• The things we wish to estimate but can’t observe we call latent.
• We assume that the true, latent state gives rise to the observations with some uncertainty—observation error.
• We have a model that predicts the true, latent state imperfectly. All the things that influence the true, latent state that are not represented in our model we treat stochastically as process variance.
• All this means that we need a model for the data (with
uncertainty) and a model for the process (with uncertainty).
Blueprint for Hierarchical Bayes
• The probabilistic basis for factoring complex models into sensible components via rules of conditioning.
• Directed Acyclic Graphs (DAGs) again.
• Conditioning from DAGs.
• JAGS and MCMC from conditioning.
Conditioning
Remember from basic laws of probability that:

P(z_1, z_2) = P(z_1 | z_2) P(z_2)

This generalizes to:

P(z_1, z_2, ..., z_n) = P(z_1 | z_2, ..., z_n) P(z_2 | z_3, ..., z_n) ··· P(z_{n-1} | z_n) P(z_n)

where the components z_i may be scalars or subvectors of z and the sequence of their conditioning is arbitrary. For example, with three components:

P(z_1, z_2, z_3) = P(z_1 | z_2, z_3) P(z_2 | z_3) P(z_3) = P(z_3 | z_1, z_2) P(z_1 | z_2) P(z_2)
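The factorization can be checked numerically on any small discrete joint distribution; the sketch below uses an arbitrary joint over three binary components:

```python
import numpy as np

rng = np.random.default_rng(0)
# an arbitrary joint distribution over three binary components z1, z2, z3
p = rng.random((2, 2, 2))
p /= p.sum()

# chain rule: p(z1, z2, z3) = p(z1 | z2, z3) p(z2 | z3) p(z3)
p_z3 = p.sum(axis=(0, 1))                 # marginal of z3
p_z2_given_z3 = p.sum(axis=0) / p_z3      # p(z2 | z3), shape (2, 2)
p_z1_given_z2z3 = p / p.sum(axis=0)       # p(z1 | z2, z3), shape (2, 2, 2)

reconstructed = p_z1_given_z2z3 * p_z2_given_z3 * p_z3
print(np.allclose(reconstructed, p))      # True
```

Any other conditioning order would reconstruct the same joint.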
So what does this mean?
All of these are correct; any quantity that appears on the right-hand side of a "|" and does not appear on the left-hand side of a "|" must end up in its own P():

P(A, B, C, D) = P(A | B, C, D) P(B | C, D) P(C | D) P(D)
P(A, B, C, D) = P(A | B, C, D) P(C | B, D) P(B | D) P(D)
P(A, B, C, D) = P(A | B, C, D) P(D | B, C) P(B | C) P(C)
P(A, B, C, D) = P(B | A, C, D) P(A | C, D) P(C | D) P(D)
A general, hierarchical model for ecological processes
Define:
θ: A vector of parameters that can be decomposed into two additional vectors: θ = {θ_p, θ_d}, where θ_d are parameters in a model of the data and θ_p are parameters in the model of the process. Both vectors include parameters representing uncertainty.
y: A vector or matrix of data, including responses and covariates.
μ: The unobserved, true state of the system — the size of the population, the number of sites occupied, the mass of C in a gram of soil, etc.
A general, hierarchical model for ecological processes
P(θ_p, θ_d, μ | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)
We will make use of the following three laws from probability theory:

I.  P(A, B, C) = P(A | B, C) P(B | C) P(C). Letting C = {C_1, C_2}, this also gives P(A, C_1, C_2) = P(A | C_1, C_2) P(C_1 | C_2) P(C_2).

II. P(A | B) = P(A, B) / P(B), therefore P(A, B) = P(A | B) P(B).

III. P(A, B) = P(B | A) P(A); the conditioning can go in either direction.
A general, hierarchical model for ecological processes

Let θ = {θ_p, θ_d}.

Using I:
P(μ, θ_p, θ_d) = P(μ | θ_p, θ_d) P(θ_p | θ_d) P(θ_d) = P(μ | θ_p) P(θ_p) P(θ_d)
(because μ depends only on θ_p, and the priors on θ_p and θ_d are independent)

Using II:
(4) P(y, μ, θ_p, θ_d) = P(y | μ, θ_p, θ_d) P(μ, θ_p, θ_d) = P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

Using III:
(5) P(y, μ, θ_p, θ_d) = P(μ, θ_p, θ_d | y) P(y), so P(μ, θ_p, θ_d | y) ∝ P(y, μ, θ_p, θ_d)

Therefore, using 4 and 5 above, the posterior distribution is:
P(μ, θ_p, θ_d | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)
P(θ_p, θ_d, μ | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

y: the data
μ: predictions of the true state by a process model
θ_p: parameters in the process model, including process uncertainty
θ_d: parameters in the data model, including observation uncertainty
This general set-up provides a basis for Gibbs sampling or JAGS code.
Gibbs sampling: For each step in the chain, we cycle over each parameter and estimate its posterior distribution based on the probability densities where it appears, treating other parameters at their current value at each step in the chain. We sample from the full conditional distribution:
remembering there may be several more steps because θd and θp are vectors of parameters.
P(μ, θ_d, θ_p | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

The full conditional distributions:

P(μ | y, θ_d, θ_p) ∝ P(y | μ, θ_d) P(μ | θ_p)
P(θ_p | μ) ∝ P(μ | θ_p) P(θ_p)
P(θ_d | y, μ) ∝ P(y | μ, θ_d) P(θ_d)
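The slides stop at the general form, but the cycling itself is easy to see in a toy case. Below is a minimal Gibbs sampler, not from the slides, for a normal mean μ and precision τ with vague conjugate-style priors; each step draws from one full conditional while holding the other parameter at its current value:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data
n = len(y)

# vague priors: mu ~ Normal(0, 1/t0), tau ~ gamma(a0, b0)
t0, a0, b0 = 0.001, 0.001, 0.001

mu, tau = 0.0, 1.0                  # arbitrary starting values
keep_mu = []
for step in range(5000):
    # full conditional for mu given tau (normal-normal conjugacy)
    prec = t0 + n * tau
    mu = rng.normal(tau * y.sum() / prec, np.sqrt(1 / prec))
    # full conditional for tau given mu (gamma; numpy takes shape and scale)
    tau = rng.gamma(a0 + n / 2, 1 / (b0 + 0.5 * ((y - mu) ** 2).sum()))
    if step >= 1000:                # discard burn-in
        keep_mu.append(mu)

print(np.mean(keep_mu))             # close to the sample mean of y
```

With vague priors the retained draws for μ center on the sample mean, exactly what JAGS does for us automatically.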
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
Steps
• Diagram the unknowns and knowns for a single observation (A DAG)
• Using P() notation, write out an expression for the posterior distribution using arrows in the DAG as a guide.
• Choose appropriate distributions for the P()'s. Take products over all observations.
• Implement via MCMC or JAGS.
Developing a simple Bayesian model for light limitation of trees
μ_i = prediction of growth for tree i
x_i = light measured for tree i
α = maximum growth rate at high light
c = minimum light requirement
γ = slope of the curve at low light

[Figure: growth rate (0–1.0) as a saturating function of light availability (0–7)]
μ_i = g(α, γ, c, x_i) = α(x_i − c) / (α/γ + (x_i − c))

y_i ~ Normal(μ_i, σ²)
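A small Python sketch of the process model g (parameter values are made up) shows the behavior described above: growth is zero at the minimum light requirement c, rises with initial slope γ, and saturates toward α at high light:

```python
def g(alpha, gamma, c, x):
    """Saturating light-growth curve: slope ~gamma near x = c, asymptote alpha."""
    return alpha * (x - c) / (alpha / gamma + (x - c))

# assumed illustrative values, not estimates
alpha, gamma_, c = 1.0, 0.5, 0.3
print(g(alpha, gamma_, c, 0.3))    # 0.0 at the minimum light requirement
print(g(alpha, gamma_, c, 1e6))    # approaches alpha at very high light
```

The finite-difference slope at x = c is γ, which is why γ is described as the slope of the curve at low light.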
A simple Bayesian model
Write out the numerator for Bayes Law
[DAG: x_i → μ_i → y_i, with parameters α, γ, c, σ]
A simple Bayesian model
Now write out the full model by choosing appropriate distributions for the P()'s. Assume that the y's can take negative values. There are n observations.

[DAG: x_i → μ_i → y_i, with parameters α, γ, c, σ]

P(α, γ, c, σ | y, x) ∝ ∏_{i=1}^{n} P(y_i | g(α, γ, c, x_i), σ) P(α) P(γ) P(c) P(σ)
P(α, γ, c, σ | y, x) ∝ ∏_{i=1}^{n} normal(y_i | g(α, γ, c, x_i), σ²) ×
gamma(α | .001, .001) gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ | .001, .001)

μ_i = g(α, γ, c, x_i) = α(x_i − c) / (α/γ + (x_i − c))
JAGS code

model{
  # priors
  alpha ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma ~ dgamma(.001, .001)
  tau <- 1/sigma^2
  # likelihood
  for(i in 1:n){
    mu[i] <- alpha*(x[i] - c)/(alpha/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}
Now assume that there is individual variation in α (the maximum growth rate), such that each α_i is drawn from a distribution of α's. Sketch the DAG; write out the posterior and the joint conditional distribution using P() notation. Write out the distribution of the full data set and choose appropriate probability functions.

Adding complexity using a hierarchical model: individual variation in the α's.

a, b: hyper-parameters controlling the distribution of the α's.

[DAG: x_i → μ_i → y_i; α_i ~ (a, b); parameters γ, c, σ_proc]
P(α, γ, c, σ, a, b | y, x) ∝ ∏_{i=1}^{n} P(y_i | g(α_i, γ, c, x_i), σ) P(α_i | a, b) P(γ) P(c) P(σ) P(a) P(b)
[DAG: x_i → μ_i → y_i; α_i ~ gamma(a, b); parameters γ, c, σ_proc]

P(α, γ, c, σ, a, b | y, x) ∝ ∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, x_i), σ²) gamma(α_i | a, b) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)

μ_i = g(α_i, γ, c, x_i) = α_i(x_i − c) / (α_i/γ + (x_i − c))
JAGS code

model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma ~ dgamma(.001, .001)
  tau <- 1/sigma^2
  for(i in 1:n){
    alpha[i] ~ dgamma(a, b)
    mu[i] <- alpha[i]*(x[i] - c)/(alpha[i]/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
  # mean and variance of the distribution of alphas
  mean.alpha <- a/b
  var.alpha <- a/b^2
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Light limitation of trees
– Simple non-hierarchical model
– Adding individual differences in max growth rate
• LIDET decomposition: sampling errors in y’s
• Light limitation of trees
  – Errors in covariates
  – Adding a treatment
So, what is a "data model"?

Let μ_i be the true (latent) value of some quantity of interest, and let y_i be an observation of μ_i:

y_i ~ f_d(μ_i, θ_d)

f_d is a data model. In the simplest data model, θ_d consists only of the standard deviation of the data model, σ_data. In this case, we assume there is no bias; that is, on average, y = μ.

We also may have something like θ_d = [ν, η, σ_data], where ν and η are parameters in a deterministic model that corrects for bias in the y's.
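The two kinds of data model can be illustrated with a short simulation (all numbers are invented): an unbiased instrument whose observations average to μ, and a biased one, y ~ Normal(ν + ημ, σ_data), whose bias must be inverted:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 7.0               # the true, latent state
sigma_data = 1.5       # observation error (theta_d in the simplest case)

# unbiased data model: on average, y equals mu
y = rng.normal(mu, sigma_data, size=100_000)
print(y.mean())        # ~7.0

# biased instrument: y ~ Normal(nu + eta*mu, sigma_data)
nu, eta = 0.5, 0.9     # assumed bias parameters
y_biased = rng.normal(nu + eta * mu, sigma_data, size=100_000)
mu_hat = (y_biased.mean() - nu) / eta   # invert the deterministic bias model
print(mu_hat)          # ~7.0 after correction
```

In a full hierarchical model, ν and η would be estimated (e.g., from calibration data) rather than assumed known.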
Decomposition

We conduct an experiment where we observe 5 replications of the disappearance of litter at 10 points in time. Replications could be unbalanced. We want to estimate the parameters in a model of decomposition. Our dependent variable is the true mass of litter at time t. Our model is a simple decaying exponential. We are confident that there is no bias in the estimates of the mass of litter, but there are errors in observations at each time for which we must account. These errors occur because we get different values when we observe replicates from the same sample.

Process model: μ_t = M_0 e^(−kt)
[Figure: process model with true mass]
[Figure: process model with true mass and observed mass]

We can think of sampling variance and process variance in this way: if we increase the number of blue points (i.e., decrease sampling error), their average would asymptotically reach the red points. However, no matter how many blue points we observed, the red points would remain some distance (i.e., process error) from our process-model estimates.
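This intuition is easy to verify by simulation. In the sketch below (assumed values of M0, k, and both variances), the replicate means converge on the true masses as replication grows, while the true masses stay a process-error distance from the deterministic curve:

```python
import numpy as np

rng = np.random.default_rng(5)
M0, k = 100.0, 0.3                      # assumed initial mass and decay rate
t = np.arange(1, 11)                    # 10 points in time
sigma_proc, sigma_obs = 5.0, 8.0        # assumed process and sampling sd

model = M0 * np.exp(-k * t)             # deterministic process model
true_mass = model + rng.normal(0, sigma_proc, size=10)   # "red points"
# many replicate observations of each true mass ("blue points")
reps = true_mass[:, None] + rng.normal(0, sigma_obs, size=(10, 100_000))

# replicate means converge on the true masses...
print(np.abs(reps.mean(axis=1) - true_mass).max())
# ...but the true masses stay a process-error distance from the model curve
print(np.abs(true_mass - model).max())
```

No amount of replication shrinks the second quantity; only a better process model could.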
• Diagram the knowns and unknowns and write out the posterior and the joint conditional distribution for a single observation.

y_{i,j} is the observed mass remaining.
μ_i is the true value of mass remaining. It is "latent", i.e. not observable.

[DAG — data model: y_{i,j} | μ_i, σ_obs; process model: μ_i | g(M_0, k, t_i), σ_proc; parameter models for M_0, k, σ_obs, σ_proc]

P(μ, M_0, k, σ_obs, σ_proc | y) ∝ P(y_{i,j} | μ_i, σ_obs) P(μ_i | g(M_0, k, t_i), σ_proc) P(M_0) P(k) P(σ_proc) P(σ_obs)
Putting it all together:

P(μ, M_0, k, σ_obs, σ_proc | y) ∝ ∏_{i=1}^{10} ∏_{j=1}^{5} normal(y_{i,j} | μ_i, σ²_obs) normal(μ_i | g(M_0, k, t_i), σ²_proc) ×
gamma(M_0 | .001, .001) gamma(k | .001, .001) gamma(σ_obs | .001, .001) gamma(σ_proc | .001, .001)

g(M_0, k, t_i) = M_0 e^(−k t_i)

JAGS pseudo-code

model{
  # priors
  M0 ~ dgamma(.001, .001)
  k ~ dgamma(.001, .001)
  sigma.obs ~ dgamma(.001, .001)
  sigma.proc ~ dgamma(.001, .001)
  tau.obs <- 1/sigma.obs^2
  tau.proc <- 1/sigma.proc^2
  for(i in 1:10){
    mu[i] ~ dnorm(M0*exp(-k*t[i]), tau.proc)
    for(j in 1:5){
      y[i,j] ~ dnorm(mu[i], tau.obs)
    }
  }
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
We now want to add a way to represent uncertainty in the
observations of light—in other words, we want to model the data as well as the process. This makes sense—a single measure of light
cannot represent the light that is “seen” by a tree.
For each tree, we take 10 different measurements of light at different points in the canopy. For now, we assume that the growth rates (the y's) are measured perfectly.
There are n = 50 trees with j = 10 light observations per tree, for a total of n·j = 500 observations. We index the observations within a plant using i[j], meaning tree i contains light measurement j. The total set of light measurements is indexed 1, ..., N = n·j. This notation is useful because the number of measurements per tree can vary.
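In practice the i[j] bookkeeping is just a lookup vector passed to JAGS as data. A sketch with a hypothetical unbalanced design:

```python
import numpy as np

# hypothetical unbalanced design: tree 1 has 3 light readings, tree 2 has 2, ...
n_readings = [3, 2, 4]                  # readings per tree
# index[j] gives the tree that light measurement j belongs to (1-based, as in JAGS)
index = np.repeat(np.arange(1, len(n_readings) + 1), n_readings)
print(index)    # [1 1 1 2 2 3 3 3 3]
```

In the JAGS model, light measurement j then references its tree's latent light as L[index[j]].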
Accounting for errors in covariates

Same as before, but assuming perfect observations of x and y.

[DAG: x_i → μ_i → y_i; α_i ~ (a, b); parameters γ, c, σ_proc]
Write out the Bayesian model using P() notation. x_{i[j]} reads "tree i containing light observation j".

[DAG — data model: x_{i[j]} | L_i, σ_obs; process model: y_i | g(α_i, γ, c, L_i), σ_proc; α_i ~ (a, b); parameters γ, c; hyper-parameters a, b]
P(α, L, γ, c, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} P(y_i | g(α_i, γ, c, L_i), σ_proc) P(α_i | a, b) P(L_i) ×
∏_{j} P(x_{i[j]} | L_i, σ_obs) ×
P(γ) P(c) P(σ_proc) P(σ_obs) P(a) P(b)
What is this?
N is the number of light readings; n is the number of trees.
P(α, L, γ, c, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, L_i), σ²_proc) gamma(α_i | a, b) gamma(L_i | .001, .001) ×
∏_{j=1}^{N} gamma(x_{i[j]} | L²_{i[j]}/σ²_obs, L_{i[j]}/σ²_obs) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ_proc | .001, .001) gamma(σ_obs | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)

Here L_i is the predicted light value for tree i. The shape and rate of the gamma data model, shape = L²/var_L and rate = L/var_L, are calculated using moment matching (MOM), because JAGS wants shape and rate parameters.
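Moment matching is worth having as a small utility: given a desired mean and variance, shape = mean²/var and rate = mean/var. A quick check in Python that draws with those parameters recover the requested moments:

```python
import numpy as np

def gamma_mom(mean, var):
    """Moment matching: convert a mean and variance to gamma shape and rate."""
    shape = mean**2 / var
    rate = mean / var
    return shape, rate

# check with arbitrary moments: mean 4, variance 2
rng = np.random.default_rng(6)
shape, rate = gamma_mom(mean=4.0, var=2.0)
draws = rng.gamma(shape, 1 / rate, size=200_000)   # numpy takes shape, scale
print(draws.mean(), draws.var())   # ~4.0, ~2.0
```

The same two lines of algebra are what the JAGS code does inline with L[index[j]] and sigma.obs.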
JAGS pseudo-code

model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma.proc ~ dgamma(.001, .001)
  sigma.obs ~ dgamma(.001, .001)
  tau.proc <- 1/sigma.proc^2
  for(i in 1:n){ # n is the number of trees
    alpha[i] ~ dgamma(a, b)
    L[i] ~ dgamma(.001, .001)
    mu[i] <- alpha[i]*(L[i] - c)/(alpha[i]/gamma + L[i] - c)
    y[i] ~ dnorm(mu[i], tau.proc)
  }
  for(j in 1:N){ # N is the number of light measurements
    # index[j] is the sequential ID for plant i containing observation j
    x[j] ~ dgamma(L[index[j]]^2/sigma.obs^2, L[index[j]]/sigma.obs^2)
  }
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
We now want to estimate the effect of CO2 enrichment on the effect of light on growth. We have two levels of CO2: ambient (365 ppm) and elevated (565 ppm). We want to examine the effect of treatment on maximum growth rate.
We use an indicator variable u = 0 for ambient (control) and u = 1 for elevated. We want to estimate a new parameter, k, that represents the additional growth due to CO2.
Adding an experimental treatment

μ_i = g(α, γ, c, k, L_i, u_i) = (α + k·u_i)(L_i − c) / ((α + k·u_i)/γ + (L_i − c))
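A sketch of the treatment version of g (numbers invented): the indicator u shifts the asymptote from α to α + k·u while leaving c and γ alone:

```python
def g_trt(alpha, gamma, c, k, x, u):
    """Light-growth curve whose asymptote shifts from alpha to alpha + k*u."""
    a = alpha + k * u
    return a * (x - c) / (a / gamma + (x - c))

# assumed illustrative values, not estimates
alpha, gamma_, c, k = 1.0, 0.5, 0.3, 0.4
print(g_trt(alpha, gamma_, c, k, 1e6, u=0))   # ~1.0: ambient asymptote
print(g_trt(alpha, gamma_, c, k, 1e6, u=1))   # ~1.4: elevated asymptote
```

Because u is 0/1, the posterior for k directly answers "how much extra maximum growth does elevated CO2 buy?"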
Write out the Bayesian model using P() notation. x_{i[j]} reads "tree i containing light observation j".

[DAG — data model: x_{i[j]} | L_i, σ_obs; process model: y_i | g(α_i, γ, c, k, L_i, u_i), σ_proc; α_i ~ (a, b); parameters γ, c, k; hyper-parameters a, b]
P(α, L, γ, c, k, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} P(y_i | g(α_i, γ, c, k, L_i, u_i), σ_proc) P(α_i | a, b) P(L_i) ×
∏_{j} P(x_{i[j]} | L_i, σ_obs) ×
P(γ) P(c) P(k) P(σ_proc) P(σ_obs) P(a) P(b)
N is the number of light readings; n is the number of trees.
P(α, L, γ, c, k, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, k, L_i, u_i), σ²_proc) gamma(α_i | a, b) gamma(L_i | .001, .001) ×
∏_{j=1}^{N} gamma(x_{i[j]} | L²_{i[j]}/σ²_obs, L_{i[j]}/σ²_obs) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(k | .001, .001) gamma(σ_proc | .001, .001) gamma(σ_obs | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)
How to develop a hierarchical model
• Develop a deterministic model of the ecological process. This may be mechanistic or empirical. It represents your hypothesis about how the world works.
• Diagram the relationships among data, processes, and parameters.
• Using P() notation, write out a data model and a process model, using the diagram as a guide to conditioning.
• For each P() from above, choose a probability density function to represent the unknowns based on your knowledge of how the data arise. Are they counts? 0–1? Proportions? Always positive? etc.
• Think carefully about how to subscript the data and the estimates of your process model. Add product symbols to develop total likelihoods over all the subscripts.
• Think about how the data arise. If you can simulate the data properly, that is all you need to know to do all the steps above.
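Following the last bullet, here is one way to simulate data from the hierarchical light-limitation model with latent light and errors in the covariate (every numeric value is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50                                  # trees
a, b = 20.0, 20.0                       # hyperparameters: alpha_i ~ gamma(a, b)
gamma_, c = 0.5, 0.3                    # slope at low light, minimum light
sigma_proc, sigma_obs = 0.05, 0.2       # process and observation sd

alpha = rng.gamma(a, 1 / b, size=n)     # individual max growth rates
L = rng.uniform(1.0, 7.0, size=n)       # true light "seen" by each tree
mu = alpha * (L - c) / (alpha / gamma_ + (L - c))   # process model
y = rng.normal(mu, sigma_proc)          # observed growth (process error only)

# 10 imperfect light readings per tree, moment-matched to mean L[i]
shape = L**2 / sigma_obs**2
rate = L / sigma_obs**2
x = rng.gamma(shape[:, None], 1 / rate[:, None], size=(n, 10))

print(x.shape, np.abs(x.mean(axis=1) - L).mean())  # readings scatter around L
```

If a JAGS fit to data simulated this way recovers the known parameters, the model, the subscripting, and the code are all consistent.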