Hierarchical Models
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
R.A. Fisher's ticks
A simple example: We want to know (for some reason) the average number of ticks on sheep. We round up 60 sheep and count ticks on each one. Does a Poisson
distribution fit the data?
P(λ | y) ∝ ∏_{i=1}^{60} P(y_i | λ) P(λ)

y_i: Data
λ: Parameter
A single mean governs the pattern.
Exercise: Write out a simple Bayesian model for estimating the average number of ticks per sheep.
How would you estimate that mean?
P(λ | y) ∝ ∏_{i=1}^{n} P(y_i | λ) P(λ)

∝ ∏_{i=1}^{n} Poisson(y_i | λ) gamma(λ | 0.001, 0.001)
model{
  # prior
  lambda ~ dgamma(0.001, 0.001)
  # likelihood
  for(i in 1:60){
    y[i] ~ dpois(lambda)
  }
}
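Because the gamma prior is conjugate to the Poisson likelihood, this particular posterior is also available in closed form, which gives a handy check on the MCMC output. A minimal sketch in Python (the tick counts are simulated here, with an assumed true mean of 4; not data from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(lam=4.0, size=60)   # simulated tick counts on 60 sheep

# gamma(0.001, 0.001) prior + Poisson likelihood -> gamma posterior
a_post = 0.001 + y.sum()            # posterior shape
b_post = 0.001 + len(y)             # posterior rate
post_mean = a_post / b_post         # posterior mean of lambda

# with a nearly flat prior, the posterior mean is close to the sample mean
print(post_mean, y.mean())
```

The JAGS chain for lambda should center on this closed-form posterior mean.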
y_i: Data
λ_i: Parameter
shape, rate: Hyperparameters

Each sheep has its own mean, λ_i (a.k.a. a random effect).
Hierarchical models have quantities that appear on both sides of the "|".
The shape and rate in the hyperpriors must be numbers.
P(λ, shape, rate | y) ∝ ∏_{i=1}^{n} P(y_i | λ_i) P(λ_i | shape, rate) P(shape) P(rate)

∝ ∏_{i=1}^{n} Poisson(y_i | λ_i) gamma(λ_i | shape, rate) gamma(shape) gamma(rate)
Hierarchical Model
model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  for(i in 1:60){
    lambda[i] ~ dgamma(a, b)
    y[i] ~ dpois(lambda[i])
  }
} # end of model
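A quick way to see what the hierarchical model buys us: marginally, gamma-mixed Poisson counts are overdispersed (negative binomial), with variance a/b + a/b² rather than the Poisson's variance = mean. A simulation sketch in Python, with made-up hyperparameters a and b:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 4.0, 1.0                     # hypothetical hyperparameters (shape, rate)
n = 100_000                         # many sheep, to see the marginal behavior

lam = rng.gamma(shape=a, scale=1/b, size=n)   # each sheep gets its own mean
y = rng.poisson(lam)                          # counts given that mean

# marginally y is negative binomial: mean a/b = 4, variance a/b + a/b**2 = 8
print(y.mean(), y.var())
```

The variance clearly exceeding the mean is the signature of the individual differences the random effects are meant to capture.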
P(a, b, λ | y) ∝ ∏_{i=1}^{60} P(y_i | λ_i) P(λ_i | a, b) P(a) P(b)

y_i: Data
λ_i: Parameter
a, b: Hyperparameters
Hidden Processes in Ecology
• The things we are able to observe are not usually what we seek to understand.
• The things we wish to estimate but can’t observe we call latent.
• We assume that the true, latent state gives rise to the observations with some uncertainty—observation error.
• We have a model that predicts the true, latent state imperfectly. All the things that influence the true, latent state that are not represented in our model we treat stochastically as process variance.
• All this means that we need a model for the data (with
uncertainty) and a model for the process (with uncertainty).
Blueprint for Hierarchical Bayes
• The probabilistic basis for factoring complex models into sensible components via rules of conditioning.
• Directed Acyclic Graphs (DAGs) again.
• Conditioning from DAGs.
• JAGS and MCMC from conditioning.
Conditioning
Remember from basic laws of probability that:

P(z_1, z_2) = P(z_1 | z_2) P(z_2)

This generalizes to:

P(z_1, z_2, ..., z_n) = P(z_1 | z_2, ..., z_n) P(z_2 | z_3, ..., z_n) ··· P(z_{n-1} | z_n) P(z_n)

where the components z_i may be scalars or subvectors of z and the sequence of their conditioning is arbitrary. For example, with three components:

P(z_1, z_2, z_3) = P(z_1 | z_2, z_3) P(z_2 | z_3) P(z_3) = P(z_3 | z_1, z_2) P(z_1 | z_2) P(z_2)
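The factorization can be checked numerically on any small discrete joint distribution; the sketch below uses an arbitrary joint over three binary components:

```python
import numpy as np

rng = np.random.default_rng(0)
# an arbitrary joint distribution over three binary components z1, z2, z3
p = rng.random((2, 2, 2))
p /= p.sum()

# chain rule: p(z1, z2, z3) = p(z1 | z2, z3) p(z2 | z3) p(z3)
p_z3 = p.sum(axis=(0, 1))                 # marginal of z3
p_z2_given_z3 = p.sum(axis=0) / p_z3      # p(z2 | z3), shape (2, 2)
p_z1_given_z2z3 = p / p.sum(axis=0)       # p(z1 | z2, z3), shape (2, 2, 2)

reconstructed = p_z1_given_z2z3 * p_z2_given_z3 * p_z3
print(np.allclose(reconstructed, p))      # True
```

Any other conditioning order would reconstruct the same joint.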
So what does this mean?
All of these are correct; any quantity that appears on the right-hand side of a "|" and does not appear on the left-hand side of a "|" must end up in its own P():

P(A, B, C, D) = P(A | B, C, D) P(B | C, D) P(C | D) P(D)
P(A, B, C, D) = P(A | B, C, D) P(C | B, D) P(B | D) P(D)
P(A, B, C, D) = P(A | B, C, D) P(D | B, C) P(B | C) P(C)
P(A, B, C, D) = P(B | A, C, D) P(A | C, D) P(C | D) P(D)
A general, hierarchical model for ecological processes
Define:
θ: A vector of parameters that can be decomposed into two additional vectors: θ = {θ_p, θ_d}, where θ_d are parameters in a model of the data and θ_p are parameters in the model of the process. Both vectors include parameters representing uncertainty.
y: A vector or matrix of data, including responses and covariates.
μ: The unobserved, true state of the system — the size of the population, the number of sites occupied, the mass of C in a gram of soil, etc.
A general, hierarchical model for ecological processes
P(θ_p, θ_d, μ | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)
We will make use of the following three laws from probability theory:

I.  P(A, B, C) = P(A | B, C) P(B | C) P(C). Letting C = {C_1, C_2}, this also gives P(A, C_1, C_2) = P(A | C_1, C_2) P(C_1 | C_2) P(C_2).

II. P(A | B) = P(A, B) / P(B), therefore P(A, B) = P(A | B) P(B).

III. P(A, B) = P(B | A) P(A); the conditioning can go in either direction.
A general, hierarchical model for ecological processes

Let θ = {θ_p, θ_d}.

Using I:
P(μ, θ_p, θ_d) = P(μ | θ_p, θ_d) P(θ_p | θ_d) P(θ_d) = P(μ | θ_p) P(θ_p) P(θ_d)
(because μ depends only on θ_p, and the priors on θ_p and θ_d are independent)

Using II:
(4) P(y, μ, θ_p, θ_d) = P(y | μ, θ_p, θ_d) P(μ, θ_p, θ_d) = P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

Using III:
(5) P(y, μ, θ_p, θ_d) = P(μ, θ_p, θ_d | y) P(y), so P(μ, θ_p, θ_d | y) ∝ P(y, μ, θ_p, θ_d)

Therefore, using 4 and 5 above, the posterior distribution is:
P(μ, θ_p, θ_d | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)
P(θ_p, θ_d, μ | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

y: the data
μ: predictions of the true state by a process model
θ_p: parameters in the process model, including process uncertainty
θ_d: parameters in the data model, including observation uncertainty
This general set-up provides a basis for Gibbs sampling or JAGS code.
Gibbs sampling: For each step in the chain, we cycle over each parameter and estimate its posterior distribution based on the probability densities where it appears, treating other parameters at their current value at each step in the chain. We sample from the full conditional distribution:
remembering there may be several more steps because θd and θp are vectors of parameters.
P(μ, θ_d, θ_p | y) ∝ P(y | μ, θ_d) P(μ | θ_p) P(θ_p) P(θ_d)

The full conditional distributions:

P(μ | y, θ_d, θ_p) ∝ P(y | μ, θ_d) P(μ | θ_p)
P(θ_p | μ) ∝ P(μ | θ_p) P(θ_p)
P(θ_d | y, μ) ∝ P(y | μ, θ_d) P(θ_d)
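The slides stop at the general form, but the cycling itself is easy to see in a toy case. Below is a minimal Gibbs sampler, not from the slides, for a normal mean μ and precision τ with vague conjugate-style priors; each step draws from one full conditional while holding the other parameter at its current value:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data
n = len(y)

# vague priors: mu ~ Normal(0, 1/t0), tau ~ gamma(a0, b0)
t0, a0, b0 = 0.001, 0.001, 0.001

mu, tau = 0.0, 1.0                  # arbitrary starting values
keep_mu = []
for step in range(5000):
    # full conditional for mu given tau (normal-normal conjugacy)
    prec = t0 + n * tau
    mu = rng.normal(tau * y.sum() / prec, np.sqrt(1 / prec))
    # full conditional for tau given mu (gamma; numpy takes shape and scale)
    tau = rng.gamma(a0 + n / 2, 1 / (b0 + 0.5 * ((y - mu) ** 2).sum()))
    if step >= 1000:                # discard burn-in
        keep_mu.append(mu)

print(np.mean(keep_mu))             # close to the sample mean of y
```

With vague priors the retained draws for μ center on the sample mean, exactly what JAGS does for us automatically.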
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
Steps
• Diagram the unknowns and knowns for a single observation (A DAG)
• Using P() notation, write out an expression for the posterior distribution using arrows in the DAG as a guide.
• Choose appropriate distributions for the P()'s. Take products over all observations.
• Implement via MCMC or JAGS.
Developing a simple Bayesian model for light limitation of trees
μ_i = prediction of growth for tree i
x_i = light measured for tree i
α = maximum growth rate at high light
c = minimum light requirement
γ = slope of the curve at low light

[Figure: growth rate (0–1.0) as a saturating function of light availability (0–7)]
μ_i = g(α, γ, c, x_i) = α(x_i − c) / (α/γ + (x_i − c))

y_i ~ Normal(μ_i, σ²)
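A small Python sketch of the process model g (parameter values are made up) shows the behavior described above: growth is zero at the minimum light requirement c, rises with initial slope γ, and saturates toward α at high light:

```python
def g(alpha, gamma, c, x):
    """Saturating light-growth curve: slope ~gamma near x = c, asymptote alpha."""
    return alpha * (x - c) / (alpha / gamma + (x - c))

# assumed illustrative values, not estimates
alpha, gamma_, c = 1.0, 0.5, 0.3
print(g(alpha, gamma_, c, 0.3))    # 0.0 at the minimum light requirement
print(g(alpha, gamma_, c, 1e6))    # approaches alpha at very high light
```

The finite-difference slope at x = c is γ, which is why γ is described as the slope of the curve at low light.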
A simple Bayesian model
Write out the numerator for Bayes Law
[DAG: x_i → μ_i → y_i, with parameters α, γ, c, σ]
A simple Bayesian model
Now write out the full model by choosing appropriate distributions for the P()'s. Assume that the y's can take negative values. There are n observations.

[DAG: x_i → μ_i → y_i, with parameters α, γ, c, σ]

P(α, γ, c, σ | y, x) ∝ ∏_{i=1}^{n} P(y_i | g(α, γ, c, x_i), σ) P(α) P(γ) P(c) P(σ)
P(α, γ, c, σ | y, x) ∝ ∏_{i=1}^{n} normal(y_i | g(α, γ, c, x_i), σ²) ×
gamma(α | .001, .001) gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ | .001, .001)

μ_i = g(α, γ, c, x_i) = α(x_i − c) / (α/γ + (x_i − c))
JAGS code

model{
  # priors
  alpha ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma ~ dgamma(.001, .001)
  tau <- 1/sigma^2
  # likelihood
  for(i in 1:n){
    mu[i] <- alpha*(x[i] - c)/(alpha/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
}
Now assume that there is individual variation in α (the maximum growth rate), such that each α_i is drawn from a distribution of α's. Sketch the DAG; write out the posterior and the joint conditional distribution using P() notation. Write out the distribution of the full data set and choose appropriate probability functions.

Adding complexity using a hierarchical model: individual variation in the α's.

a, b: hyper-parameters controlling the distribution of the α's.

[DAG: x_i → μ_i → y_i; α_i ~ (a, b); parameters γ, c, σ_proc]
P(α, γ, c, σ, a, b | y, x) ∝ ∏_{i=1}^{n} P(y_i | g(α_i, γ, c, x_i), σ) P(α_i | a, b) P(γ) P(c) P(σ) P(a) P(b)
[DAG: x_i → μ_i → y_i; α_i ~ gamma(a, b); parameters γ, c, σ_proc]

P(α, γ, c, σ, a, b | y, x) ∝ ∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, x_i), σ²) gamma(α_i | a, b) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)

μ_i = g(α_i, γ, c, x_i) = α_i(x_i − c) / (α_i/γ + (x_i − c))
JAGS code

model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma ~ dgamma(.001, .001)
  tau <- 1/sigma^2
  for(i in 1:n){
    alpha[i] ~ dgamma(a, b)
    mu[i] <- alpha[i]*(x[i] - c)/(alpha[i]/gamma + x[i] - c)
    y[i] ~ dnorm(mu[i], tau)
  }
  # mean and variance of the distribution of alphas
  mean.alpha <- a/b
  var.alpha <- a/b^2
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Light limitation of trees
– Simple non-hierarchical model
– Adding individual differences in max growth rate
• LIDET decomposition: sampling errors in y’s
• Light limitation of trees
  – Errors in covariates
  – Adding a treatment
So, what is a "data model"?

Let μ_i be the true (latent) value of some quantity of interest, and let y_i be an observation of μ_i:

y_i ~ f_d(μ_i, θ_d)

f_d is a data model. In the simplest data model, θ_d consists only of the standard deviation of the data model, σ_data. In this case, we assume there is no bias; that is, on average, y = μ.

We also may have something like θ_d = [ν, η, σ_data], where ν and η are parameters in a deterministic model that corrects for bias in the y's.
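The two kinds of data model can be illustrated with a short simulation (all numbers are invented): an unbiased instrument whose observations average to μ, and a biased one, y ~ Normal(ν + ημ, σ_data), whose bias must be inverted:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 7.0               # the true, latent state
sigma_data = 1.5       # observation error (theta_d in the simplest case)

# unbiased data model: on average, y equals mu
y = rng.normal(mu, sigma_data, size=100_000)
print(y.mean())        # ~7.0

# biased instrument: y ~ Normal(nu + eta*mu, sigma_data)
nu, eta = 0.5, 0.9     # assumed bias parameters
y_biased = rng.normal(nu + eta * mu, sigma_data, size=100_000)
mu_hat = (y_biased.mean() - nu) / eta   # invert the deterministic bias model
print(mu_hat)          # ~7.0 after correction
```

In a full hierarchical model, ν and η would be estimated (e.g., from calibration data) rather than assumed known.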
Decomposition

We conduct an experiment where we observe 5 replications of the disappearance of litter at 10 points in time. Replications could be unbalanced. We want to estimate the parameters in a model of decomposition. Our dependent variable is the true mass of litter at time t. Our model is a simple decaying exponential. We are confident that there is no bias in the estimates of the mass of litter, but there are errors in observations at each time for which we must account. These errors occur because we get different values when we observe replicates from the same sample.

Process model: μ_t = M_0 e^(−kt)
[Figure: process model with true mass]
[Figure: process model with true mass and observed mass]

We can think of sampling variance and process variance in this way: if we increase the number of blue points (i.e., decrease sampling error), their average would asymptotically reach the red points. However, no matter how many blue points we observed, the red points would remain some distance (i.e., process error) from our process-model estimates.
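This intuition is easy to verify by simulation. In the sketch below (assumed values of M0, k, and both variances), the replicate means converge on the true masses as replication grows, while the true masses stay a process-error distance from the deterministic curve:

```python
import numpy as np

rng = np.random.default_rng(5)
M0, k = 100.0, 0.3                      # assumed initial mass and decay rate
t = np.arange(1, 11)                    # 10 points in time
sigma_proc, sigma_obs = 5.0, 8.0        # assumed process and sampling sd

model = M0 * np.exp(-k * t)             # deterministic process model
true_mass = model + rng.normal(0, sigma_proc, size=10)   # "red points"
# many replicate observations of each true mass ("blue points")
reps = true_mass[:, None] + rng.normal(0, sigma_obs, size=(10, 100_000))

# replicate means converge on the true masses...
print(np.abs(reps.mean(axis=1) - true_mass).max())
# ...but the true masses stay a process-error distance from the model curve
print(np.abs(true_mass - model).max())
```

No amount of replication shrinks the second quantity; only a better process model could.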
• Diagram the knowns and unknowns and write out the posterior and the joint conditional distribution for a single observation.

y_{i,j} is the observed mass remaining.
μ_i is the true value of mass remaining. It is "latent", i.e. not observable.

[DAG — data model: y_{i,j} | μ_i, σ_obs; process model: μ_i | g(M_0, k, t_i), σ_proc; parameter models for M_0, k, σ_obs, σ_proc]

P(μ, M_0, k, σ_obs, σ_proc | y) ∝ P(y_{i,j} | μ_i, σ_obs) P(μ_i | g(M_0, k, t_i), σ_proc) P(M_0) P(k) P(σ_proc) P(σ_obs)
Putting it all together:

P(μ, M_0, k, σ_obs, σ_proc | y) ∝ ∏_{i=1}^{10} ∏_{j=1}^{5} normal(y_{i,j} | μ_i, σ²_obs) normal(μ_i | g(M_0, k, t_i), σ²_proc) ×
gamma(M_0 | .001, .001) gamma(k | .001, .001) gamma(σ_obs | .001, .001) gamma(σ_proc | .001, .001)

g(M_0, k, t_i) = M_0 e^(−k t_i)

JAGS pseudo-code

model{
  # priors
  M0 ~ dgamma(.001, .001)
  k ~ dgamma(.001, .001)
  sigma.obs ~ dgamma(.001, .001)
  sigma.proc ~ dgamma(.001, .001)
  tau.obs <- 1/sigma.obs^2
  tau.proc <- 1/sigma.proc^2
  for(i in 1:10){
    mu[i] ~ dnorm(M0*exp(-k*t[i]), tau.proc)
    for(j in 1:5){
      y[i,j] ~ dnorm(mu[i], tau.obs)
    }
  }
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
We now want to add a way to represent uncertainty in the
observations of light—in other words, we want to model the data as well as the process. This makes sense—a single measure of light
cannot represent the light that is “seen” by a tree.
For each tree, we take 10 different measurements of light at different points in the canopy. For now, we assume that the growth rates (the y's) are measured perfectly.
There are n = 50 trees with j = 10 light observations per tree, for a total of n·j = 500 observations. We index the observations within a plant using i[j], meaning tree i contains light measurement j. The total set of light measurements is indexed 1, ..., N = n·j. This notation is useful because the number of measurements per tree can vary.
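In practice the i[j] bookkeeping is just a lookup vector passed to JAGS as data. A sketch with a hypothetical unbalanced design:

```python
import numpy as np

# hypothetical unbalanced design: tree 1 has 3 light readings, tree 2 has 2, ...
n_readings = [3, 2, 4]                  # readings per tree
# index[j] gives the tree that light measurement j belongs to (1-based, as in JAGS)
index = np.repeat(np.arange(1, len(n_readings) + 1), n_readings)
print(index)    # [1 1 1 2 2 3 3 3 3]
```

In the JAGS model, light measurement j then references its tree's latent light as L[index[j]].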
Accounting for errors in covariates

Same as before, but assuming perfect observations of x and y.

[DAG: x_i → μ_i → y_i; α_i ~ (a, b); parameters γ, c, σ_proc]
Write out the Bayesian model using P() notation. x_{i[j]} reads "tree i containing light observation j".

[DAG — data model: x_{i[j]} | L_i, σ_obs; process model: y_i | g(α_i, γ, c, L_i), σ_proc; α_i ~ (a, b); parameters γ, c; hyper-parameters a, b]
P(α, L, γ, c, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} P(y_i | g(α_i, γ, c, L_i), σ_proc) P(α_i | a, b) P(L_i) ×
∏_{j} P(x_{i[j]} | L_i, σ_obs) ×
P(γ) P(c) P(σ_proc) P(σ_obs) P(a) P(b)
What is this?
N is the number of light readings; n is the number of trees.
P(α, L, γ, c, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, L_i), σ²_proc) gamma(α_i | a, b) gamma(L_i | .001, .001) ×
∏_{j=1}^{N} gamma(x_{i[j]} | L²_{i[j]}/σ²_obs, L_{i[j]}/σ²_obs) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(σ_proc | .001, .001) gamma(σ_obs | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)

Here L_i is the predicted light value for tree i. The shape and rate of the gamma data model, shape = L²/var_L and rate = L/var_L, are calculated using moment matching (MOM), because JAGS wants shape and rate parameters.
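Moment matching is worth having as a small utility: given a desired mean and variance, shape = mean²/var and rate = mean/var. A quick check in Python that draws with those parameters recover the requested moments:

```python
import numpy as np

def gamma_mom(mean, var):
    """Moment matching: convert a mean and variance to gamma shape and rate."""
    shape = mean**2 / var
    rate = mean / var
    return shape, rate

# check with arbitrary moments: mean 4, variance 2
rng = np.random.default_rng(6)
shape, rate = gamma_mom(mean=4.0, var=2.0)
draws = rng.gamma(shape, 1 / rate, size=200_000)   # numpy takes shape, scale
print(draws.mean(), draws.var())   # ~4.0, ~2.0
```

The same two lines of algebra are what the JAGS code does inline with L[index[j]] and sigma.obs.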
JAGS pseudo-code

model{
  # priors
  a ~ dgamma(.001, .001)
  b ~ dgamma(.001, .001)
  gamma ~ dgamma(.001, .001)
  c ~ dgamma(.001, .001)
  sigma.proc ~ dgamma(.001, .001)
  sigma.obs ~ dgamma(.001, .001)
  tau.proc <- 1/sigma.proc^2
  for(i in 1:n){ # n is the number of trees
    alpha[i] ~ dgamma(a, b)
    L[i] ~ dgamma(.001, .001)
    mu[i] <- alpha[i]*(L[i] - c)/(alpha[i]/gamma + L[i] - c)
    y[i] ~ dnorm(mu[i], tau.proc)
  }
  for(j in 1:N){ # N is the number of light measurements
    # index[j] is the sequential ID for plant i containing observation j
    x[j] ~ dgamma(L[index[j]]^2/sigma.obs^2, L[index[j]]/sigma.obs^2)
  }
}
Roadmap: Models of increasing complexity
• Poisson plots: Individual differences via random effects.
• Hidden process models
  – Light limitation of trees
    • Simple non-hierarchical model
    • Adding individual differences in max growth rate
  – LIDET decomposition: sampling errors in y's
  – Light limitation of trees
    • Errors in covariates
    • Adding a treatment
We now want to estimate the effect of CO2 enrichment on the effect of light on growth. We have two levels of CO2: ambient (365 ppm) and elevated (565 ppm). We want to examine the effect of treatment on maximum growth rate.
We use an indicator variable u = 0 for ambient (control) and u = 1 for elevated. We want to estimate a new parameter, k, that represents the additional growth due to CO2.
Adding an experimental treatment

μ_i = g(α, γ, c, k, L_i, u_i) = (α + k·u_i)(L_i − c) / ((α + k·u_i)/γ + (L_i − c))
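A sketch of the treatment version of g (numbers invented): the indicator u shifts the asymptote from α to α + k·u while leaving c and γ alone:

```python
def g_trt(alpha, gamma, c, k, x, u):
    """Light-growth curve whose asymptote shifts from alpha to alpha + k*u."""
    a = alpha + k * u
    return a * (x - c) / (a / gamma + (x - c))

# assumed illustrative values, not estimates
alpha, gamma_, c, k = 1.0, 0.5, 0.3, 0.4
print(g_trt(alpha, gamma_, c, k, 1e6, u=0))   # ~1.0: ambient asymptote
print(g_trt(alpha, gamma_, c, k, 1e6, u=1))   # ~1.4: elevated asymptote
```

Because u is 0/1, the posterior for k directly answers "how much extra maximum growth does elevated CO2 buy?"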
Write out the Bayesian model using P() notation. x_{i[j]} reads "tree i containing light observation j".

[DAG — data model: x_{i[j]} | L_i, σ_obs; process model: y_i | g(α_i, γ, c, k, L_i, u_i), σ_proc; α_i ~ (a, b); parameters γ, c, k; hyper-parameters a, b]
P(α, L, γ, c, k, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} P(y_i | g(α_i, γ, c, k, L_i, u_i), σ_proc) P(α_i | a, b) P(L_i) ×
∏_{j} P(x_{i[j]} | L_i, σ_obs) ×
P(γ) P(c) P(k) P(σ_proc) P(σ_obs) P(a) P(b)
N is the number of light readings; n is the number of trees.
P(α, L, γ, c, k, a, b, σ_proc, σ_obs | y, x) ∝
∏_{i=1}^{n} normal(y_i | g(α_i, γ, c, k, L_i, u_i), σ²_proc) gamma(α_i | a, b) gamma(L_i | .001, .001) ×
∏_{j=1}^{N} gamma(x_{i[j]} | L²_{i[j]}/σ²_obs, L_{i[j]}/σ²_obs) ×
gamma(γ | .001, .001) gamma(c | .001, .001) gamma(k | .001, .001) gamma(σ_proc | .001, .001) gamma(σ_obs | .001, .001) gamma(a | .001, .001) gamma(b | .001, .001)
How to develop a hierarchical model
• Develop a deterministic model of the ecological process. This may be mechanistic or empirical. It represents your hypothesis about how the world works.
• Diagram the relationships among data, processes, and parameters.
• Using P() notation, write out a data model and a process model, using the diagram as a guide to conditioning.
• For each P() from above, choose a probability density function to represent the unknowns based on your knowledge of how the data arise. Are they counts? 0–1? Proportions? Always positive? etc.
• Think carefully about how to subscript the data and the estimates of your process model. Add product symbols to develop total likelihoods over all the subscripts.
• Think about how the data arise. If you can simulate the data properly, that is all you need to know to do all the steps above.
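Following the last bullet, here is one way to simulate data from the hierarchical light-limitation model with latent light and errors in the covariate (every numeric value is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50                                  # trees
a, b = 20.0, 20.0                       # hyperparameters: alpha_i ~ gamma(a, b)
gamma_, c = 0.5, 0.3                    # slope at low light, minimum light
sigma_proc, sigma_obs = 0.05, 0.2       # process and observation sd

alpha = rng.gamma(a, 1 / b, size=n)     # individual max growth rates
L = rng.uniform(1.0, 7.0, size=n)       # true light "seen" by each tree
mu = alpha * (L - c) / (alpha / gamma_ + (L - c))   # process model
y = rng.normal(mu, sigma_proc)          # observed growth (process error only)

# 10 imperfect light readings per tree, moment-matched to mean L[i]
shape = L**2 / sigma_obs**2
rate = L / sigma_obs**2
x = rng.gamma(shape[:, None], 1 / rate[:, None], size=(n, 10))

print(x.shape, np.abs(x.mean(axis=1) - L).mean())  # readings scatter around L
```

If a JAGS fit to data simulated this way recovers the known parameters, the model, the subscripting, and the code are all consistent.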