A Latent Approach to the Statistical Analysis of Space-time Data

(1)

A Latent Approach to the Statistical Analysis of Space-time Data

Dani Gamerman

Instituto de Matemática

Universidade Federal do Rio de Janeiro Brasil

http://acd.ufrj.br/~dani

17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002

(2)

World Cup Algorithm

[Figure: World Cup years paired symmetrically around 1982 ("Center of the world football"), with one entry marked "?":
1990 and 1974: played in Europe
2006 and 1958: played in Europe
2002 and 1962: played in exotic place
1970 and 1994: played in (L.) America
1986 and 1978: played in L. America
1966 and 1998: one-time only home win, Europe, 25 miles apart]

(3)

Joint work with

Marina S. Paez (IM-UFRJ) Flavia Landim (IM-UFRJ) Victor de Oliveira (Arkansas)

Alan Gelfand (Connecticut) Sudipto Banerjee (Minnesota)

(4)

Introduction

Environmental science: data in the form of a collection of time series that are geographically referenced.

Examples:

1) measurements of pollutants in time over a set of monitoring stations

2) selling price of properties around a neighborhood of interest

3) counts of morbidity/mortality events in time over a collection of geographic regions

Some examples can be found in other areas.

(5)

Main Objective: spatial interpolation

Example: Pollution in Rio de Janeiro

Paez, M.S. and Gamerman, D. (2001). Technical report. Statistical Laboratory, UFRJ.

(6)

Example: Pollution in Rio de Janeiro

The previous picture showed mean interpolated values.

Other features of interest can be obtained, such as $\mathrm{Prob}(\mathrm{PM}_{10} > 100\ \mu g/m^3 \mid Y^{obs})$

(7)

Spatial Interpolation

$m$ = number of observations

$g$ = number of grid points

$s_1, \dots, s_m$ = observed sites

$s_1^n, \dots, s_g^n$ = grid points (to interpolate)

$Y_1^n, \dots, Y_g^n$ = observations in the grid points

(8)

Interpolation: $p(Y^n \mid Y^{obs})$

$$p(Y^n \mid Y^{obs}) = \int p(Y^n \mid Y^{obs}, \theta)\, p(\theta \mid Y^{obs})\, d\theta$$

$\theta$ - all model parameters

$Y^{mis}$ - missing data, treated as parameters

• Obtain $p(Y^n \mid Y^{obs})$ by simulation.

Steps to generate from $Y^n \mid Y^{obs}$:

1. Frequentist inference: generate $Y^n$ from $p(Y^n \mid Y^{obs}, \hat\theta)$

If $\theta = \theta^*$ with probability 1, then $p(Y^n \mid Y^{obs}) = p(Y^n \mid Y^{obs}, \theta^*)$

2. Bayesian inference:

i) generate $\theta$ from $p(\theta \mid Y^{obs})$

ii) generate $Y^n$ from $p(Y^n \mid Y^{obs}, \theta)$
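The two simulation routes can be compared on a toy problem. The sketch below is not the spatial model of the talk: it assumes i.i.d. observations $Y \sim N(\theta, 1)$ with a flat prior on $\theta$, so that $\theta \mid Y^{obs} \sim N(\bar y, 1/n)$. All names and numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): Y_obs ~ N(theta, 1), with an arbitrary true theta = 2.0
y_obs = rng.normal(loc=2.0, scale=1.0, size=20)
n, ybar = y_obs.size, y_obs.mean()
n_draws = 50_000

# 1. Plug-in route: theta is fixed at its estimate, then Y_new | theta_hat is drawn
y_new_plugin = rng.normal(loc=ybar, scale=1.0, size=n_draws)

# 2. Bayesian route (composition sampling):
#    (i)  theta ~ p(theta | Y_obs) = N(ybar, 1/n) under the flat prior
#    (ii) Y_new ~ p(Y_new | theta) for each sampled theta
theta_draws = rng.normal(loc=ybar, scale=np.sqrt(1.0 / n), size=n_draws)
y_new_bayes = rng.normal(loc=theta_draws, scale=1.0)

var_plugin = y_new_plugin.var()
var_bayes = y_new_bayes.var()
print("plug-in predictive variance: ", var_plugin)
print("Bayesian predictive variance:", var_bayes)
```

In this toy model the theoretical predictive variances are 1 for the plug-in route and 1 + 1/n for the Bayesian route, which is how parameter uncertainty shows up in the interpolation.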

(9)

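Gaussian Process (GP)

(or Gaussian Random Field)

$S$ - region of $\mathbb{R}^p$ (in general, $p=2$)

$\{ w(s) : s \in S \}$ is a GP if, $\forall\, n$ and $\forall\, s_1, \dots, s_n \in S$,

$$( w(s_1), \dots, w(s_n) ) \sim N_n(\mu, \Sigma)$$

where $\mu = (\mu_1, \dots, \mu_n)$ with $\mu_i = E[w(s_i)]$ and $\Sigma = (\sigma_i \sigma_j\, \rho[w(s_i), w(s_j)])_{i,j}$

Usual simplifications:

1) Isotropy: $\rho[w(s_i), w(s_j)] = \rho(h_{ij})$ with $h_{ij} = |s_i - s_j|$

2) Homoscedasticity: $\sigma_i = \sigma$, $\forall\, i$

Notation: $w(\cdot) \sim GP(\mu(\cdot), \sigma^2, \rho)$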
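A minimal sketch of one draw from an isotropic, homoscedastic GP at a finite set of sites. It assumes the exponential correlation function used later in the talk; the number of sites, the mean, the variance and the decay parameter are arbitrary illustrative values.

```python
import numpy as np

def exp_corr(sites, lam):
    """Isotropic exponential correlation matrix: rho(h) = exp(-lam * h)."""
    diff = sites[:, None, :] - sites[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise distances h_ij
    return np.exp(-lam * h)

rng = np.random.default_rng(1)
sites = rng.uniform(0.0, 1.0, size=(50, 2))    # 50 hypothetical sites in the unit square
mu, sigma2, lam = 0.0, 1.0, 2.0                # illustrative values

# Homoscedasticity: a single variance sigma2 scales the correlation matrix
Sigma = sigma2 * exp_corr(sites, lam)

# Finite-dimensional draw: w = mu + L z with L the Cholesky factor of Sigma
L = np.linalg.cholesky(Sigma)
w = mu + L @ rng.standard_normal(len(sites))
```

Only the distances between sites enter `Sigma`, which is the isotropy assumption in matrix form.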

(10)

Statistical Analysis

Starting point: regression models

$Y_t(s) = \mu_t(s) + e_t(s)$ where

$\mu_t(s) = \beta_0 + \beta_1 X_{t1}(s) + \dots + \beta_p X_{tp}(s)$ and $e_t(s) \sim N(0, \sigma_e^2)$ independent

Suppose that the $X_{tj}(s)$ handle temporal autocorrelation. Otherwise, we can include a temporal component in $\mu_t(s)$.

Usually $e_t(s)$ remains spatially correlated. In this case, $e_t(s) = e_0(s) + e_{t1}(s)$

$e_0(s)$ → spatially correlated errors

$e_{t1}(s)$ → pure residual (white noise)

$\Rightarrow \beta_0(s) = \beta_0 + e_0(s)$

(11)

Inference

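• How to estimate $\beta_0(s)$?

Traditional approach: geostatistical

$\beta_0(\cdot) \sim GP(\beta_0, \tau^2, \rho_{\lambda_0})$ or $e_0(\cdot) = \beta_0(\cdot) - \beta_0 \sim GP(0, \tau^2, \rho_{\lambda_0})$

then $\beta_0^{obs} \sim N(\beta_0 \mathbf{1}, \tau^2 R)$, where $\beta_0^{obs} = (\beta_0(s_1), \dots, \beta_0(s_m))$

Hyperparameters: $\sigma_e^2$, $\tau^2$ and $\lambda_0$

1. At first (3 steps)

(a) $\beta_0, \beta_1, \dots, \beta_p$ estimated in the regression model and the residuals $r_{t0}(s) = Y_t(s) - \hat\mu_t(s)$ are constructed

(b) $\sigma_e^2$, $\tau^2$ and $\lambda_0$ estimated from $r_{t0}(s)$

(c) Inference based on $(\sigma_e, \tau, \lambda_0) = (\hat\sigma_e, \hat\tau, \hat\lambda_0)$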

(12)

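Problems:

(a) $r_{t0}(s) \neq e_t(s)$

(b) $(\hat\sigma_e, \hat\tau, \hat\lambda_0) \neq (\sigma_e, \tau, \lambda_0)$

2) Next step:

• $\beta_0, \beta_1, \dots, \beta_p$ and $(\sigma_e, \tau, \lambda_0)$ estimated jointly → solves (a)

• but incorporating the uncertainty about $(\hat\sigma_e, \hat\tau, \hat\lambda_0)$ is complicated

3) Natural solution (Kitanidis, 1986; Handcock & Stein, 1993):

• specify a distribution for $(\sigma_e, \tau, \lambda_0)$

• perform Bayesian inference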

(13)

Model generalization

Recall: $E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \dots + \beta_p X_{tp}(s)$

Spatial heterogeneity doesn't have to be restricted to $\beta_0$.

Example: site-by-site effect of temperature in the Rio pollution data

(14)

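Extension of the previous model

We can accommodate spatial variation for other coefficients $\beta_j$, $j=1, \dots, p$.

Previous model: $E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \dots + \beta_p X_{tp}(s)$

Extended model: $E[Y_t(s)] = \beta_0(s) + \beta_1(s) X_{t1}(s) + \dots + \beta_p(s) X_{tp}(s)$

$\beta(\cdot) = (\beta_0(\cdot), \beta_1(\cdot), \dots, \beta_p(\cdot))$

One possibility: $\beta(\cdot) \sim GP(\gamma, \mathrm{T}, \rho_\lambda)$

Hyperparameters: $\theta = (\gamma, \sigma_e, \mathrm{T}, \lambda)$ where $\gamma = (\gamma_0, \gamma_1, \dots, \gamma_p)$

Special cases for the $\beta_j(\cdot)$'s:

a) prior independence: $\mathrm{T} = \mathrm{diag}(\tau_0^2, \dots, \tau_p^2)$ and $\rho_\lambda(\cdot) = (\rho_{\lambda_0}(\cdot), \dots, \rho_{\lambda_p}(\cdot))$

b) same spatial structure and prior correlation between the $\beta_j(\cdot)$'s: $\rho_{\lambda_j}(\cdot) = \rho_\lambda(\cdot)$, $\forall\, j$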

(15)

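How to estimate $\beta_j(s)$, $j=0,1,\dots,p$?

1) Classical solution (Oehlert, 1993; Solna & Switzer, 1996):

(a) $\beta_0(s), \beta_1(s), \dots, \beta_p(s)$ estimated by $b_0(s), b_1(s), \dots, b_p(s)$ in the local regression model

(b) $(\gamma, \sigma_e, \mathrm{T}, \lambda)$ estimated from the $b_j(s)$

(c) inference based on $(\gamma, \sigma_e, \mathrm{T}, \lambda) = (\hat\gamma, \hat\sigma_e, \hat{\mathrm{T}}, \hat\lambda)$

Problems (the same as before):

(a) $b_j(s) \neq \beta_j(s)$

(b) $(\hat\gamma, \hat\sigma_e, \hat{\mathrm{T}}, \hat\lambda) \neq (\gamma, \sigma_e, \mathrm{T}, \lambda)$

2) Natural solution:

Specify a prior distribution for $\theta = (\gamma, \sigma_e, \mathrm{T}, \lambda)$

In general, independent and non-informative priors are used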

(16)

Model Summary

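Parameters: $(\beta^{obs}, \theta)$, where $\theta = (\gamma, \sigma_e^2, \mathrm{T}, \lambda)$

$\beta_j^{obs} = (\beta_j(s_1), \dots, \beta_j(s_m))$, $j=0, 1, \dots, p$

$\beta^{obs} = (\beta_0^{obs}, \dots, \beta_p^{obs})$

$\gamma = (\gamma_0, \gamma_1, \dots, \gamma_p)$

Data: $Y^{obs} = (Y_1(s_1), \dots, Y_T(s_m))$ and $X^{obs} = (X_1(s_1), \dots, X_T(s_m))$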

(17)

Simulated data

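$Y_t(s) = \mu_t(s) + \varepsilon_t(s)$, $t=1,\dots,30$

$\mu_t(s) = \beta_0(s) + \beta_1(s) X_t(s)$

$\varepsilon_t(s) \sim N(0, \sigma_e^2)$ independent with $\sigma_e^2 = 1$

$\beta_0 \sim N(\gamma_0, \tau_0^2, \rho_0(\cdot))$

$\beta_1 \sim N(\gamma_1, \tau_1^2, \rho_1(\cdot))$

$X_t(s) \sim N(\gamma_2, \tau_2^2, \rho_2(\cdot))$, for all time $t$

Exponential correlation functions: $\rho_j(x) = \exp\{-\lambda_j x\}$

$\gamma_0 = 100$, $\gamma_1 = 5$, $\gamma_2 = 0$

$\lambda_0 = 0.4$, $\lambda_1 = 0.8$, $\lambda_2 = 1.5$

$\tau_0^2 = 0.1$, $\tau_1^2 = 1$, $\tau_2^2 = 0.333$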

(18)

[Figure: Simulated Data. Contour maps over Coordinate 1 × Coordinate 2 (both from 0.1 to 0.9) illustrating the decomposition $Y = \beta_0 + \beta_1 X + \varepsilon$; contour levels range from −2.4 to 2.7.]

(19)

Inference

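Parameters: $(\beta^{obs}, \theta)$, with $\theta = (\gamma, \sigma_e^2, \mathrm{T}, \lambda)$

Likelihood: $L(\beta^{obs}, \theta) = p(Y^{obs} \mid \beta^{obs}, \sigma_e^2)$

Prior: $p(\beta^{obs}, \theta) = p(\beta^{obs} \mid \theta)\, p(\gamma)\, p(\sigma_e^2)\, p(\mathrm{T})\, p(\lambda)$

Posterior: $\pi(\beta^{obs}, \theta) \propto L(\beta^{obs}, \theta) \times p(\beta^{obs}, \theta)$

• Many parameters

• Complicated functional form

• Solution by MCMC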

(20)

Full Conditionals

(a) $[\beta^{obs} \mid \text{rest}] \sim$ Normal

(b) $[\gamma \mid \text{rest}] \sim$ Normal

(c) $[\sigma_e^2 \mid \text{rest}] \sim [\sigma_e^2 \mid Y^{obs}, \beta^{obs}] \sim$ Inverse Gamma

(d) $[\mathrm{T} \mid \text{rest}] \sim [\mathrm{T} \mid \beta^{obs}, \gamma, \lambda]$ → Inverse Wishart

→ use the $\beta_j^{obs}$ as if they were data

(e) $[\lambda \mid \text{rest}] \propto \prod_j p(\beta_j^{obs} \mid \gamma_j, \tau_j, \lambda)\, p(\lambda)$

→ hard to sample → Metropolis-Hastings

→ again, use the $\beta_j^{obs}$ as data (geostatistical analysis)
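The Metropolis-Hastings step for the correlation parameter can be sketched as follows for a single coefficient surface. This is an illustration under assumptions that are not in the slides: an exponential correlation function, an Exp(1) prior on the decay parameter, a random-walk proposal on the log scale, and hypothetical helper names (`log_cond_lambda`, `mh_step`).

```python
import numpy as np

def log_cond_lambda(lam, beta_obs, gamma, tau2, H):
    """log p(beta_obs | gamma, tau2, lam) + log p(lam), up to a constant.

    The Exp(1) prior on lam is an illustrative assumption.
    """
    if lam <= 0:
        return -np.inf
    cov = tau2 * np.exp(-lam * H)               # exponential correlation
    _, logdet = np.linalg.slogdet(cov)
    resid = beta_obs - gamma
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (logdet + quad) - lam

def mh_step(lam, beta_obs, gamma, tau2, H, rng, step=0.3):
    """One random-walk Metropolis-Hastings update on the log(lam) scale."""
    prop = lam * np.exp(step * rng.standard_normal())
    # log(prop) - log(lam) is the Jacobian correction for the log-scale walk
    log_ratio = (log_cond_lambda(prop, beta_obs, gamma, tau2, H)
                 - log_cond_lambda(lam, beta_obs, gamma, tau2, H)
                 + np.log(prop) - np.log(lam))
    return prop if np.log(rng.uniform()) < log_ratio else lam

# Toy usage: the coefficient values at 25 sites are treated as if they were data
rng = np.random.default_rng(3)
g = np.linspace(0.1, 0.9, 5)
sites = np.array([(a, b) for a in g for b in g])
H = np.sqrt(((sites[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1))
gamma, tau2, lam_true = 5.0, 1.0, 0.8           # hypothetical values
L = np.linalg.cholesky(tau2 * np.exp(-lam_true * H))
beta_obs = gamma + L @ rng.standard_normal(len(sites))

chain = [1.0]
for _ in range(2000):
    chain.append(mh_step(chain[-1], beta_obs, gamma, tau2, H, rng))
chain = np.array(chain)
```

Inside a full sampler this update would alternate with the Normal, Inverse Gamma and Inverse Wishart draws of the other blocks, with `beta_obs`, `gamma` and `tau2` set to their current values.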

(21)

Results

(based on a regular grid of m=25 sites)

Histogram of the parameters

$\phi_i = \tau_i^{-2}$

(22)

Spatial Interpolation

Interpolation grid: $s_1^n, \dots, s_g^n$

$\beta_j^n = (\beta_j(s_1^n), \dots, \beta_j(s_g^n))$, $j=0, 1, \dots, p$

$\beta^n = (\beta_0^n, \dots, \beta_p^n)$

We need to interpolate the $\beta_j$'s in order to interpolate $Y^n$

(23)

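Interpolation of Y's

$\pi(Y^n, \beta^n, \theta \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \theta, Y^{obs})\, \pi(\beta^n, \theta \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \theta)\, \pi(\beta^n, \theta \mid Y^{obs})$

Simulation of $[Y^n \mid Y^{obs}]$ also in 2 steps:

(a) $[\beta^n, \theta \mid Y^{obs}]$ → MCMC and spatial interpolation

(b) $[Y^n \mid \beta^n, \theta]$ → using Multivariate Normal

Spatial interpolation of $\beta$'s

$\pi(\beta^n, \beta^{obs}, \theta \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \theta, Y^{obs})\, \pi(\beta^{obs}, \theta \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \theta)\, \pi(\beta^{obs}, \theta \mid Y^{obs})$

Simulation of $[\beta^n \mid Y^{obs}]$ in 2 steps:

(a) $[\beta^{obs}, \theta \mid Y^{obs}]$ → using MCMC

(b) $[\beta^n \mid \beta^{obs}, \theta]$ → using Multivariate Normal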
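The multivariate normal step for one coefficient surface is the usual Gaussian conditioning formula. The sketch below assumes a constant mean, an exponential covariance, and hypothetical names (`exp_cov`, `draw_beta_new`) and values; in practice it would be called once per MCMC draw of the observed-site coefficients and hyperparameters.

```python
import numpy as np

def exp_cov(A, B, tau2, lam):
    """Cross-covariance tau2 * exp(-lam * distance) between site sets A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return tau2 * np.exp(-lam * np.sqrt((diff ** 2).sum(axis=-1)))

def draw_beta_new(beta_obs, s_obs, s_new, gamma, tau2, lam, rng):
    """Draw the coefficient at new sites given its values at observed sites."""
    C_oo = exp_cov(s_obs, s_obs, tau2, lam)
    C_no = exp_cov(s_new, s_obs, tau2, lam)
    C_nn = exp_cov(s_new, s_new, tau2, lam)
    A = np.linalg.solve(C_oo, C_no.T).T              # C_no C_oo^{-1}
    mean = gamma + A @ (beta_obs - gamma)            # conditional mean
    cov = C_nn - A @ C_no.T                          # conditional covariance
    cov = (cov + cov.T) / 2 + 1e-10 * np.eye(len(s_new))   # symmetrize, tiny jitter
    draw = mean + np.linalg.cholesky(cov) @ rng.standard_normal(len(s_new))
    return draw, mean, cov

# Toy usage with hypothetical sites and hyperparameter values
rng = np.random.default_rng(4)
g_obs = np.linspace(0.1, 0.9, 5)
s_obs = np.array([(a, b) for a in g_obs for b in g_obs])
g_new = np.linspace(0.15, 0.85, 4)
s_new = np.array([(a, b) for a in g_new for b in g_new])
gamma, tau2, lam = 5.0, 1.0, 0.8
L_obs = np.linalg.cholesky(exp_cov(s_obs, s_obs, tau2, lam))
beta_obs = gamma + L_obs @ rng.standard_normal(len(s_obs))

beta_new, cond_mean, cond_cov = draw_beta_new(beta_obs, s_obs, s_new, gamma, tau2, lam, rng)
```

The new sites are deliberately chosen not to coincide with observed ones, so the conditional covariance stays positive definite.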

(24)

[Figure: Simulated data: interpolation of $\beta_1$. Two maps over Coordinate 1 × Coordinate 2 (both from 0.1 to 0.9): simulated values and interpolated values.]

(25)

[Figure: Simulated data: interpolation of $Y_{30}(\cdot)$. Two maps over Coordinate 1 × Coordinate 2 (both from 0.1 to 0.9): simulated values and interpolated values.]

(26)

Interpolation of X´s

These interpolations assume that the interpolated covariates Xj are available for j=1, ... , p

Otherwise, we must interpolate them

(27)

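Model may be completed with

$X(\cdot) \mid \gamma_x, \tau_x, \lambda_x \sim GP(\gamma_x, \tau_x, \rho_{\lambda_x}(\cdot))$

$\pi(X^n, \theta_x \mid Y^{obs}, X^{obs}) = \pi(X^n, \theta_x \mid X^{obs}) = \pi(X^n \mid \theta_x, X^{obs})\, \pi(\theta_x \mid X^{obs})$, with $\theta_x = (\gamma_x, \tau_x, \lambda_x)$

Simulation of $[X^n \mid Y^{obs}, X^{obs}]$ in 2 steps:

(a) $[\theta_x \mid X^{obs}]$ → MCMC

(b) $[X^n \mid \theta_x, X^{obs}]$ → using Multivariate Normal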

(28)

Results obtained by interpolating X

Histogram of the parameters

Precisions less sparse than when X is known

(29)

[Figure: Interpolation of $X_{30}(\cdot)$. Two maps over the two spatial coordinates (both from 0.1 to 0.9): simulated values and interpolated values.]

(30)

[Figure: Interpolation of $Y_{30}(\cdot)$. Two maps over Coordinate 1 × Coordinate 2 (both from 0.1 to 0.9): interpolation with known X and with unknown X.]

(31)

Application to the pollution data

$Y_t(s) = \beta_0(s) + \beta_1(s)\,\mathrm{TEMP}_t + \delta' X_t + \varepsilon_t(s)$

$Y_t(s)$ = square root of PM10 at site $s$ and time $t$

$X_t$ = (MON, TUE, WED, THU, FRI, SAT)

$\varepsilon_t(s)$ independent $N(0, \sigma^2)$

$\beta_0 \sim N(\gamma_0, \tau_0^2, \rho_0(\cdot))$

$\beta_1 \sim N(\gamma_1, \tau_1^2, \rho_1(\cdot))$

$\rho_i(\cdot)$, $i=0,1$, are exponential correlation functions

Now, the temperature coefficient varies in space

(32)

Results for the pollution data in Rio

Histogram of the hyperparameters sample

where $\phi_i = \tau_i^{-2}$

(33)

Interpolation of the coefficient Prior for 

G(10,10c)

SSDE = 0.0637

G(10-3,10-3)

SSDE = 0.1444

m

i

ols i obs

i

Y

s m

1

E

}

2

] ˆ

| ) ( [ 1 {

where SSDE =

Same idea can be used for (explanatory geostatistical analysis)

c  obtained by exploratory analysis site by site (OLS)

(34)

We can also accommodate temporal variation of the coefficients $\beta_j$, $j=0,\dots,p$.

Previous model: $Y_t(s) = \mu_t(s) + e_t(s)$ where

$\mu_t(s) = \beta_0(s) + \beta_1(s) X_{t1}(s) + \dots + \beta_p(s) X_{tp}(s)$ and $e_t(s) \sim N(0, \sigma_e^2)$ independent

Extension of the previous model: $Y_t(s) = \mu_t(s) + e_t(s)$ where

$\mu_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \dots + \beta_{tp}(s) X_{tp}(s)$ and $e_t(s) \sim N(0, \sigma_e^2)$ independent

Natural specification:

$\beta_t(\cdot) \mid \gamma_t \sim GP(\gamma_t, \mathrm{T}, \rho_\lambda)$, independent in time

The model must be completed with:

(a) prior for $(\sigma_e, \mathrm{T}, \lambda)$ as before

(b) specification of the temporal evolution of the $\gamma_t$'s

(35)

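Suggestion - use dynamic models (SVP/TVM) (Landim & Gamerman, 2000)

$\gamma_t \mid \gamma_{t-1} \sim N(G_t \gamma_{t-1}, W_t)$

Model parameters: $(\beta^{obs}, \gamma, \theta)$ where

$\gamma = (\gamma_1, \dots, \gamma_T)$, $\gamma_t = (\gamma_{t0}, \gamma_{t1}, \dots, \gamma_{tp})$, $t=1, \dots, T$, and

$\theta = (\gamma_0, \sigma_e^2, \mathrm{T}, \lambda, W)$

$\theta$ now includes $W$, the unknown parameters of the $\gamma$ evolution

Simulation cycle has 2 changes:

I) additional step for $\gamma$

II) modified step for $\theta$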
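A small sketch of the evolution of the means, simulated forward in time. It assumes a random-walk evolution (evolution matrix equal to the identity), a time-constant evolution covariance, and arbitrary starting values; none of these numerical choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(7)

T, n_coef = 30, 2                     # n_coef = p + 1 coefficients (assumed p = 1)
G = np.eye(n_coef)                    # assumed evolution matrix: random walk
W = np.diag([0.05, 0.01])             # assumed evolution covariance
Lw = np.linalg.cholesky(W)

gamma = np.empty((T + 1, n_coef))
gamma[0] = [100.0, 5.0]               # assumed initial mean vector gamma_0

# gamma_t | gamma_{t-1} ~ N(G gamma_{t-1}, W)
for t in range(1, T + 1):
    gamma[t] = G @ gamma[t - 1] + Lw @ rng.standard_normal(n_coef)
```

Each row `gamma[t]` would then serve as the mean vector of the coefficient processes at time `t`.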

(36)

Application to simulated data

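Multivariate observations: $Y_t(s) = (Y_{t1}(s), Y_{t2}(s))$

$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \varepsilon_t(s)$

$\beta_t(\cdot) \sim GP(\gamma_t, \mathrm{T}, \rho_\lambda)$

$\gamma_t = \gamma_{t-1} + \omega_t$

Same spatial correlation for $\beta_0$ and $\beta_1$

$\rho_\lambda(\cdot)$: exponential correlation function with $\lambda = 1$

Histogram of the posterior of $\lambda$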

(37)

Trajectory of $\gamma_t$ - mean and credibility limits

(38)

Interpolation

Samples from $y_t^n \mid y^{obs}$ are obtained through the algorithm below:

1. Sample from $\beta_t^{obs}, \theta \mid y^{obs}$ - through MCMC

2. Sample from $\beta_t^n \mid \beta_t^{obs}, \theta$ - through Gaussian process

3. Sample from $y_t^n \mid \beta_t^n, \theta$ - independent normal draws

Once again, $X_t^n$ must be known; otherwise, it will have to be interpolated.

(39)

Spatially- and time-varying parameters (STVP)

Another possibility: temporal evolution applied directly to the $\beta_t$ processes rather than to their means

$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + \varepsilon_t(s)$

$\beta_t(\cdot) = \beta_{t-1}(\cdot) + w_t(\cdot)$

$w_t(\cdot) \sim GP(\omega_t, \mathrm{T}, \rho_\lambda)$ independent in time

Completed with: $\beta_0(\cdot) = \gamma_0 \sim N(g_0, R)$

Marginal prior: $\beta_t(\cdot) \mid \gamma_t \sim GP(\gamma_t, t\,\mathrm{T}, \rho_\lambda)$

Not separable at the latent level, unlike SVP/TVM

SVP/TVM:

$$\mathrm{cov}[\beta_{t_1}(s_1), \beta_{t_2}(s_2)] = [R + \min\{t_1, t_2\}\, W] + I(t_1 = t_2)\, \mathrm{T}\, \rho_\lambda(|s_1 - s_2|)$$

STVP:

$$\mathrm{cov}[\beta_{t_1}(s_1), \beta_{t_2}(s_2)] = R + \min\{t_1, t_2\}\, [W + \mathrm{T}\, \rho_\lambda(|s_1 - s_2|)]$$
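A short derivation sketch of the SVP/TVM covariance. The assumptions are mine rather than stated on the slide: random-walk means $\gamma_t = \gamma_{t-1} + \omega_t$ with $\omega_t \sim N(0, W)$ independent in time, $\gamma_0 \sim N(g_0, R)$, and $\beta_t(s) = \gamma_t + v_t(s)$ with $v_t(\cdot) \sim GP(0, \mathrm{T}, \rho_\lambda)$ independent in time and independent of the means.

```latex
\begin{aligned}
\gamma_t &= \gamma_0 + \sum_{k=1}^{t} \omega_k
\;\Rightarrow\;
\operatorname{cov}[\gamma_{t_1}, \gamma_{t_2}] = R + \min\{t_1, t_2\}\, W, \\
\operatorname{cov}[v_{t_1}(s_1), v_{t_2}(s_2)] &= I(t_1 = t_2)\, \mathrm{T}\, \rho_\lambda(|s_1 - s_2|), \\
\operatorname{cov}[\beta_{t_1}(s_1), \beta_{t_2}(s_2)]
&= R + \min\{t_1, t_2\}\, W + I(t_1 = t_2)\, \mathrm{T}\, \rho_\lambda(|s_1 - s_2|).
\end{aligned}
```

Under these assumptions the spatial term appears only when $t_1 = t_2$, because the coefficient processes are independent in time once their means are given.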

(40)

Computations

MCMC algorithm must exploit the correlation structures

→ parameters are visited in blocks

(Landim and Gamerman, 2000; Fruhwirth-Schnatter, 1994)

Prediction

Based on the forecast distribution of $Y_{T+h} \mid Y^{obs}$, for $Y_{T+h} = (Y_{T+h}(s_1^f), \dots, Y_{T+h}(s_F^f))$ and any collection $(s_1^f, \dots, s_F^f)$

1. Sample from $\beta_T^{obs}, \theta \mid Y^{obs}$ - through MCMC

2. Sample from $\beta_{T+h}^f \mid \beta_T^{obs}, \theta$ - obtained by introduction of $\beta_{T+h}^{obs}$:

$\beta_T^{obs} \to \beta_{T+h}^{obs}$ by successive evolution of the process

$\beta_{T+h}^{obs} \to \beta_{T+h}^f$ by interpolation with Gaussian process

3. Sample from $Y_{T+h} \mid \beta_{T+h}^f, \theta$ - independent normal draws

(41)

Time-varying locations

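Assume locations $s_t = (s_{t1}, \dots, s_{t n_t})$ at time $t$

$\beta_t^{obs}$ is an $n_t$-dimensional vector, $t = 1, \dots, T$

$$p(\beta_t^{obs} \mid \beta_{t-1}^{obs}, \theta) = \int p(\beta_t^{obs}, \tilde\beta_{t-1}^{obs} \mid \beta_{t-1}^{obs}, \theta)\, d\tilde\beta_{t-1}^{obs} = \int p(\beta_t^{obs} \mid \tilde\beta_{t-1}^{obs}, \theta)\, p(\tilde\beta_{t-1}^{obs} \mid \beta_{t-1}^{obs}, \theta)\, d\tilde\beta_{t-1}^{obs}$$

Both densities in the integrand are multivariate normal

The convolution of these two densities can be shown to be normal, and the required evolution equation for $\beta_t^{obs}$ can be obtained

SVP/TVM: easily adapted

STVP: requires introduction of $\tilde\beta_{t-1}^{obs}$ for updated locations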

(42)

Non-Gaussian Observations

Two distinct types of non-normal data:

Continuous:

Can be normalized after a suitable transformation $g(y)$

Example: Rio pollution data ($g(Y) = \sqrt{Y}$)

Transformation estimated jointly with other model parameters (de Oliveira, Kedem and Short, 1997)

Count data:

For example, in the Bernoulli or Poisson form

Standard approach: $y_t(s) \sim EF(\mu_t(s))$

Spatio-temporal modeling issues: similar

Computations: harder

(43)

Non-Gaussian Evolution

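Abrupt changes in the process → normality is not suitable

Robust alternative:

$w_t(\cdot) \sim GP(\omega_t, \mathrm{T}, \rho_\lambda)$ is replaced by

$w_t(\cdot) \mid \psi_t \sim GP(\omega_t, \psi_t^{-1} \mathrm{T}, \rho_\lambda)$ and $\psi_t \sim G(\nu_t, \nu_t)$, independent for $t=1,\dots,T$

Therefore, $w_t(\cdot) \sim tP(\omega_t, \mathrm{T}, \rho_\lambda)$

The $\psi_t$'s control the magnitude of the evolution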

(44)

Final Comments

• More flexibility to accommodate variations in time and space.

• Static coefficient models: samples from the posterior were generated in the software BUGS, with interpolation made in FORTRAN

• Extension to accommodate anisotropic processes for some components of the model.

• Extensions to observations in the exponential family and estimation of the normalizing transformation.
