Exercises - Model-based Geostatistics Peter J. Diggle and Paulo J. Ribeiro Jr. November 7, 2006

3.1. Consider a one-dimensional spatial process S(x) : x ∈ ℝ with mean µ, variance σ² and correlation function ρ(u) = exp(−u/φ). Define a new process R(x) : x ∈ ℝ by the equation

$$R(x) = (2\theta)^{-1} \int_{x-\theta}^{x+\theta} S(u)\,du.$$

Derive the mean, variance and correlation function of R(·). Comment briefly.

3.2. Is the following a legitimate correlation function for a one-dimensional spatial process S(x) : x ∈ ℝ?

$$\rho(u) = \begin{cases} 1-u & : 0 \le u \le 1 \\ 0 & : u > 1 \end{cases}$$

Give either a proof or a counter-example.

3.3. Derive a formula for the volume of the intersection of two spheres of equal radius, φ, whose centres are a distance u apart. Compare the result with the formula (3.8) for the spherical variogram and comment.

3.4. Consider the following method of simulating a realisation of a one-dimensional spatial process S(x) : x ∈ ℝ, with mean zero, variance 1 and correlation function ρ(u). Choose a set of points x_i ∈ ℝ : i = 1, . . . , n.

Let R denote the correlation matrix of S = {S(x_1), . . . , S(x_n)}. Obtain the singular value decomposition of R as R = DΛD′, where Λ is a diagonal matrix whose non-zero entries are the eigenvalues of R, in order from largest to smallest. Let Y = {Y_1, . . . , Y_n} be an independent random sample from the standard Gaussian distribution, N(0, 1). Then the simulated realisation is

$$S = D\Lambda^{1/2}Y. \tag{3.35}$$

Write an R function to simulate realisations using the above method for any specified set of points x_i and a range of correlation functions of your choice. Use your function to simulate a realisation of S on (a discrete approximation to) the unit interval (0, 1).

Now investigate how the appearance of your realisation S changes if in (3.35) you replace the diagonal matrix Λ by a truncated form in which you replace the last k eigenvalues by zeros.

3.5. Consider a spatial process S(·) defined by

$$S(x) = \int w(u)\,S^*(x-u)\,du$$

where w(u) = (2π)⁻¹ exp(−||u||²/2) and S*(·) is another stationary Gaussian process. Derive an expression for the correlation function, ρ(u) say, of S(·) in terms of w(·) and the correlation function, ρ*(u) say, of S*(·).

Give explicit expressions for ρ(u) when ρ*(u) is of the form:

(a) pure nugget, ρ*(u) = 1 if u = 0, zero otherwise;

(b) spherical;

(d) In each case, comment on the mean square continuity and differentiability properties of the process S(·) in relation to its corresponding S*(·).


4 Generalized linear models for geostatistical data

4.1 General formulation

In the classical setting of independently replicated data, the generalized linear model (GLM) as introduced by Nelder and Wedderburn (1972) provides a unifying framework for regression modelling of continuous or discrete data. The original formulation has since been extended, in various ways, to accommodate dependent data. In this chapter we enlarge on the brief discussion of Section 1.4 to consider extensions of the classical GLM which are suitable for geostatistical applications.

The basic ingredients of a GLM are the following:

1. responses Y_i : i = 1, . . . , n are mutually independent with expectations µ_i;

2. the µ_i are specified by h(µ_i) = η_i, where h(·) is a known link function and η_i is a linear predictor, η_i = d_i′β; in this last expression, d_i is a vector of explanatory variables associated with the response Y_i and β is a vector of unknown parameters;

3. the Y_i follow a common distributional family, indexed by their expectations, µ_i, and possibly by additional parameters common to all n responses.
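The three ingredients can be made concrete with a small simulation. The following is a minimal sketch in Python, assuming a Poisson family with log link, h(µ) = log µ; the helper names `rpois` and `simulate_poisson_glm`, the covariate values and the parameter values are all illustrative choices, not taken from the text.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method.

    Adequate for the small means used here; not intended for large mu.
    """
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_glm(d, beta, rng):
    """Simulate responses from a Poisson log-linear GLM.

    d    : list of covariate vectors d_i
    beta : vector of regression parameters
    """
    # Ingredient 2: linear predictor eta_i = d_i' beta and link h(mu_i) = eta_i.
    eta = [sum(dj * bj for dj, bj in zip(di, beta)) for di in d]
    mu = [math.exp(e) for e in eta]  # inverse of the log link
    # Ingredients 1 and 3: independent responses from a common family
    # (here Poisson), indexed by their expectations mu_i.
    y = [rpois(m, rng) for m in mu]
    return eta, mu, y


rng = random.Random(1)  # fixed seed so the run is reproducible
d = [(1.0, i / 9) for i in range(10)]  # intercept plus one covariate
beta = (0.5, 1.0)  # illustrative parameter values
eta, mu, y = simulate_poisson_glm(d, beta, rng)
```

Changing the family or the link only changes the two lines that compute `mu` and `y`; the linear predictor is untouched, which is the sense in which the framework is unifying.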

Working within this framework, Nelder and Wedderburn (1972) showed how a single algorithm could be used for likelihood-based inference. This enabled the development of a single software package, GLIM, for fitting any model within the GLM class. The fitting algorithm was subsequently incorporated into many general-purpose statistical packages, including the glm() function within R. GLM’s occupy a central place in modern applied statistics.
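The text does not name the algorithm; it is usually described as iteratively reweighted least squares (IRLS), and the sketch below assumes that reading. It is written in Python for the Poisson log-linear case only, with hypothetical helper names `solve` and `irls_poisson`, and it is not a description of how GLIM or glm() are actually implemented.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def irls_poisson(d, y, n_iter=25):
    """Fit a Poisson log-linear model by IRLS (fixed number of iterations)."""
    p = len(d[0])
    # Start from the data: mu_i = y_i + 0.5 avoids log(0) for zero counts.
    eta = [math.log(yi + 0.5) for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link:
        # z_i = eta_i + (y_i - mu_i) / mu_i,  w_i = mu_i.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * di[j] * di[k] for di, m in zip(d, mu))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(m * di[j] * zi for di, m, zi in zip(d, mu, z))
                for j in range(p)]
        beta = solve(xtwx, xtwz)  # weighted least squares step
        eta = [sum(dj * bj for dj, bj in zip(di, beta)) for di in d]
    return beta


# Two groups of counts with an indicator covariate (made-up data).
d = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
y = [2, 4, 9, 11]
beta = irls_poisson(d, y)
```

For this two-group design the maximum likelihood fit reproduces the group means, so `exp(beta[0])` should equal 3 and `exp(beta[0] + beta[1])` should equal 10, which gives a simple check on the iteration.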

One of a number of ways to extend the GLM to accommodate dependent responses is to introduce unobservable random effects into the linear predictor. Thus, in the second part of the model specification above, η_i is modified to

$$\eta_i = d_i'\beta + S_i$$

where now S = (S_1, . . . , S_n) follows a zero-mean multivariate distribution. The S_i are called random effects or latent variables. Models of this kind are called generalized linear mixed models (GLMM’s). Breslow and Clayton (1993) give further details and a range of applications. In practice, the most common specification for S is as a multivariate Gaussian random variable with a particular covariance structure imposed according to the practical context.

In a GLMM, the simplest assumption we could make about the S_i is that they are mutually independent, in which case the model is sometimes said to incorporate extra-variation, or over-dispersion, relative to the corresponding classical GLM. For example, when a Poisson log-linear model is fitted to independent count data, it is often found that in an otherwise well-fitting model the variance is larger than the mean, whereas the Poisson assumption implies that they should be equal. A GLMM with mutually independent S_i is one of several ways to account for this effect. To model dependent data using a GLMM, we need to specify a suitable form of dependence amongst the S_i. For example, in longitudinal studies where the Y_i arise as repeated measurements taken from many different individuals, it is usual to assume that the S_i are independent between individuals but correlated within individuals. The statistical methods associated with models of this kind can exploit the independent replication between individuals in order to check directly any assumed form for the correlation structure within subjects, or to develop methods of analysis which are in some respects robust to mis-specification of the correlation structure. See, for example, Diggle, Heagerty, Liang and Zeger (2002), in particular their discussion of marginal models for longitudinal data.
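The over-dispersion effect of independent random effects is easy to see by simulation. The sketch below, in Python, assumes an intercept-only Poisson log-linear model with independent Gaussian S_i of standard deviation `sigma`; the function names, sample size and parameter values are illustrative only.

```python
import math
import random
import statistics


def rpois(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(n, beta0, sigma, rng):
    """Counts with log-mean beta0 + S_i, where S_i ~ N(0, sigma^2) independently.

    Setting sigma = 0 recovers the classical Poisson GLM.
    """
    counts = []
    for _ in range(n):
        s = rng.gauss(0.0, sigma)  # independent random effect
        counts.append(rpois(math.exp(beta0 + s), rng))
    return counts


rng = random.Random(2006)  # fixed seed for reproducibility
plain = simulate_counts(5000, math.log(5.0), 0.0, rng)
mixed = simulate_counts(5000, math.log(5.0), 0.5, rng)

# Variance-to-mean ratio: close to 1 under the Poisson assumption,
# noticeably larger than 1 once random effects are present.
ratio_plain = statistics.variance(plain) / statistics.mean(plain)
ratio_mixed = statistics.variance(mixed) / statistics.mean(mixed)
```

Under these assumptions the marginal mean is exp(β₀ + σ²/2) and the marginal variance exceeds it by the factor 1 + exp(β₀ + σ²/2)(exp(σ²) − 1), so `ratio_mixed` should sit well above `ratio_plain`.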

For geostatistical applications, we usually cannot rely on any form of independent replication. Instead, the observed responses y = (y_1, . . . , y_n) must be considered as a single realisation of an n-dimensional random variable Y. In this setting, we shall use GLMM’s in which S equates to S = {S(x_1), . . . , S(x_n)}, the values of an underlying Gaussian signal process at each of the sample locations x_i. This very natural extension of GLMM’s was investigated systematically by Diggle et al. (1998). We shall refer to a model of this kind as a generalized linear geostatistical model, or GLGM. This is not the only way in which we could adapt the classical GLM for use in geostatistical applications, but it is the approach on which we shall focus most of our attention.
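As an illustration of the GLGM structure, the following Python sketch simulates a Poisson log-linear GLGM at locations on the unit interval. It assumes an exponential correlation function exp(−u/φ) for the signal and uses a Cholesky factor of the covariance matrix to generate S; the function names and all parameter values are hypothetical choices made for the example.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def simulate_glgm(x, beta0, sigma2, phi, rng):
    """Simulate signal values S(x_i) and Poisson counts Y_i given S."""
    n = len(x)
    # Covariance of the Gaussian signal (assumed exponential correlation).
    cov = [[sigma2 * math.exp(-abs(xi - xj) / phi) for xj in x] for xi in x]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # Conditional on S, the Y_i are independent Poisson with log-mean beta0 + S_i.
    y = [rpois(math.exp(beta0 + si), rng) for si in s]
    return s, y


rng = random.Random(42)  # fixed seed for reproducibility
x = [i / 49 for i in range(50)]  # 50 equally spaced sample locations
s, y = simulate_glgm(x, beta0=1.0, sigma2=0.5, phi=0.2, rng=rng)
```

The only change from a GLMM with independent random effects is the covariance matrix of S: here neighbouring locations share similar signal values, so the counts y form a single spatially dependent realisation rather than independent replicates.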

The generalized linear modelling strategy is most appealing when the distributional family for the responses Y_i, conditional on the random effects S in the case of a mixed model, follows naturally from the sampling mechanism. For this reason, two of the most widely used GLM’s are the Poisson log-linear model for count responses, and the logistic-linear model for binary, or more generally binomial, responses. For geostatistical applications, the same philosophy applies. In particular, we advocate the use of GLGM’s only as a way of incorporating explicit knowledge of the sampling mechanism which generates the data. When the need is to address empirical departure from linear Gaussian assumptions, for example when continuous-valued measurement data exhibit a strongly skewed distribution, our preferred initial modelling framework would be the transformed Gaussian model as discussed in Chapter 3.

In the remainder of this chapter, we first consider the form of the theoretical variogram for a stationary GLGM. This gives some insight into the statistical properties of this class of models, but can also be helpful for exploratory data analysis using the empirical variogram. We then describe the two most widely used examples of GLGM’s, namely the Poisson log-linear and the binomial logistic-linear, followed by a short discussion of spatial models for survival data. We describe some of the connections between GLGM’s and spatial point process models, including the log-Gaussian Cox Process (Møller, Syversveen and Waagepetersen, 1998) and a possible approach to dealing with preferentially sampled geostatistical data. We end the chapter with some examples of spatially continuous models which fall outside the GLGM class.
