Computation - Model-based Geostatistics Peter J. Diggle and Paulo J. Ribeiro Jr. November 7, 20

4.5.2 Random sets

A random set (Matheron, 1971a) is a partition of a spatial region A into two sub-regions according to the presence or absence of a particular phenomenon, so defining a binary-valued stochastic process S(x). A point process can be considered as a countable random set, but the term is usually applied to spatially continuous phenomena, for example a partition of a geographical area into land and water. A widely used model is the Boolean model (Serra, 1980), in which the random set is constructed as the union of copies of a basic set, such as a disc, translated to each of the points of a homogeneous Poisson process.
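The construction of the Boolean model can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the intensity, the disc radius and the grid resolution are arbitrary choices, and none of the object names come from the text.

```r
set.seed(1)
intensity <- 30    # assumed Poisson intensity (points per unit area)
radius    <- 0.08  # assumed disc radius

# Simulate the Poisson points on an enlarged window, so that discs centred
# just outside the unit square can still cover locations inside it.
lo <- -radius
hi <- 1 + radius
n_germs <- rpois(1, intensity * (hi - lo)^2)
germs <- cbind(runif(n_germs, lo, hi), runif(n_germs, lo, hi))

# Evaluate the binary process S(x) on a regular grid over the unit square:
# S(x) = 1 if x lies within `radius` of at least one Poisson point.
gx <- seq(0, 1, length.out = 101)
grid <- as.matrix(expand.grid(x = gx, y = gx))
covered <- rep(FALSE, nrow(grid))
for (i in seq_len(n_germs)) {
  d2 <- (grid[, 1] - germs[i, 1])^2 + (grid[, 2] - germs[i, 2])^2
  covered <- covered | (d2 <= radius^2)
}
S <- matrix(as.integer(covered), nrow = length(gx))

image(gx, gx, S, col = c("white", "grey40"), asp = 1,
      xlab = "X Coord", ylab = "Y Coord")

# Compare the covered fraction of the grid with its expectation,
# 1 - exp(-intensity * pi * radius^2), for discs of fixed radius.
c(observed = mean(S), expected = 1 - exp(-intensity * pi * radius^2))
```

The enlarged simulation window is a simple way of avoiding edge effects; a single realisation will only roughly match the expected covered fraction.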

The study of random sets has developed an extensive theory and methodology in its own right. Matheron (1971a) is an early account of a theory of random sets. Serra (1982) is a detailed account of theory and methods. A very extensive body of work under the heading of stereology is concerned essentially with the analysis of random sets in three spatial dimensions which are sampled using two-dimensional sections or one-dimensional probes (Baddeley and Vedel Jensen, 2005). For further discussion and references, see also Cressie (1993, chapter 9) or Chilès and Delfiner (1999, section 7.8).

4.6 Computation

4.6.1 Simulating from the generalised linear model

Poisson model

Below, we give the sequence of commands for simulating from the Poisson log-linear model as shown in Figure 4.5. We first define the object cp to contain the coordinates of the required data locations. Next we use the function grf() to simulate a realisation of the Gaussian process at these locations with σ² = 2 and Matérn correlation function with κ = 1.5, φ = 0.2, and store the result in the object s; in Figure 4.5, the simulated values are represented by the grey-scale shading of the grid squares. We then add the mean µ = 0.5 and exponentiate to define the Poisson means. These are then passed to the function rpois() to simulate the conditionally independent Poisson counts.

The simulated counts are indicated by the numbers shown in Figure 4.5. The spatially discrete representation of the underlying signal S in Figure 4.5 gives an alternative way of visualising the simulated data, instead of the superposition of a contour plot and a grey-scale image as used in Figure 4.1.

> library(geoR)
> set.seed(371)
> cp <- expand.grid(seq(0, 1, l = 10), seq(0, 1, l = 10))
> s <- grf(grid = cp, cov.pars = c(2, 0.2), cov.model = "mat",
+     kappa = 1.5)
> image(s, col = gray(seq(1, 0.25, l = 21)))
> lambda <- exp(0.5 + s$data)
> y <- rpois(length(s$data), lambda = lambda)
> text(cp[, 1], cp[, 2], y, cex = 1.5, font = 2)

[Figure 4.5: grey-scale image of a 10 by 10 grid on the unit square, with the simulated count printed in each grid square; axes: X Coord, Y Coord.]

Figure 4.5. A simulation of the Poisson log-linear model. The numbers are the Poisson counts corresponding to locations at the centre of each grid square. The grey-scale represents the value of the underlying Gaussian process at each location.
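The same simulation can be sketched without grf(), which may help to show what the function is doing. The version below is a minimal illustration in base R, assuming that the Gaussian process is simulated by Cholesky factorisation of its covariance matrix and that the Matérn correlation with κ = 1.5 is written in its closed form (1 + u/φ) exp(−u/φ). The names matern15, Sigma and L are ours, and the random numbers will not reproduce those of Figure 4.5.

```r
set.seed(371)

# Matern correlation with kappa = 1.5, as a function of distance u:
# rho(u) = (1 + u / phi) * exp(-u / phi)
matern15 <- function(u, phi) (1 + u / phi) * exp(-u / phi)

# 10 by 10 grid of locations on the unit square
cp <- as.matrix(expand.grid(seq(0, 1, length.out = 10),
                            seq(0, 1, length.out = 10)))
mu <- 0.5
sigma2 <- 2
phi <- 0.2

# Covariance matrix of S at the 100 grid locations
Sigma <- sigma2 * matern15(as.matrix(dist(cp)), phi)

# If Sigma = L L^T and z is standard normal, then S = L z has covariance Sigma
L <- t(chol(Sigma))
S <- drop(L %*% rnorm(nrow(cp)))

lambda <- exp(mu + S)                # Poisson means
y <- rpois(length(lambda), lambda)   # conditionally independent counts
```

A dense Cholesky factorisation is adequate for 100 locations but scales poorly, which is one reason to prefer a dedicated function for larger grids.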

The simulation model can be extended in various ways. For example, to include in the simulation non-spatial extra-Poisson variation of the kind discussed at the end of Section 4.3.1, we simply replace the command lambda <- exp(0.5 + s$data) above by

> lambda <- exp(0.5 + s$data + tau * rnorm(length(s$data)))

The additional term within the exponential generates independent Gaussian deviates with zero mean and variance τ², which are added to the values of the underlying Gaussian process. Similarly, to include a spatially varying mean, we would add a regression term within the exponential.
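Both extensions can be combined in one sketch. The values of tau and beta and the choice of the x-coordinate as covariate are hypothetical, chosen only for illustration, and the Gaussian process is again simulated in base R by Cholesky factorisation as a stand-in for s$data.

```r
set.seed(371)
cp <- as.matrix(expand.grid(seq(0, 1, length.out = 10),
                            seq(0, 1, length.out = 10)))

# Stand-in for s$data: Gaussian process with sigma^2 = 2 and Matern
# correlation (kappa = 1.5, phi = 0.2)
u <- as.matrix(dist(cp))
Sigma <- 2 * (1 + u / 0.2) * exp(-u / 0.2)
S <- drop(t(chol(Sigma)) %*% rnorm(nrow(cp)))

tau  <- 0.5       # hypothetical standard deviation of the non-spatial term
beta <- 1         # hypothetical regression coefficient
d    <- cp[, 1]   # hypothetical covariate: the x-coordinate

Z <- tau * rnorm(length(S))             # independent N(0, tau^2) deviates
lambda <- exp(0.5 + beta * d + S + Z)   # regression term plus extra variation
y <- rpois(length(lambda), lambda)
```

Setting tau to zero or beta to zero recovers each extension separately.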

Bernoulli model

Below, we give the code for the simulation shown in Figure 4.2. For better visualisation the underlying Gaussian process is simulated at 401 locations equally spaced in the unit interval and the inverse logit transformation is applied at each location to obtain the corresponding conditional probabilities. The object ind is then used to select 51 equally spaced points, and the binary values at these selected locations are generated using the rbinom() function.

> set.seed(34)
> locs <- seq(0, 1, l = 401)
> s <- grf(grid = cbind(locs, 1), cov.pars = c(5, 0.1),
+     cov.model = "matern", kappa = 1.5)
> p <- exp(s$data)/(1 + exp(s$data))
> ind <- seq(1, 401, by = 8)
> y <- rbinom(length(ind), size = 1, prob = p[ind])
> plot(locs[ind], y, xlab = "locations", ylab = "data")
> lines(locs, p)
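The sub-sampling step deserves a short illustration, because rbinom() pairs the ith draw with the ith element of prob, so the probabilities must be those at the selected locations. In the sketch below a smooth deterministic curve stands in for the simulated Gaussian process; that curve is purely an assumption made so that the example runs in base R.

```r
set.seed(34)
locs <- seq(0, 1, length.out = 401)

# Stand-in for a realisation of S(x): a smooth deterministic curve
# (an assumption for illustration, not a Gaussian process)
s <- 3 * sin(2 * pi * locs)

p <- plogis(s)                # inverse logit: exp(s) / (1 + exp(s))
ind <- seq(1, 401, by = 8)    # indices 1, 9, ..., 401: 51 locations

# One Bernoulli draw per selected location, each with its own probability
y <- rbinom(length(ind), size = 1, prob = p[ind])

plot(locs[ind], y, xlab = "locations", ylab = "data")
lines(locs, p)
```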

Binomial model

The 60 numbers shown in Figure 4.6 are simulated from a model with [Y(x)|S] ∼ Bin{n, p(x)} with n = 5 and p(x) = exp{µ + S(x)}/[1 + exp{µ + S(x)}], where µ = 2 and S(x) is a Gaussian process with Matérn correlation function with κ = 1.5, φ = 0.15. The circles in Figure 4.6 are drawn with radii proportional to the corresponding values of the underlying Gaussian process.

To generate this simulation we first simulate from the Gaussian model, then apply the inverse logit transformation to the simulated values to obtain the probabilities which we use to simulate the binomial data. A method for the function points() plots the Gaussian values. Finally, we use the standard R function text() to show the simulated binomial data as numbers above each sampling location. Our purpose in showing Figure 4.6 is not specifically to recommend this form of display, but more to illustrate different possibilities for visualisation of spatial data. The current example is one instance in which colour might be particularly effective, for example by using the radius of each circle to represent the corresponding realised value of the underlying Gaussian process and a discrete colour code for the actual count.

> set.seed(23)
> s <- grf(60, cov.pars = c(5, 0.25))
> p <- exp(2 + s$data)/(1 + exp(2 + s$data))
> y <- rbinom(length(p), size = 5, prob = p)
> points(s)
> text(s$coords, label = y, pos = 3, offset = 0.3)

In all of these examples, it is instructive to repeat the simulations with different values of the model parameters so as to gain insight into how details of the model specification do or do not affect the appearance of the simulated realisations. Replicate simulations holding parameter values constant similarly give useful insights into the behaviour of the models.
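One way of organising such repeated simulations is to wrap the binomial example in a function of its parameters. The sketch below uses base R only and makes several assumptions: the function name sim_binomial and the three values of φ are ours, the locations are drawn uniformly on the unit square, the Matérn correlation with κ = 1.5 is used in closed form, and a tiny constant is added to the diagonal of the covariance matrix purely for numerical stability.

```r
sim_binomial <- function(phi, sigma2 = 5, mu = 2, n_loc = 60, size = 5) {
  coords <- cbind(runif(n_loc), runif(n_loc))
  u <- as.matrix(dist(coords))
  # Matern (kappa = 1.5) covariance; the jitter is not part of the model
  Sigma <- sigma2 * (1 + u / phi) * exp(-u / phi) + diag(1e-8, n_loc)
  S <- drop(t(chol(Sigma)) %*% rnorm(n_loc))
  p <- plogis(mu + S)   # inverse logit
  list(coords = coords, S = S, y = rbinom(n_loc, size = size, prob = p))
}

set.seed(23)
phis <- c(0.05, 0.15, 0.5)   # illustrative values of the range parameter
op <- par(mfrow = c(1, 3))
for (phi in phis) {
  sim <- sim_binomial(phi)
  plot(sim$coords, type = "n", asp = 1,
       xlab = "X Coord", ylab = "Y Coord", main = paste("phi =", phi))
  text(sim$coords, labels = sim$y)
}
par(op)
```

Calling sim_binomial() repeatedly with the same arguments gives replicate simulations at fixed parameter values.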

4.6.2 Preferential sampling

Next we show how to simulate random, preferential and clustered samples as used in the example of Section 4.4.2. First, we simulate the signal S(x) on a grid of 10,000 points using grf(). Next we obtain measurements Yi corresponding to 50 points sampled at random using sample.geodata(), which are returned as the geodata object yr. Note that there is no nugget term in this example, hence the sampled measurements are Yi = S(xi), where xi is the ith sampled location.

To simulate the preferential sample we make the probability that any point k from the grid is sampled proportional to exp{bSk}, where b = 1.2 in the example below and Sk, held in S$data[k], is the simulated value of the signal at the kth grid-point. The sampled values Yi = S(xi) are now returned as the geodata object yp.

[Figure 4.6: circles with binomial counts printed above them at the data locations on the unit square; axes: X Coord, Y Coord.]

Figure 4.6. Simulated binomial data. Circles are drawn at the data locations, with radii proportional to the corresponding values of the underlying Gaussian process. Binomial counts are shown as numbers above the corresponding circles.

Finally, to simulate a clustered sample we first generate a second, independent realisation of the signal process, S2(x) say, and make the probability of sampling point k from the grid proportional to exp{bS2k}, with sampled measurements in yc.

> set.seed(2391)
> S <- grf(10000, grid = "reg", cov.pars = c(1, 0.2))
> yr <- sample.geodata(S, size = 50)
> yp <- sample.geodata(S, size = 50, prob = exp(1.2 * S$data))
> S2 <- grf(10000, grid = "reg", cov.pars = c(1, 0.2))
> yc <- sample.geodata(S, size = 50, prob = exp(1.2 * S2$data))
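The three sampling schemes can also be sketched with the base R function sample(), which makes the role of the sampling probabilities explicit. The sketch assumes a coarser 20 by 20 grid than the 10,000 points used above, so that a dense Cholesky factorisation remains cheap, an exponential correlation function with φ = 0.2, and a small diagonal jitter for numerical stability. The index vectors i_random, i_preferential and i_clustered are our own names.

```r
set.seed(2391)

# Coarse regular grid on the unit square (an assumption; the text uses 10,000 points)
gx <- seq(0, 1, length.out = 20)
grid <- as.matrix(expand.grid(gx, gx))
u <- as.matrix(dist(grid))

# Exponential correlation with sigma^2 = 1, phi = 0.2; jitter is not part of the model
Sigma <- exp(-u / 0.2) + diag(1e-8, nrow(grid))
L <- t(chol(Sigma))
S  <- drop(L %*% rnorm(nrow(grid)))   # signal
S2 <- drop(L %*% rnorm(nrow(grid)))   # second, independent realisation

b <- 1.2
k <- seq_along(S)
i_random       <- sample(k, 50)                       # equal probabilities
i_preferential <- sample(k, 50, prob = exp(b * S))    # favours large values of S
i_clustered    <- sample(k, 50, prob = exp(b * S2))   # clustered, unrelated to S

# Compare the mean of the signal over the grid with the three sample means
c(grid = mean(S), random = mean(S[i_random]),
  preferential = mean(S[i_preferential]), clustered = mean(S[i_clustered]))
```

Because the preferential scheme favours locations where the signal is large, its sample mean will typically exceed the mean of the signal over the whole grid, whereas the clustered scheme concentrates the sample spatially without reference to S.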

From the document Model-based Geostatistics, Peter J. Diggle and Paulo J. Ribeiro Jr., November 7, 2006 (pages 104-107)