the weights can be obtained using the function krweights(), which takes as arguments the data coordinates, the location(s) of the point(s) to be predicted and the object which specifies the model. For example, to obtain the weights shown in the lower-left panel of Figure 6.3 we use the commands below.

> coords <- cbind(c(0.2, 0.25, 0.6, 0.7), c(0.1, 0.8, 0.9, 0.3))
> KC <- krige.control(ty = "ok", cov.model = "mat", kap = 1.5,
+     nug = 0.1, cov.pars = c(1, 0.1))
> krweights(coords, c(0.5, 0.5), KC)
[1] 0.1935404 0.2301559 0.2125838 0.3637199
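To make explicit what such weights are, the following is a minimal base-R sketch of the ordinary kriging system. It is not the implementation used by krweights(). It assumes a Matérn correlation with κ = 1.5 written as ρ(u) = (1 + u/φ) exp(−u/φ), with the nugget added to the diagonal of the data covariance matrix; the helper names matern15() and ok_weights() are hypothetical. We do not claim that it reproduces the output above digit for digit, since that depends on the parametrisation conventions of the package.

```r
# Ordinary kriging weights from first principles (base R only).
# Assumption: Matern correlation with kappa = 1.5 in the closed form
# rho(u) = (1 + u/phi) * exp(-u/phi); the nugget is added on the diagonal.
matern15 <- function(u, phi) (1 + u / phi) * exp(-u / phi)

ok_weights <- function(coords, x0, sigmasq, phi, nugget) {
  n  <- nrow(coords)
  D  <- as.matrix(dist(coords))                 # data-to-data distances
  d0 <- sqrt(colSums((t(coords) - x0)^2))       # data-to-target distances
  V  <- sigmasq * matern15(D, phi) + nugget * diag(n)   # Var(Y)
  r  <- sigmasq * matern15(d0, phi)                     # Cov(S(x0), Y)
  # Augmented system: the extra row and column enforce sum(w) == 1
  A  <- rbind(cbind(V, 1), c(rep(1, n), 0))
  unname(solve(A, c(r, 1))[1:n])
}

coords <- cbind(c(0.2, 0.25, 0.6, 0.7), c(0.1, 0.8, 0.9, 0.3))
w <- ok_weights(coords, c(0.5, 0.5), sigmasq = 1, phi = 0.1, nugget = 0.1)
print(w)
print(sum(w))
```

The weights sum to one by construction, and the data location closest to the prediction point receives the largest weight.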

6.8 Exercises

For each selected realisation, take as the data a random sample of size n from the 1600 grid-point values of S(x).

(a) Obtain the predictive distribution of the proportion of the study area for which S(x) > 0, i.e. A(0) in the notation of Section 6.5.1, using plug-in predictions with:

(i) true parameter values

(ii) parameter values estimated by maximum likelihood.

Compare the two predictive distributions obtained under (i) and (ii).

(b) Investigate how the predictive distributions change as you increase the sample size, n.

(c) Comment generally.

7 Bayesian inference

In Chapters 5 and 6 we discussed geostatistical inference from a classical or non-Bayesian perspective, treating parameter estimation and prediction as separate problems. We did this for two reasons, one philosophical, the other practical.

Firstly, in the non-Bayesian setting, there is a fundamental distinction between a parameter and a prediction target. A parameter has a fixed but unknown value which represents a property of the processes which generate the data, whereas a prediction target is the realised value of a random variable associated with those same processes. Secondly, estimation and prediction are usually operationally separate in geostatistical practice, meaning that we first formulate our model and estimate its parameters, then plug the estimated parameter values into theoretical prediction equations as if they were the true values. An obvious concern with this two-phase approach is that ignoring uncertainty in the parameter estimates may lead to optimistic assessments of predictive accuracy.

It is possible to address this concern in various ways without being Bayesian, but in our view the Bayesian approach gives a more elegant solution, and it is the one which we have adopted in our own work.

7.1 The Bayesian paradigm: a unified treatment of estimation and prediction

7.1.1 Prediction using plug-in estimates

In general, a geostatistical model is specified through two sub-models: a sub-model for an unobserved spatial process {S(x) : x ∈ ℝ²}, called the signal, and a sub-model for the data Y = (Y₁, …, Yₙ) conditional on S(·). Using θ as a generic notation for all unknown parameters, a formal notation for the model specification is

\[ [Y, S \mid \theta] = [S \mid \theta]\,[Y \mid S, \theta], \tag{7.1} \]

where S denotes the whole of the signal process, {S(x) : x ∈ ℝ²}. The square bracket notation, [·], means “the distribution of” the random variable or variables enclosed in the brackets, with a vertical bar as usual denoting conditioning.

Whilst we find this notation helpful in emphasising the structure of a model, it will sometimes be more convenient to use the notation p(·) to denote probability or probability density, in which case we reserve π(·) to denote the Bayesian prior distribution of model parameters.

The classical predictive distribution of S is the conditional distribution [S|Y, θ], which in principle is obtainable from the model specification by an application of Bayes’ Theorem. For any target for prediction, T, which is a deterministic functional of S, the predictive distribution for T follows immediately in principle from that of S, although it may or may not be analytically tractable. In either event, to generate a realisation from the predictive distribution [T|Y, θ] we need only generate a realisation from the predictive distribution [S|Y, θ] and apply a deterministic calculation to convert from S to T.
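The two-step recipe just described (simulate S, then compute T) can be sketched in base R as follows. This is an illustration under assumptions of our own that are not taken from the text: a zero-mean Gaussian signal with exponential correlation, known parameter values, no nugget, made-up data values, a coarse 10 by 10 prediction grid, and the target T taken to be the proportion of grid points at which S(x) > 0.

```r
# Sketch: draw from [T | Y, theta] by simulating S on a grid, then applying T.
# Assumptions (ours): zero-mean S, exponential correlation, no nugget,
# made-up data, and T = proportion of grid points with S > 0.
set.seed(1)
rho <- function(u, phi) exp(-u / phi)
sigmasq <- 1
phi <- 0.25

xy <- cbind(c(0.1, 0.4, 0.8, 0.6), c(0.2, 0.9, 0.5, 0.1))   # data locations
y  <- c(0.8, -0.3, 0.5, 1.1)                                 # data values
grid <- as.matrix(expand.grid(seq(0.05, 0.95, by = 0.1),
                              seq(0.05, 0.95, by = 0.1)))    # 100 targets

D_all <- as.matrix(dist(rbind(xy, grid)))
n   <- nrow(xy)
idx <- (n + 1):nrow(D_all)
V   <- sigmasq * rho(D_all[1:n, 1:n], phi)    # Var(Y)
C   <- sigmasq * rho(D_all[idx, 1:n], phi)    # Cov(S(grid), Y)
Sig <- sigmasq * rho(D_all[idx, idx], phi)    # Var(S(grid))

# Gaussian conditional distribution [S(grid) | Y, theta]
cond_mean <- C %*% solve(V, y)
cond_var  <- Sig - C %*% solve(V, t(C))
L <- t(chol(cond_var + 1e-8 * diag(nrow(grid))))   # small jitter for stability

n_sim <- 500
Z     <- matrix(rnorm(nrow(grid) * n_sim), ncol = n_sim)
S_sim <- as.vector(cond_mean) + L %*% Z    # each column is one simulated surface
T_sim <- colMeans(S_sim > 0)               # deterministic conversion from S to T
print(summary(T_sim))
```

The vector T_sim is then a sample from the predictive distribution of T, even though that distribution has no convenient closed form.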

A plug-in predictive distribution consists simply of treating estimated parameter values as if they were the truth; hence, for any target T the plug-in predictive distribution is [T|Y, θ̂].

In the special case of the linear Gaussian model as defined in (5.12) and with a prediction target T = S(x), the plug-in predictive distribution is known explicitly. As demonstrated in Section 6.2.1, [T|Y, θ] is Gaussian with mean

\[ \hat{T} = \mathrm{E}[T \mid Y, \theta] = \mu(x) + r' V(\theta)^{-1} (Y - \mu) \]

and variance

\[ \mathrm{Var}[T \mid Y, \theta] = \sigma^2 \left(1 - r' V(\theta)^{-1} r\right), \]

where μ(x) = d(x)′β, μ is the n-element vector with elements μ(xᵢ), i = 1, …, n, σ²V(θ) = Var(Y|θ) as given by (6.6), and r is the n-element vector of correlations with elements rᵢ = Corr{S(x), Yᵢ}.

These formulae assume that S(x) has zero mean, i.e. any non-zero mean is included in the specification of the regression model for μ(x). When the target depends on both S and the trend, μ(x), for example when we want to predict μ(x) + S(x) at an arbitrary location, we simply plug the appropriate point estimate μ̂(x) into the definition of T. Because it ignores uncertainty in the parameter estimates, plug-in prediction often results in optimistic estimates of precision. Bayesian prediction remedies this.
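As a concrete illustration of the plug-in mean and variance for T = S(x), here is a minimal base-R sketch. The assumptions are ours: a constant mean μ(x) = β, an exponential correlation function, V = R + (τ²/σ²)I where R is the correlation matrix of the signal at the data locations, and rᵢ computed as the signal correlation at the distance between x and xᵢ. The function name plugin_predict() and all numerical values are hypothetical.

```r
# Plug-in predictive mean and variance for T = S(x) (base R sketch).
# Assumptions (ours): constant mean beta, exponential correlation,
# V = R + (tausq / sigmasq) * I, made-up data and parameter values.
rho <- function(u, phi) exp(-u / phi)

plugin_predict <- function(xy, y, x0, beta, sigmasq, phi, tausq) {
  n <- nrow(xy)
  V <- rho(as.matrix(dist(xy)), phi) + (tausq / sigmasq) * diag(n)
  r <- rho(sqrt(colSums((t(xy) - x0)^2)), phi)
  Vinv_r <- solve(V, r)                           # V^{-1} r
  list(mean = beta + sum(Vinv_r * (y - beta)),    # mu(x) + r' V^{-1} (Y - mu)
       var  = sigmasq * (1 - sum(r * Vinv_r)))    # sigma^2 (1 - r' V^{-1} r)
}

xy <- cbind(c(0.2, 0.25, 0.6, 0.7), c(0.1, 0.8, 0.9, 0.3))
y  <- c(1.2, 0.4, -0.1, 0.9)
p  <- plugin_predict(xy, y, c(0.5, 0.5),
                     beta = 0.5, sigmasq = 1, phi = 0.3, tausq = 0.1)
print(p)
```

In a plug-in analysis the arguments beta, sigmasq, phi and tausq would be replaced by their estimates, and the returned variance would then take no account of the uncertainty in those estimates.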

7.1.2 Bayesian prediction

The Bayesian approach to prediction makes no formal distinction between the unobserved signal process S and the model parameters θ. Both are unobserved random variables. Hence, the starting point is a hierarchically specified joint distribution for three random entities: the data, Y; the signal, S; and the model parameters, θ. The specification extends the two-level hierarchical form (7.1) to a three-level hierarchy,

\[ [Y, S, \theta] = [\theta]\,[S \mid \theta]\,[Y \mid S, \theta], \tag{7.2} \]

where now [θ] is the prior distribution for θ. In theory, the prior distribution should reflect the scientist’s opinions about the likely values of θ before collection and inspection of the data; in practice, as we discuss below, the prior is often chosen pragmatically.

The Bayesian predictive distribution for S is defined as the conditional distribution [S|Y]. This is again obtained from the model specification by an application of Bayes’ Theorem, but starting from (7.2) rather than (7.1). This leads to the result

\[ [S \mid Y] = \int [S \mid Y, \theta]\,[\theta \mid Y]\, d\theta, \tag{7.3} \]

showing that the Bayesian predictive distribution is a weighted average of plug-in predictive distributions, in which the weights reflect our posterior uncertainty about the values of the model parameters θ. As with plug-in prediction, the predictive distribution for any target T which is a functional of S follows immediately, as the transformation from S to T is deterministic. In practice, we simulate samples from the predictive distribution of S, and from each such simulated sample we calculate a corresponding sampled value from the predictive distribution of T.
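The mixture structure in (7.3) can be sketched in base R by restricting θ to a single correlation parameter φ on a discrete grid. Everything specific here is an assumption of ours for illustration only: zero mean, σ² = 1 known, no nugget, exponential correlation, a uniform prior over the grid of φ values, made-up data, and the target T = S(x) at one location.

```r
# Sketch of (7.3) with theta = phi restricted to a discrete grid.
# Assumptions (ours): zero mean, sigmasq = 1 known, no nugget, exponential
# correlation, uniform prior on the grid, made-up data.
set.seed(2)
rho <- function(u, phi) exp(-u / phi)
xy <- cbind(c(0.2, 0.25, 0.6, 0.7, 0.9), c(0.1, 0.8, 0.9, 0.3, 0.6))
y  <- c(1.2, 0.4, -0.1, 0.9, 0.3)
x0 <- c(0.5, 0.5)
D  <- as.matrix(dist(xy))
d0 <- sqrt(colSums((t(xy) - x0)^2))

phi_grid <- seq(0.1, 1, by = 0.1)
log_lik <- m <- v <- numeric(length(phi_grid))
for (k in seq_along(phi_grid)) {
  V <- rho(D, phi_grid[k])
  r <- rho(d0, phi_grid[k])
  Vinv_y <- solve(V, y)
  # Gaussian log-likelihood up to an additive constant
  log_lik[k] <- -0.5 * (as.numeric(determinant(V)$modulus) + sum(y * Vinv_y))
  m[k] <- sum(r * Vinv_y)              # E[T | Y, phi]
  v[k] <- 1 - sum(r * solve(V, r))     # Var[T | Y, phi]
}
post <- exp(log_lik - max(log_lik))
post <- post / sum(post)               # discrete posterior [phi | Y]

# Sample phi from its posterior, then T from [T | Y, phi]
k_sim <- sample(seq_along(phi_grid), 5000, replace = TRUE, prob = post)
T_sim <- rnorm(5000, mean = m[k_sim], sd = sqrt(v[k_sim]))

# Exact moments of the mixture, for comparison with the simulation
bayes_mean <- sum(post * m)
bayes_var  <- sum(post * v) + sum(post * (m - bayes_mean)^2)
k_hat <- which.max(post)               # grid value with highest posterior mass
print(c(plugin_var = v[k_hat], bayes_var = bayes_var, sim_var = var(T_sim)))
```

With a uniform prior the value phi_grid[k_hat] is the maximum likelihood estimate over the grid, so v[k_hat] plays the role of the plug-in variance against which the Bayesian variance can be compared.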

Typically, but not universally, the Bayesian paradigm leads to more conservative predictions in the sense that the resulting predictive distribution [T|Y] is more dispersed than the plug-in predictive distribution [T|Y, θ̂]. Note also that as the data become more abundant, then for any parameter θ which is identifiable from the data we expect the posterior distribution [θ|Y] to become progressively more concentrated around a single value θ̂. In other words, the Bayesian predictive distribution for S, and therefore for any target T, converges to the plug-in. However, the rate of convergence is problem specific, depending on a complex interplay involving the prior, the model and the sampling design.
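One way to see why the extra dispersion is typical but not guaranteed is the standard variance decomposition, a general identity which we state here assuming that the relevant second moments exist:

```latex
\[
\mathrm{Var}[T \mid Y]
  = \mathrm{E}_{\theta \mid Y}\bigl\{\mathrm{Var}[T \mid Y, \theta]\bigr\}
  + \mathrm{Var}_{\theta \mid Y}\bigl\{\mathrm{E}[T \mid Y, \theta]\bigr\}.
\]
```

The second term is non-negative, so the Bayesian predictive variance is at least the posterior average of the conditional variances. It need not exceed the plug-in variance evaluated at the particular value θ̂, because that single variance can be larger than the posterior average. As [θ|Y] concentrates around θ̂, the second term shrinks towards zero and the first term approaches the plug-in variance.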

In our experience the difference between the two can be substantial, especially for non-linear targets T. Also, we re-emphasise our point of view that the complete solution to a predictive problem is a probability distribution, not a single value. In geostatistical applications where prediction is the scientific goal, point estimates of parameters may be acceptable, but point predictions are of limited value.

In the special case of the linear Gaussian model with target T = S(x) and pragmatic prior assumptions, we can obtain explicit results for the Bayesian predictive distribution of T. As in Section 5.3, we first illustrate the general approach for the unrealistic case in which all model parameters other than the mean and variance are assumed known, then relax these assumptions to derive a prediction algorithm for the case of practical interest, in which all parameters are assumed unknown and are assigned a joint prior distribution.