
1 Prior and Likelihood Choices in the Analysis of Ecological Data

1.4 PRIORS

arise because the contextual effects are assumed linear in $x$. This model is not natural from a statistical perspective, however, since it allows estimated probabilities to lie outside their permissible range of (0, 1). As discussed by Wakefield (2004), a general model that allows registration probabilities to depend on both the race of the individual and the contextual effects of race is given by

$$p_{ji} = P(Y = 1 \mid \text{race } j, \text{area } i) = g^{-1}(\alpha_{ji} + \beta_j x_i),$$

for a link function $g(\cdot)$. This choice leads to

$$q_i = P(Y = 1 \mid \text{area } i) = g^{-1}(\alpha_{0i} + \beta_0 x_i)\, x_i + g^{-1}(\alpha_{1i} + \beta_1 x_i)\,(1 - x_i).$$

Ecological regression corresponds to this model with a linear link and with $\alpha_{0i} = \alpha_0$, $\alpha_{1i} = \alpha_1$, and $\beta_0 = \beta_1 = 0$; the nonlinear neighborhood model has $\alpha_{0i} = \alpha_{1i} = \alpha_i$ and $\beta_0 = \beta_1 = 0$; the linear neighborhood model has $\alpha_{0i} = \alpha_{1i} = \alpha$ and $\beta_0 = \beta_1 = \beta$ with a linear link; and the quadratic model has a linear link with $\alpha_{0i} = \alpha_0$ and $\alpha_{1i} = \alpha_1$. Hence it would appear profitable to consider the general form with a nonlinear link, logistic and probit forms being the obvious choices. In these cases the model with individual and contextual effects has all parameters identifiable. Unfortunately, although assuming nonlinearity removes the nonidentifiability in theory, in practice the results depend entirely on the form chosen, and parameter estimates will in general be highly unstable. This was pointed out by Achen and Shively (1995: 117), who comment that since the contextual effects are not strong and the range of $x$ often does not span (0, 1), it would be virtually impossible to discriminate between nonlinear and linear forms (since any function that has a narrow range and does not change greatly may be well approximated by a linear form, via a Taylor series expansion). This is similar to criticisms of Heckman's selection models (Heckman, 1979); see for example Little (1985) and Copas and Li (1997).
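The point about discriminating between linear and nonlinear forms can be illustrated numerically. The following is a minimal sketch, assuming a logistic link, weak contextual effects, and a narrow range of $x$; the function name `q_area` and every parameter value are hypothetical and are not taken from the text.

```python
import numpy as np

def expit(t):
    """Inverse logit, written with tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def q_area(x, alpha0, beta0, alpha1, beta1, inv_link=expit):
    """Marginal probability q = g^{-1}(a0 + b0 x) x + g^{-1}(a1 + b1 x) (1 - x)."""
    p0 = inv_link(alpha0 + beta0 * x)
    p1 = inv_link(alpha1 + beta1 * x)
    return p0 * x + p1 * (1.0 - x)

# Hypothetical setting: weak contextual effects, x confined to a narrow range.
x = np.linspace(0.2, 0.4, 50)
q = q_area(x, alpha0=-0.5, beta0=0.3, alpha1=0.4, beta1=0.3)

# Least-squares straight line through the same points (polyfit returns the
# highest-degree coefficient first).
slope, intercept = np.polyfit(x, q, deg=1)
max_abs_dev = float(np.max(np.abs(q - (intercept + slope * x))))
print(f"largest deviation from a straight line: {max_abs_dev:.2e}")
```

Under these assumed values the logistic curve for $q$ is almost exactly a straight line in $x$, so aggregate data of this kind carry very little information for separating the two forms.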

Jonathan Wakefield

the example in Wakefield, 2004, for a further demonstration of this). The freely available EzI software (Benoit and King, 1998) may be used to implement the truncated normal model and its extensions.

At the second stage, King, Rosen, and Tanner (1999) assume that $p_{0i}$ and $p_{1i}$ are independent with

$$p_{ji} \mid a_j, b_j \sim \mathrm{Beta}(a_j, b_j) \tag{1.12}$$

and with exponential priors $\mathrm{Exp}(\lambda)$ on $a_j$, $b_j$, $j = 0, 1$, at the third stage, where $\lambda^{-1}$ is the mean of the exponential. Specifically, in the example considered it was assumed that these exponential priors had mean 2 ($\lambda = 0.5$). This choice is in general a poor one: it does not favor large values of the hyperparameters, and (since $a_j + b_j - 2$ acts like a prior sample size for race $j$) large values are what is needed to add strong information to the sparse marginal data. This choice also produces a prior for each probability that is very strongly U-shaped (since beta priors with $a_j < 1$, $b_j < 1$ are themselves U-shaped, and these values of the hyperparameters are assigned considerable prior weight), which is not desirable in many instances. This is further commented upon in Section 1.7 and discussed more fully in Wakefield (2004); in particular see Figure 7. Choosing much smaller values of $\lambda$, for example $\lambda = 0.01$, produces almost uniform priors on the probabilities and allows much larger values of the hyperparameters, though we would not universally recommend a particular hyperprior. As the number of tables decreases and the $x$ distribution becomes more asymmetric, this problem becomes more and more acute. The ideal situation is for substantive information to be available for prior specification. The strong dependence on the third-stage prior is in stark contrast to the usual generalized mixed model case, for which there is far less dependence (except for priors on variance components, where again care must be taken with small numbers of units).
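The effect of the exponential hyperprior on the implied prior for each probability can be checked by simulation. The sketch below is an illustration only: the helper names `simulate_marginal_prior` and `tail_mass`, the number of draws, and the 0.05 cut-off used to measure mass near 0 and 1 are all assumptions introduced here.

```python
import numpy as np

def simulate_marginal_prior(lam, n_draws=200_000, seed=0):
    """Draw p from the marginal prior: a, b ~ Exp(rate=lam), then p | a, b ~ Beta(a, b)."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the exponential by its mean (scale = 1 / rate).
    # The tiny floor only guards against an exact zero, which Beta does not accept.
    a = np.maximum(rng.exponential(scale=1.0 / lam, size=n_draws), 1e-12)
    b = np.maximum(rng.exponential(scale=1.0 / lam, size=n_draws), 1e-12)
    return rng.beta(a, b)

def tail_mass(p, eps=0.05):
    """Share of draws within eps of 0 or 1; a uniform prior gives 2 * eps."""
    return float(np.mean((p < eps) | (p > 1.0 - eps)))

tail_mean2 = tail_mass(simulate_marginal_prior(lam=0.5))     # hyperprior mean 2
tail_mean100 = tail_mass(simulate_marginal_prior(lam=0.01))  # hyperprior mean 100
print(f"lambda = 0.5 : share of prior mass near 0 or 1 = {tail_mean2:.3f}")
print(f"lambda = 0.01: share of prior mass near 0 or 1 = {tail_mean100:.3f}")
```

A uniform prior would place a share of 0.10 in these two tails, so a markedly larger share under $\lambda = 0.5$ is one way of seeing the U shape described above.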

The model given by Equation 1.12 does not allow dependence between the two random effects (note that this is distinct from the independence between pairs of random effects in different areas, which is also assumed), though it is conjugate (giving a marginal distribution for the data that is beta–binomial), which may offer some advantage in computation. The model also allows area-level covariates to be added at the second stage.
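The conjugacy mentioned here can be made concrete for a single binomial count. The sketch below assumes a count $Y \mid p \sim \mathrm{Binomial}(n, p)$ with $p \sim \mathrm{Beta}(a, b)$ and evaluates the resulting beta-binomial probabilities; the function names and the values of `n`, `a`, and `b` are hypothetical.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(y, n, a, b):
    """P(Y = y) when Y | p ~ Binomial(n, p) and p ~ Beta(a, b), with p integrated out."""
    log_choose = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

n, a, b = 20, 0.5, 0.5  # hypothetical count total and hyperparameters
pmf = [beta_binomial_pmf(y, n, a, b) for y in range(n + 1)]
print(f"total probability: {sum(pmf):.6f}")
print(f"P(Y = 0) = {pmf[0]:.4f}, P(Y = 10) = {pmf[10]:.4f}")
```

Because the integral over $p$ has a closed form, no numerical integration is needed at this step, which is the computational advantage referred to above.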

Wakefield (2004) proposed, as an alternative to the beta model, a second stage in which the logits of the registration probabilities arise from a bivariate normal distribution; this model was introduced by Skene and Wakefield (1990) for the analysis of a series of $2 \times 2$ tables in which the internal cells were observed. Specifically, a reasonably general form is

$$\theta_{0i} = \mu_0 + \beta_0 z_i + \delta_{0i}, \qquad \theta_{1i} = \mu_1 + \beta_1 z_i + \delta_{1i} \tag{1.13}$$

with

$$\delta_i \sim N_2(0, \Sigma), \quad \text{where} \quad \delta_i = \begin{pmatrix} \delta_{0i} \\ \delta_{1i} \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{pmatrix}, \tag{1.14}$$

and where $\theta_{0i}$ and $\theta_{1i}$ denote the logits of the probabilities $p_{0i}$ and $p_{1i}$ in table $i$, i.e. $p_{ji} = \exp(\theta_{ji})/\{1 + \exp(\theta_{ji})\}$, $j = 0, 1$. In the specification 1.13, $z_i$ represent area-level characteristics (and may include $x_i$), and $\beta_0$, $\beta_1$ are (ecological) log odds ratios associated with these variables. A third-stage hyperprior adds priors on $\mu_0$, $\mu_1$, and $\Sigma$ (and $\beta_0$, $\beta_1$ if there are covariates). In our limited experience it is difficult to gain information on the covariance term $\Sigma_{01}$ or on covariate relationships without strong prior information. The difficulty of estimating contextual effects and the dependence of the area-specific probabilities further reinforces the lack of information in ecological data. Sensitivity analyses in which $\Sigma_{01}$ and/or $\beta_0$, $\beta_1$ are fixed a priori are straightforward under this model, however.

Figure 1.5. Nesting of models. In the baseline model the parameters of f(·) are fixed, while in the hierarchical model the common parameters are estimated from the totality of the data.
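The second stage in Equations 1.13 and 1.14 is simple to simulate, which can help when judging what a fixed value of $\Sigma_{01}$ implies for the pairs of probabilities. This is a minimal sketch with a single area-level covariate; the function name `simulate_second_stage` and all numerical values are hypothetical, chosen only so that the example runs.

```python
import numpy as np

def expit(t):
    """Inverse logit, written with tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def simulate_second_stage(mu, beta, z, Sigma, seed=0):
    """Draw (p_0i, p_1i) for each area i under the logit-normal second stage.

    mu, beta : length-2 sequences (mu_0, mu_1) and (beta_0, beta_1)
    z        : length-m sequence holding one area-level covariate
    Sigma    : 2 x 2 covariance matrix of (delta_0i, delta_1i)
    """
    rng = np.random.default_rng(seed)
    mu, beta, z = (np.asarray(v, dtype=float) for v in (mu, beta, z))
    delta = rng.multivariate_normal(np.zeros(2), Sigma, size=z.size)  # shape (m, 2)
    theta = mu + np.outer(z, beta) + delta                            # shape (m, 2)
    return expit(theta)

z = np.linspace(0.1, 0.9, 5000)  # hypothetical covariate values
corr = {}
for cov01 in (0.0, 0.4):  # Sigma_01 fixed a priori, as in a sensitivity analysis
    Sigma = np.array([[0.5, cov01], [cov01, 0.5]])
    p = simulate_second_stage(mu=[-0.5, 0.5], beta=[0.0, 0.0], z=z, Sigma=Sigma)
    corr[cov01] = float(np.corrcoef(p[:, 0], p[:, 1])[0, 1])
    print(f"Sigma_01 = {cov01}: correlation of (p_0i, p_1i) = {corr[cov01]:.2f}")
```

Repeating the draw for several fixed values of $\Sigma_{01}$ or of $\beta_0$, $\beta_1$ gives a quick picture of the assumptions that each sensitivity analysis would impose.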

For the case of no covariates and $\Sigma_{01} = 0$, and without substantive information for the registration–race data, we may choose logistic priors with location 0 and scale 1 for $\mu_0$ and $\mu_1$, since these induce uniform priors on $\exp(\mu_j)/\{1 + \exp(\mu_j)\}$ (the median of the registration probability for race $j$ across the population of areas). For the precisions $\Sigma_{00}^{-1}$, $\Sigma_{11}^{-1}$ we specify gamma distributions $\mathrm{Ga}(a, b)$ (where the parameterization is such that the mean is $a/b$). In the application here we take $a = 1$ and $b = 0.01$; these values were chosen via an informal examination of simulations from the prior induced for $p_{0i}$, $p_{1i}$ under different values of $a$ and $b$. In the WinBUGS manual the priors Ga(0.001, 0.001) are often used for precisions within a hierarchical model. This choice is not to be recommended in general (that is, for all applications); here it is a very poor one (and leads to marginal priors for the probabilities that are highly U-shaped).
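The informal examination of the induced prior described above can be imitated with a few lines of simulation. The sketch below is not the procedure used in the text; it assumes the no-covariate case with $\Sigma_{01} = 0$, and the helper names, the number of draws, and the 0.05 cut-off for mass near 0 and 1 are hypothetical.

```python
import numpy as np

def expit(t):
    """Inverse logit, written with tanh so that very large |t| is handled safely."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def induced_prior_draws(a, b, n_draws=200_000, seed=0):
    """Draw expit(mu) and p = expit(mu + delta), with mu ~ Logistic(0, 1),
    precision ~ Ga(a, b) (mean a / b) and delta ~ N(0, 1 / precision)."""
    rng = np.random.default_rng(seed)
    mu = rng.logistic(loc=0.0, scale=1.0, size=n_draws)
    # NumPy's gamma takes shape and scale, so rate b becomes scale 1 / b.
    # The floor avoids dividing by a precision that underflows to exactly zero.
    precision = np.maximum(rng.gamma(shape=a, scale=1.0 / b, size=n_draws), 1e-300)
    delta = rng.standard_normal(n_draws) / np.sqrt(precision)
    return expit(mu), expit(mu + delta)

def tail_mass(p, eps=0.05):
    """Share of draws within eps of 0 or 1; a uniform prior gives 2 * eps."""
    return float(np.mean((p < eps) | (p > 1.0 - eps)))

median_p, p_chosen = induced_prior_draws(a=1.0, b=0.01)
_, p_vague = induced_prior_draws(a=0.001, b=0.001)
print(f"expit(mu):         mean {median_p.mean():.3f}, variance {median_p.var():.4f}")
print(f"Ga(1, 0.01):       share of mass near 0 or 1 = {tail_mass(p_chosen):.3f}")
print(f"Ga(0.001, 0.001):  share of mass near 0 or 1 = {tail_mass(p_vague):.3f}")
```

A uniform distribution on (0, 1) has mean 1/2 and variance 1/12, which gives a direct check on the logistic prior, and the share of mass near 0 or 1 offers a simple numerical summary of how U-shaped each induced prior is.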

Figure 1.5 displays the nesting of a number of the models that we have described. The simplest model, at the top, is one in which there is a single registration probability for both races and for all areas. Taking the left fork gives the neighborhood models; taking the right fork gives ecological regression. Hierarchical models allow $2m$ parameters but tie the pairs of probabilities together via the assumption of a common distribution from which they are drawn (possibly allowing contextual effects also). The baseline model is located at the bottom of the nesting. It is essentially a fixed effects model for each table, retaining the $2m$ parameters; as we discussed above, we do not advocate the use of this model, but it is useful to identify the extreme saturated model for ecological data.

All of the above hierarchical models result in posterior distributions that are analytically intractable (as we describe in the next section), but Markov chain Monte Carlo (MCMC) algorithms are relatively straightforward to implement (though convergence may be a problem), and all of the models but the truncated normal have been implemented in the WinBUGS software (Spiegelhalter, Thomas, and Best, 1998). The Appendix gives code for the logistic normal with the normal approximation to the convolution at stage 1. In our fairly limited experience we have found that the logit model is much more stable than the beta model, at least when used within the WinBUGS software. In particular we found that this software may crash with the beta model, because points very close to 0 and 1 are supported by ecological data and, when sampled, lead to numerical problems.
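The normal approximation to the convolution can be compared with the exact calculation in a small example. This section does not spell out the stage 1 likelihood, so the sketch below assumes the usual setup in which the observed count in an area is the sum of two independent binomial counts, one per race; the function names, the group sizes, and the probabilities are hypothetical, and this is not the code given in the Appendix.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n, p):
    """Binomial probability P(K = k) for K ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def convolution_pmf(y, n0, p0, n1, p1):
    """Exact P(Y = y) for Y = Y0 + Y1, Y0 ~ Bin(n0, p0) and Y1 ~ Bin(n1, p1) independent."""
    lo, hi = max(0, y - n1), min(n0, y)
    return sum(binom_pmf(y0, n0, p0) * binom_pmf(y - y0, n1, p1)
               for y0 in range(lo, hi + 1))

def normal_approx_pdf(y, n0, p0, n1, p1):
    """Normal density with the same mean and variance as the convolution."""
    mean = n0 * p0 + n1 * p1
    var = n0 * p0 * (1.0 - p0) + n1 * p1 * (1.0 - p1)
    return exp(-0.5 * (y - mean) ** 2 / var) / sqrt(2.0 * pi * var)

n0, p0, n1, p1 = 200, 0.3, 300, 0.6  # hypothetical group sizes and probabilities
ys = range(n0 + n1 + 1)
exact = [convolution_pmf(y, n0, p0, n1, p1) for y in ys]
approx = [normal_approx_pdf(y, n0, p0, n1, p1) for y in ys]
max_gap = max(abs(e - a) for e, a in zip(exact, approx))
print(f"largest gap between exact and approximate values: {max_gap:.2e}")
```

The exact sum has up to $\min(n_0, y) + 1$ terms for every area and every evaluation of the likelihood, whereas the approximation needs only the first two moments, which is one reason such an approximation is attractive inside an MCMC scheme. Its accuracy should be expected to deteriorate for small counts or probabilities near 0 or 1.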

In the next section we briefly review the Bayesian approach to inference and give an overview of computation for the Bayesian models that have been described in the previous section. The Bayesian approach is particularly appealing in the context of ecological data because for such data modeling assumptions have to be made to enforce identifiability, and the most rigorous way of including such assumptions is via the adoption of a prior distribution.

From the document Ecological Inference (pages 35-38)