
From Principles of Statistical Inference (pages 90–93)


5.8 Impersonal degree of belief


It is explicit in the treatment of the previous section that the probabilities concerned belong to a particular person, You, and there is no suggestion that, even given the same information F, different people will have the same probability.

Any agreement between individuals is presumed to come from the availability of a large body of data with an agreed probability model, when the contribution of the prior will often be minor, as indeed we have seen in a number of examples.

A conceptually quite different approach is to define P(E | F) as the degree of belief in E that a reasonable person would hold given F. The presumption then is that differences between reasonable individuals are to be considered as arising from their differing information bases. The term objective degree of belief may be used for such a notion.

The probability scale can be calibrated against a standard set of frequency-based chances. Arguments can again be produced as to why this form of probability obeys the usual rules of probability theory.

To be useful in individual applications, specific values have to be assigned to the probabilities, and in many applications this is done by using a flat prior which is intended to represent an initial state of ignorance, leaving the final analysis to be essentially a summary of information provided by the data. Example 1.1, the normal mean, provides an instance where a very dispersed prior in the form of a normal distribution with very large variance v provides in the limit a Bayesian solution identical with the confidence interval form.
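This limiting agreement can be checked numerically. The sketch below (plain NumPy; the data values are invented for illustration) computes the posterior for a normal mean under a N(0, v) prior with known variance and lets v grow: the posterior interval converges to the confidence interval ȳ ± 1.96 σ/√n.

```python
import numpy as np

def posterior_normal_mean(ybar, n, sigma2, v):
    """Posterior of mu for Y_i ~ N(mu, sigma2), sigma2 known, prior mu ~ N(0, v)."""
    post_var = 1.0 / (1.0 / v + n / sigma2)
    post_mean = post_var * (n * ybar / sigma2)  # prior mean is 0
    return post_mean, post_var

ybar, n, sigma2 = 1.7, 25, 4.0  # illustrative numbers
ci = (ybar - 1.96 * np.sqrt(sigma2 / n), ybar + 1.96 * np.sqrt(sigma2 / n))

for v in (1.0, 100.0, 1e8):
    m, s2 = posterior_normal_mean(ybar, n, sigma2, v)
    interval = (m - 1.96 * np.sqrt(s2), m + 1.96 * np.sqrt(s2))
    print(v, interval)  # approaches ci as v grows
```

For small v the prior pulls the interval toward 0; once v is enormous the two analyses are numerically indistinguishable.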

There are, however, some difficulties with this.

• Even for a scalar parameter θ the flat prior is not invariant under reparameterization. Thus if θ is uniform, e^θ has an improper exponential distribution, which is far from flat.

• In a specific instance it may be hard to justify a distribution putting much more weight outside any finite interval than it does inside as a representation of ignorance or indifference.

• For multidimensional parameters the difficulties of specifying a suitable prior are much more acute.

For a scalar parameter the first point can be addressed by finding a form of the parameter closest to being a location parameter. One way of doing this will be discussed in Example 6.3.
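The lack of invariance in the first point above can be made concrete by Monte Carlo. In this sketch a proper uniform on (0, L) stands in for the improper flat prior; the induced density of φ = e^θ picks up a Jacobian factor and is very far from flat, piling its mass up at small φ.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 5.0
theta = rng.uniform(0.0, L, size=1_000_000)  # 'flat' prior on theta
phi = np.exp(theta)                          # induced prior on phi = e^theta

# Compare the mass in two phi-intervals of equal length:
low = np.mean((phi >= 1.0) & (phi <= 2.0))
high = np.mean((phi >= 100.0) & (phi <= 101.0))
print(low, high)  # far more mass near small phi: not remotely flat
```

By the change-of-variables formula the mass in [1, 2] is log(2)/L while that in [100, 101] is log(101/100)/L, a factor of about seventy smaller, so "flatness" in θ is anything but flat in φ.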

The difficulty with multiparameter priors can be seen from the following example.

Example 5.5. The noncentral chi-squared distribution. Let (Y_1, ..., Y_n) be independently normally distributed with unit variance and means µ_1, ..., µ_n, referring to independent situations and therefore with independent priors, assumed flat. Each µ_k^2 has posterior expectation y_k^2 + 1. Then if interest focuses on Δ^2 = Σ µ_k^2, it has posterior expectation D^2 + n. In fact its posterior distribution is noncentral chi-squared with n degrees of freedom and noncentrality D^2 = Σ y_k^2. This implies that, for large n, Δ^2 is with high probability D^2 + n + O_p(√n). But this is absurd in that, whatever the true value of Δ^2, the statistic D^2 is with high probability Δ^2 + n + O_p(√n).

A very flat prior in one dimension gives good results from almost all viewpoints, whereas a very flat prior and independence in many dimensions do not. This is called Stein's paradox or, more accurately, one of Stein's paradoxes.

If it were agreed that only the statistic D^2 and the parameter Δ^2 are relevant, the problem could be collapsed into a one-dimensional one. Such a reduction is, in general, not available in multiparameter problems, and even in this one a general Bayesian solution is not of this reduced form.
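Example 5.5 is easy to reproduce by simulation. In the sketch below the true means are all zero, so Δ^2 = 0; yet under independent flat priors the posterior expectation of Δ^2 is D^2 + n, which sits near 2n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
mu = np.zeros(n)                 # true means: Delta^2 = sum(mu_k^2) = 0
y = rng.normal(mu, 1.0)

D2 = np.sum(y ** 2)              # D^2 = sum y_k^2, close to Delta^2 + n
post_mean_delta2 = D2 + n        # posterior expectation under flat priors
print(D2, post_mean_delta2)      # D2 near n, posterior mean near 2n, truth 0
```

The per-coordinate error of a flat prior is negligible; it is the accumulation of n such errors in Σ µ_k^2 that produces the systematic overstatement by roughly 2n.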

Quite generally a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters. The following example could be phrased more generally, for example for exponential family distributions, but is given now for binomial probabilities.

Example 5.6. A set of binomial probabilities. Let π_1, ..., π_n be separate binomial probabilities of success, referring, for example, to properties of distinct parts of some random system. For example, success and failure may refer respectively to the functioning or failure of a component. Suppose that to estimate each probability, m independent trials are made with r_k successes, trials for different events being independent. If each π_k is assumed to have a uniform prior on (0, 1), then the posterior distribution of π_k is the beta distribution

π_k^{r_k} (1 − π_k)^{m − r_k} / B(r_k + 1, m − r_k + 1),    (5.7)

where the beta function in the denominator is a normalizing constant. It follows that the posterior mean of π_k is (r_k + 1)/(m + 2). Now suppose that interest lies in some function of the π_k, such as, in the reliability context, ψ_n = ∏ π_k.

Because of the assumed independencies, the posterior distribution of ψ_n is derived from a product of beta-distributed random variables and hence is, for large m, close to a log normal distribution. Further, the mean of the posterior distribution is, by independence,

∏_{k=1}^{n} (r_k + 1)/(m + 2),    (5.8)


and as n → ∞ this, normalized by ψ_n, is

∏_{k=1}^{n} {1 + 1/(π_k m)} / {1 + 2/m}.    (5.9)

Now, especially if n is large compared with m, this ratio is in general very different from 1. Indeed if all the π_k are small the ratio is greater than 1, and if all the π_k are near 1 the ratio is less than 1. This is clear on general grounds in that the probabilities encountered are systematically discrepant from the implications of the prior distribution.
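The size of the discrepancy in (5.9) is striking even for modest n. The sketch below takes all π_k = 0.1 and m = 10 (invented values) and sets each r_k to its expectation mπ_k: each factor of the ratio is then (1 + 1/(π_k m))/(1 + 2/m) = 2/1.2, so the posterior mean overstates ψ_n by a factor that grows geometrically in n.

```python
import numpy as np

pi = 0.1           # common true probability (illustrative)
m = 10             # trials per component
n = 50             # number of components
r = m * pi         # r_k fixed at its expectation

true_product = pi ** n                   # psi_n = prod pi_k
post_mean = ((r + 1) / (m + 2)) ** n     # (5.8) with identical components
ratio = post_mean / true_product         # the ratio in (5.9)
per_factor = (1 + 1 / (pi * m)) / (1 + 2 / m)
print(per_factor, ratio)                 # per-factor ~1.67, ratio astronomically large
```

Each component is only mildly distorted by its uniform prior, but fifty factors of 5/3 compound to a posterior mean more than ten orders of magnitude above the true reliability.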

This use of prior distributions to insert information additional to and distinct from that supplied by the data has to be sharply contrasted with an empirical Bayes approach, in which the prior density is chosen to match the data and hence in effect to smooth the empirical distributions encountered. For this a simple approach is to assume a conjugate prior, in this case a beta density proportional to π^{λ_1 − 1}(1 − π)^{λ_2 − 1} and having two unknown parameters. The marginal likelihood of the data, i.e., that obtained by integrating out π, can thus be obtained and the λs estimated by frequentist methods, such as those of Chapter 6. If errors in estimating the λs are ignored, the application of Bayes' theorem to find the posterior distribution of any function of the π_k, such as ψ_n, raises no special problems. To make this into a fully Bayesian solution, it is necessary only to adopt a prior distribution on the λs; its form is unlikely to be critical.
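A minimal empirical Bayes sketch along these lines (using SciPy for the beta function and the optimizer; the simulated data and the Beta(2, 5) generating prior are invented for illustration): the λs are estimated by maximizing the beta-binomial marginal likelihood, and posterior means then follow from conjugacy.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

rng = np.random.default_rng(2)
n, m = 200, 10
pi_true = rng.beta(2.0, 5.0, size=n)      # generating prior (illustrative)
r = rng.binomial(m, pi_true)

def neg_log_marginal(log_lam):
    """Negative beta-binomial marginal log likelihood, pi_k integrated out."""
    l1, l2 = np.exp(log_lam)              # log scale keeps the lambdas positive
    ll = np.sum(betaln(r + l1, m - r + l2) - betaln(l1, l2))
    return -ll                            # binomial coefficients are constant in the lambdas

fit = minimize(neg_log_marginal, x0=np.zeros(2), method="Nelder-Mead")
lam1, lam2 = np.exp(fit.x)
post_means = (r + lam1) / (m + lam1 + lam2)   # conjugate posterior means
print(lam1, lam2)
```

The fitted prior shrinks each raw proportion r_k/m toward the overall mean, the opposite of the inflation seen with fixed uniform priors; functions of the π_k such as ψ_n can then be handled with the fitted beta posterior in place of (5.7).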

The difficulties with flat and supposedly innocuous priors are most striking when the number of component parameters is large but are not confined to this situation.

Example 5.7. Exponential regression. Suppose that the exponential regression of Example 1.5 is rewritten in the form

E(Y_k) = α + β ρ^{z_k},    (5.10)

i.e., by writing ρ = e^γ, and suppose that it is known that 0 < ρ < 1. Suppose further that α, β, ρ are given independent uniform prior densities, the last over (0, 1), and that the unknown standard deviation σ has a prior proportional to 1/σ; thus three of the parameters have improper priors, to be regarded as limiting forms.

Suppose further that n independent observations are taken at values z_k = z_0 + ak, where z_0, a > 0. Then it can be shown that the posterior density of ρ tends to concentrate near 0 or 1, corresponding in effect to a model in which E(Y) is constant.
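For a numerical look at Example 5.7, the linear parameters can be integrated out analytically: with flat priors on (α, β) and the 1/σ prior, a standard calculation for partially linear models gives the marginal posterior p(ρ | y) ∝ |X_ρᵀ X_ρ|^{−1/2} RSS(ρ)^{−(n−2)/2}, where X_ρ has columns 1 and ρ^{z_k}. The sketch below (simulated data with invented parameter values) evaluates this on a grid; how much mass drifts toward the endpoints depends on n and the design.

```python
import numpy as np

rng = np.random.default_rng(3)
n, z0, a = 30, 1.0, 1.0
z = z0 + a * np.arange(n)                           # z_k = z_0 + a*k
alpha, beta, rho_true, sigma = 2.0, 3.0, 0.7, 1.0   # invented values
y = alpha + beta * rho_true ** z + rng.normal(0.0, sigma, n)

def log_marginal_posterior(rho):
    """log p(rho | y) up to a constant; alpha, beta, sigma integrated out."""
    X = np.column_stack([np.ones(n), rho ** z])
    coef, rss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = rss[0] if rss.size else np.sum((y - X @ coef) ** 2)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return -0.5 * logdet - 0.5 * (n - 2) * np.log(rss)

grid = np.linspace(0.01, 0.99, 99)
logp = np.array([log_marginal_posterior(r) for r in grid])
post = np.exp(logp - logp.max())
post /= post.sum()                                   # normalized over the grid
print(grid[np.argmax(post)])
```

For very small or very large ρ the column ρ^{z_k} is nearly constant over the design points, so (5.10) degenerates toward E(Y) constant; the grid evaluation makes it easy to watch where the mass goes as n or a is varied.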
