


forms what may be called the weak likelihood principle. Less immediately, if two different models, with parameters having the same interpretation but possibly referring to different observational systems, lead to data y and y′ with proportional likelihoods, then again the posterior distributions are identical. This forms the less compelling strong likelihood principle. Most frequentist methods do not obey this latter principle, although the departure is usually relatively minor. If the models refer to different random systems, the implicit prior knowledge may, in any case, be different.
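As a standard illustration of the strong likelihood principle (not taken from the text), consider observing y successes by the n-th trial under either binomial sampling (n fixed in advance) or negative binomial sampling (continue until y successes). The two likelihoods are proportional in θ, so any posterior is identical, while frequentist calculations generally depend on the sampling scheme. The sketch below, with assumed values y = 3 and n = 12, checks the proportionality numerically.

```python
# Hypothetical illustration (assumed values): the same record -- y successes by the
# n-th trial -- under binomial sampling (n fixed) or negative binomial sampling
# (continue until y successes) gives likelihoods proportional in theta, hence
# identical posteriors under any common prior.
import numpy as np
from scipy.stats import binom, nbinom

y, n = 3, 12                                  # assumed data
theta = np.linspace(0.01, 0.99, 99)           # grid of parameter values

lik_binom = binom.pmf(y, n, theta)            # n fixed in advance
lik_nbinom = nbinom.pmf(n - y, y, theta)      # failures before the y-th success

ratio = lik_binom / lik_nbinom
print("likelihood ratio constant in theta:", np.allclose(ratio, ratio[0]))

# Under a flat prior the normalized posteriors on the grid coincide.
post_b = lik_binom / lik_binom.sum()
post_nb = lik_nbinom / lik_nbinom.sum()
print("posteriors agree:", np.allclose(post_b, post_nb))
```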

4.3 Frequentist analysis

4.3.1 Extended Fisherian reduction

One approach to simple problems is essentially that of Section 2.5 and can be summarized, as before, in the Fisherian reduction (a small numerical sketch follows the list below):

• find the likelihood function;

• reduce to a sufficient statistic S of the same dimension as θ;

• find a function of S that has a distribution depending only on ψ;

• place it in pivotal form or alternatively use it to derive p-values for null hypotheses;

• invert to obtain limits for ψ at an arbitrary set of probability levels.
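As a minimal numerical sketch of these steps (the model and values are assumed here, not taken from the text), take n independent observations from N(µ, σ0^2) with σ0 known: the sample mean is sufficient for µ, √n(Ȳ − µ)/σ0 is an exact pivot, and inverting it gives limits at an arbitrary set of probability levels.

```python
# Minimal sketch of the Fisherian reduction for an assumed simple case:
# Y_1, ..., Y_n i.i.d. N(mu, sigma0^2) with sigma0 known.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu_true, sigma0, n = 2.0, 1.5, 40            # assumed values for illustration
y = rng.normal(mu_true, sigma0, size=n)

ybar = y.mean()                              # sufficient statistic for mu
for level in (0.90, 0.95, 0.99):
    z = norm.ppf(0.5 + level / 2)            # invert the pivot sqrt(n)(Ybar - mu)/sigma0
    half = z * sigma0 / np.sqrt(n)
    print(f"{level:.0%} limits for mu: ({ybar - half:.3f}, {ybar + half:.3f})")
```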

There is sometimes an extension of the method that works when the model is of the (k, d) curved exponential family form. Then the sufficient statistic is of dimension k greater than d, the dimension of the parameter space. We then proceed as follows:

• if possible, rewrite the k-dimensional sufficient statistic, when k > d, in the form (S, A) such that S is of dimension d and A has a distribution not depending on θ;

• consider the distribution of S given A = a and proceed as before. The statistic A is called ancillary.

There are limitations to these methods. In particular a suitable A may not exist, and then one is driven to asymptotic, i.e., approximate, arguments for problems of reasonable complexity and sometimes even for simple problems.

We give some examples, the first of which is not of exponential family form.

Example 4.1. Uniform distribution of known range. Suppose that (Y1, ..., Yn) are independently and identically distributed in the uniform distribution over (θ − 1, θ + 1). The likelihood takes the constant value 2^(−n) provided the smallest and largest values (y(1), y(n)) lie within the range (θ − 1, θ + 1) and is zero otherwise. The minimal sufficient statistic is of dimension 2, even though the parameter is only of dimension 1. The model is a special case of a location family and it follows from the invariance properties of such models that A = Y(n) − Y(1) has a distribution independent of θ.

This example shows the imperative of explicit or implicit conditioning on the observed value a of A in quite compelling form. If a is approximately 2, only values of θ very close to ȳ = (y(1) + y(n))/2 are consistent with the data. If, on the other hand, a is very small, all values of θ within one unit of the common observed value ȳ are consistent with the data. In general, the conditional distribution of Ȳ given A = a is found as follows.

The joint density of (Y(1), Y(n)) is

n(n − 1)(y(n) − y(1))^(n−2) / 2^n,   (4.3)

and the transformation to the new variables (Ȳ, A = Y(n) − Y(1)) has unit Jacobian. Therefore the new variables (Ȳ, A) have density n(n − 1)a^(n−2)/2^n defined over the triangular region (0 ≤ a ≤ 2; θ − 1 + a/2 ≤ ȳ ≤ θ + 1 − a/2) and density zero elsewhere. This implies that the conditional density of Ȳ given A = a is uniform over the allowable interval θ − 1 + a/2 ≤ ȳ ≤ θ + 1 − a/2.

Conditional confidence interval statements can now be constructed although they add little to the statement just made, in effect that every value of θ in the relevant interval is in some sense equally consistent with the data. The key point is that an interval statement assessed by its unconditional distribution could be formed that would give the correct marginal frequency of coverage but that would hide the fact that for some samples very precise statements are possible whereas for others only low precision is achievable.
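A rough simulation (θ = 0, n = 5 and a 95% level are assumed here for illustration) makes the contrast concrete: an interval built from the conditional distribution of Ȳ given A = a has the stated coverage at every a, but its length varies greatly between samples with small and large observed ranges.

```python
# Simulation sketch for Example 4.1 with assumed values: Y_i ~ Uniform(theta-1, theta+1).
import numpy as np

rng = np.random.default_rng(2)
theta, n, alpha, reps = 0.0, 5, 0.05, 200_000   # assumed values for illustration
y = rng.uniform(theta - 1, theta + 1, size=(reps, n))
lo, hi = y.min(axis=1), y.max(axis=1)
ybar, a = (lo + hi) / 2, hi - lo                # midrange and range of each sample

# Conditional on A = a, Ybar - theta is uniform on (-(1 - a/2), 1 - a/2), so the
# interval ybar +/- (1 - alpha)(1 - a/2) has conditional coverage 1 - alpha.
half = (1 - alpha) * (1 - a / 2)
covered = np.abs(ybar - theta) <= half
print("overall coverage:", covered.mean())      # close to 0.95

# Coverage stays ~0.95 within slices of a, but the interval length 2*half varies a lot.
for label, mask in [("a < 0.5", a < 0.5), ("a > 1.5", a > 1.5)]:
    print(label, "coverage:", round(covered[mask].mean(), 3),
          "mean length:", round(2 * half[mask].mean(), 3))
```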

Example 4.2. Two measuring instruments. A closely related point is made by the following idealized example. Suppose that a single observation Y is made on a normally distributed random variable of unknown mean µ. There are available two measuring instruments, one with known small variance, say σ0^2 = 10^(−4), and one with known large variance, say σ1^2 = 10^4. A randomizing device chooses an instrument with probability 1/2 for each possibility and the full data consist of the observation y and an identifier d = 0, 1 to show which variance is involved. The log likelihood is

−log σ_d − (y − µ)^2/(2σ_d^2),   (4.4)

so that (y, d) forms the sufficient statistic and d is ancillary, suggesting again that the formation of confidence intervals or the evaluation of a p-value should use the variance belonging to the apparatus actually used. If the sensitive apparatus, d = 0, is in fact used, why should the interpretation be affected by the possibility that one might have used some other instrument?
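A hypothetical numerical sketch of the example (the 95% level and standard deviations 10^(−2) and 10^2, matching the variances above, are the only inputs) contrasts the two analyses: quoting y ± 1.96σ_d for the instrument actually used has the stated coverage conditionally on d, whereas a single half-width calibrated only over the randomization over-covers with the precise instrument and under-covers with the imprecise one.

```python
# Hypothetical sketch of Example 4.2: Y ~ N(mu, sigma_d^2), d chosen by a fair coin,
# sigma_0 = 1e-2 (precise instrument) and sigma_1 = 1e2 (imprecise instrument).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(3)
mu, sigmas, reps = 0.0, np.array([1e-2, 1e2]), 100_000
d = rng.integers(0, 2, size=reps)            # which instrument the randomizer picked
y = rng.normal(mu, sigmas[d])

# Conditional procedure: half-width 1.96 * sigma_d for the instrument actually used.
covered = np.abs(y - mu) <= 1.96 * sigmas[d]
print("coverage given d=0:", covered[d == 0].mean())    # ~0.95
print("coverage given d=1:", covered[d == 1].mean())    # ~0.95

# Unconditional procedure: a single half-width c with 95% coverage averaged over the
# randomization; it then over-covers for d=0 and under-covers for d=1.
avg_cov = lambda c: 0.5 * sum(norm.cdf(c / s) - norm.cdf(-c / s) for s in sigmas) - 0.95
c = brentq(avg_cov, 1e-6, 1e4)
print("unconditional half-width:", round(c, 1))
print("coverage given d=0:", norm.cdf(c / sigmas[0]) - norm.cdf(-c / sigmas[0]))
print("coverage given d=1:", norm.cdf(c / sigmas[1]) - norm.cdf(-c / sigmas[1]))
```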

There is a distinction between this and the previous example in that in the former the conditioning arose out of the mathematical structure of the problem, whereas in the present example the ancillary statistic arises from a physical distinction between two measuring devices.

There is a further important point suggested by this example. The fact that the randomizing probability is assumed known does not seem material. The argument for conditioning is equally compelling if the choice between the two sets of apparatus is made at random with some unknown probability, provided only that the value of the probability, and of course the outcome of the randomization, is unconnected with µ.

More formally, suppose that we have a factorization in which the distribution of S given A = a depends only on θ, whereas the distribution of A depends on an additional parameter γ, such that the parameter space becomes the product of the separate parameter spaces for θ and γ, so that θ and γ are variation-independent in the sense introduced in Section 1.1. Then A is ancillary in the extended sense for inference about θ. The term S-ancillarity is often used for this idea.

Example 4.3. Linear model. The previous examples are in a sense a preliminary to the following widely occurring situation. Consider the linear model of Examples 1.4 and 2.2 in which the n × 1 vector Y has expectation E(Y) = zβ, where z is an n × q matrix of full rank q < n and where the components are independently normally distributed with variance σ^2. Suppose that, instead of the assumption of the previous discussion that z is a matrix of fixed constants, z is the realized value of a random matrix Z with a known probability distribution.

The log likelihood is, by (2.17),

−n log σ − {(y − zβ̂)^T (y − zβ̂) + (β̂ − β)^T (z^T z)(β̂ − β)}/(2σ^2),   (4.5)

plus in general functions of z arising from the known distribution of Z. Thus the minimal sufficient statistic includes the residual sum of squares and the least squares estimates as before but also functions of z, in particular z^T z. Thus conditioning on z, or especially on z^T z, is indicated. This matrix, which specifies the precision of the least squares estimates, plays the role of the distinction between the two measuring instruments in the preceding example.
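A small sketch (the designs, β and σ below are assumed purely for illustration) shows the analogy with the two instruments: conditionally on the realized z the least squares estimator has covariance σ^2 (z^T z)^(−1), so a well spread design and a poorly spread one yield quite different precision for the same model.

```python
# Sketch of conditioning on the realized design z in the normal linear model
# E(Y) = z beta, variance sigma^2; all numerical values are assumed for illustration.
import numpy as np

rng = np.random.default_rng(4)
n, beta, sigma = 30, np.array([1.0, 2.0]), 0.5

def cov_given_z(z):
    """Covariance of the least squares estimator conditional on the realized z."""
    return sigma**2 * np.linalg.inv(z.T @ z)

# Two realized designs: one with well spread x values, one with little spread.
z_spread = np.column_stack([np.ones(n), rng.uniform(-3, 3, n)])
z_narrow = np.column_stack([np.ones(n), rng.uniform(-0.3, 0.3, n)])

for name, z in [("well spread z", z_spread), ("poorly spread z", z_narrow)]:
    y = z @ beta + rng.normal(0, sigma, n)
    beta_hat = np.linalg.lstsq(z, y, rcond=None)[0]
    se_slope = np.sqrt(cov_given_z(z)[1, 1])
    print(f"{name}: slope estimate {beta_hat[1]:.3f}, conditional s.e. {se_slope:.3f}")
```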

As noted in the previous discussion and using the extended definition of ancillarity, the argument for conditioning is unaltered if Z has a probability distribution f_Z(z; γ), where γ and the parameter of interest are variation-independent.

In many experimental studies the explanatory variables would be chosen by the investigator systematically and treating z as a fixed matrix is totally appropriate. In observational studies in which study individuals are chosen in a random way all variables are random and so modelling z as a random variable might seem natural. The discussion of extended ancillarity shows when that is unnecessary and the standard practice of treating the explanatory variables as fixed is then justified. This does not mean that the distribution of the explanatory variables is totally uninformative about what is taken as the primary focus of concern, namely the dependence of Y on z. In addition to specifying via the matrix (z^T z)^(−1) the precision of the least squares estimates, comparison of the distribution of the components of z with their distribution in some target population may provide evidence of the reliability of the sampling procedure used and of the security of any extrapolation of the conclusions. In a comparable Bayesian analysis the corresponding assumption would be the stronger one that the prior densities of θ and γ are independent.
