Section 4.3. The discussion of the uniform distribution follows Welch (1939), who, however, drew and maintained the conclusion that the unconditional
5.16 Sequential stopping
of independence among the sets of data, and of course that is not essential, then the information that the chosen set has the largest effect is irrelevant, the posterior distribution is unchanged, i.e., no direct allowance for selection is required.
A resolution of the apparent conflict with the frequentist discussion is, however, obtained if it is reasonable to argue that such a strategy of analysis is most likely to be used, if at all, when most of the individual null hypotheses are essentially correct. That is, with $m$ hypotheses under examination the prior probability of any one being false may be approximately $\nu_0/m$, where $\nu_0$ may be treated as constant as $m$ varies. Indeed $\nu_0$ might be approximately 1, so that the prior expectation is that one of the null hypotheses is false. The dependence on $m$ is thereby restored.
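The way the dependence on $m$ re-enters can be sketched in odds form. The notation $B_j$ for the likelihood ratio (Bayes factor) of the alternative against the $j$th null hypothesis is an assumption introduced here for illustration and is not the text's own:

```latex
\frac{P(H_j \text{ false} \mid y)}{P(H_j \text{ true} \mid y)}
  = B_j \,\frac{\nu_0/m}{1 - \nu_0/m}
  = B_j \,\frac{\nu_0}{m - \nu_0}
  \approx B_j \,\frac{\nu_0}{m} \quad (m \gg \nu_0).
```

On this reading, the evidence $B_j$ needed to reach given posterior odds grows roughly in proportion to $m$, which parallels the frequentist allowance for selection.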
An important issue here is that to the extent that the statistical analysis is concerned with the relation between data and a hypothesis about that data, it might seem that the relation should be unaffected by how the hypothesis came to be considered. Indeed a different investigator who had focused on the particular hypothesis $H$ from the start would be entitled to use $p$. But if simple significance tests are to be used as an aid to interpretation and discovery in somewhat exploratory situations, it is clear that some such precaution as the use of (5.20) is essential to ensure relevance to the analysis as implemented and to avoid the occurrence of systematically wrong answers. In fact, more broadly, ingenious investigators often have little difficulty in producing convincing after-the-event explanations of surprising conclusions that were unanticipated beforehand but which retrospectively may even have high prior probability; see Section 5.10. Such ingenuity is certainly important but explanations produced by that route have, in the short term at least, different status from those put forward beforehand.
on the observed sample size in the Bayesian formulation if the prior densities of $\gamma$ and of $\theta$ are independent, or in the frequentist version if the parameters are variation-independent. This covers many applications and corresponds to current practice.
The situation is more difficult if the random variables are observed in sequence in such a way that, at least in a very formalized setting, when the first $m$ observations, denoted collectively by $y^{(m)}$, are available, the probability that the $(m+1)$th observation is obtained is $p_{m+1}(y^{(m)})$ and that otherwise no more observations are obtained and $N = m$. In a formalized stopping rule the $p$'s are typically 0 or 1, in that a decision about stopping is based solely on the current data. That is, it is assumed that in any discussion about whether to stop no additional information bearing on the parameter is used. The likelihood is essentially unchanged by the inclusion of such a purely data-dependent factor, so that, in particular, any Fisherian reduction of the data to sufficient statistics is unaffected after the inclusion of the realized sample size in the statistic; the values of intermediate data are then irrelevant.
In a Bayesian analysis, provided temporal coherency holds, i.e., that the prior does not change during the investigation, the stopping rule is irrelevant for analysis and the posterior density is computed from the likelihood achieved as if $n$ were fixed. In a frequentist approach, however, this is not usually the case. In the simplest situation with a scalar parameter $\theta$ and a fixed-sample-size one-dimensional sufficient statistic $s$, the sufficient reduction is now only to the (2, 1) family $(s, n)$ and it is not in general clear how to proceed.
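The irrelevance of the stopping rule to the posterior can be illustrated with a minimal sketch. The example is not taken from the text: it assumes Bernoulli trials observed either with the number of trials fixed in advance (binomial sampling) or until a preassigned number of successes (inverse binomial sampling). The flat prior, the grid and the data values are likewise illustrative assumptions.

```python
import math

def normalized_posterior(likelihood, grid):
    """Posterior on a grid under a flat prior: likelihood rescaled to sum to 1."""
    weights = [likelihood(theta) for theta in grid]
    total = sum(weights)
    return [w / total for w in weights]

successes, trials = 3, 12  # illustrative data, not from the text

def binomial_lik(theta):
    # Sample size fixed in advance at `trials`.
    return (math.comb(trials, successes)
            * theta ** successes * (1 - theta) ** (trials - successes))

def inverse_binomial_lik(theta):
    # Sampling stopped at the `successes`-th success, which occurred on trial `trials`.
    return (math.comb(trials - 1, successes - 1)
            * theta ** successes * (1 - theta) ** (trials - successes))

# Midpoint grid on (0, 1); the grid size is an arbitrary choice.
grid = [(i + 0.5) / 200 for i in range(200)]
post_fixed_n = normalized_posterior(binomial_lik, grid)
post_stopped = normalized_posterior(inverse_binomial_lik, grid)

# The two likelihoods differ only by a constant factor, so the posteriors agree.
max_gap = max(abs(a - b) for a, b in zip(post_fixed_n, post_stopped))
print(max_gap)
```

The two sampling schemes give different constants in front of the same function of the parameter, and normalization removes the constant. The frequentist sampling distributions of the two schemes are not the same, which is where the difficulty described above arises.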
In some circumstances it can be shown, or more often plausibly assumed as an approximation, that $n$ is an ancillary statistic. That is to say, knowledge of $n$ on its own would give little or no information about the parameter of interest.
Then it will be reasonable to condition on the value of sample size and to consider the conditional distribution of $S$ for fixed sample size, i.e., to ignore the particular procedure used to determine the sample size.
Example 5.11. Precision-based choice of sample size. Suppose that the mean of a normal distribution, or more generally a contrast of several means, is being estimated and that at each stage of the analysis the standard error of the mean or contrast is found by the usual procedure. Suppose that the decision to stop collecting data is based on these indicators of precision, not on the estimates of primary concern, the mean or contrast of means. Then treating sample size as fixed defines a reasonable calibration of the $p$-value and confidence limits.
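A small simulation gives a rough check of this claim. It is a sketch under stated assumptions, not a procedure from the text: observations are normal with unit variance, sampling continues until the estimated standard error of the mean falls to a hypothetical target, and the interval uses the normal multiplier 1.96 as if the sample size had been fixed. The function names, the minimum and maximum sample sizes, the target and the seed are all illustrative choices.

```python
import random

def sample_until_precise(rng, mu, target_se, min_n=10, max_n=500):
    """Draw N(mu, 1) values until the estimated standard error of the mean
    is at most target_se (or max_n is reached). Returns (n, mean, se)."""
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        x = rng.gauss(mu, 1.0)
        n += 1
        # Welford's running update of the mean and the sum of squared deviations.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_n:
            se = (m2 / (n - 1) / n) ** 0.5
            # The stopping decision looks only at precision, not at the mean.
            if se <= target_se or n >= max_n:
                return n, mean, se

def coverage(reps=2000, mu=0.3, target_se=0.15, seed=7):
    """Fraction of nominal 95% intervals, computed as if n were fixed, covering mu."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        _, mean, se = sample_until_precise(rng, mu, target_se)
        if abs(mean - mu) <= 1.96 * se:
            covered += 1
    return covered / reps

estimated_coverage = coverage()
print(estimated_coverage)
```

Under these assumptions the estimated coverage should lie close to the nominal level, with any small shortfall attributable to the use of a normal rather than Student t multiplier and to stopping at moments when the variance estimate happens to be low.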
The most extreme departure from treating sample size as fixed arises when the statistic $s$, typically a sum or mean value, is fixed. Then it is the inverse distribution, i.e., of $N$ given $S = s$, that is relevant. It can be shown that in at least some cases the difference from the analysis treating sample size as fixed is small.
Example 5.12. Sampling the Poisson process. In any inference about the Poisson process of rate $\rho$ the sufficient statistic is $(N, S)$, the number of points observed and the total time of observation. If $S$ is fixed, $N$ has a Poisson distribution of mean $\rho s$, whereas if $N$ is fixed $S$ has the gamma density of mean $n/\rho$, namely

$$\rho(\rho s)^{n-1} e^{-\rho s}/(n-1)!. \tag{5.21}$$

To test the null hypothesis $\rho = \rho_0$ looking for one-sided departures in the direction $\rho > \rho_0$ there are two $p$-values corresponding to the two modes of sampling, namely

$$\sum_{r=n}^{\infty} e^{-\rho_0 s}(\rho_0 s)^r / r! \tag{5.22}$$

and

$$\int_0^s \{e^{-\rho_0 x}(\rho_0 x)^{n-1}\rho_0/(n-1)!\}\,dx. \tag{5.23}$$

Repeated integration by parts shows that the integral is equal to (5.22) as is clear from a direct probabilistic argument.
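The equality of (5.22) and (5.23) can also be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the use of Simpson's rule and the values of $\rho_0$, $n$ and $s$ are illustrative assumptions rather than anything specified in the text.

```python
import math

def poisson_upper_tail(n, mean):
    """(5.22): P(N >= n) for N ~ Poisson(mean), computed as 1 - P(N <= n - 1)."""
    term = math.exp(-mean)  # P(N = 0)
    cdf = 0.0
    for r in range(n):
        cdf += term
        term *= mean / (r + 1)  # P(N = r + 1) from P(N = r)
    return 1.0 - cdf

def gamma_lower_tail(n, rate, s, steps=20000):
    """(5.23): integral over (0, s) of the gamma density (5.21),
    by composite Simpson's rule (steps must be even)."""
    def density(x):
        return rate * (rate * x) ** (n - 1) * math.exp(-rate * x) / math.factorial(n - 1)
    h = s / steps
    total = density(0.0) + density(s)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0

rho0, n, s = 2.0, 5, 1.5  # illustrative values only
p_fixed_time = poisson_upper_tail(n, rho0 * s)   # S fixed, N random
p_fixed_count = gamma_lower_tail(n, rho0, s)     # N fixed, S random
print(p_fixed_time, p_fixed_count)
```

The direct probabilistic argument is that at least $n$ points fall in $(0, s)$ exactly when the waiting time to the $n$th point is at most $s$, so the two computations evaluate the probability of the same event.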
Very similar results hold for the binomial and normal distributions, the latter involving the inverse Gaussian distribution.
The primary situations where there is an appreciable effect of the stopping rule on the analysis are those where the emphasis is strongly on the testing of a particular null hypothesis, with attention focused strongly or even exclusively on the resulting p-value. This is seen most strongly by the procedure that stops when and only when a preassigned level of significance is reached in formal testing of a given null hypothesis. There is an extensive literature on the sequential analysis of such situations.
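The effect of stopping as soon as significance is reached can be seen in a short simulation. This is a sketch under assumptions not stated in the text: observations are independent standard normal under the null hypothesis, a two-sided test at the nominal 5% level is applied after every observation, and sampling is capped at a hypothetical maximum. The function names, the cap, the number of replications and the seed are illustrative.

```python
import random

def rejects_under_optional_stopping(rng, max_n, z_crit=1.96):
    """Sample N(0, 1) values one at a time (null hypothesis true) and stop
    as soon as the standardized mean exceeds z_crit in absolute value."""
    total = 0.0
    for k in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / k ** 0.5 > z_crit:
            return True  # nominal significance reached, so sampling stops
    return False

def rejection_rate(max_n, reps=2000, seed=1):
    """Proportion of simulated studies that ever reach nominal significance."""
    rng = random.Random(seed)
    hits = sum(rejects_under_optional_stopping(rng, max_n) for _ in range(reps))
    return hits / reps

rate_single_look = rejection_rate(1)    # one test only: close to the nominal level
rate_many_looks = rejection_rate(100)   # a test after each of up to 100 observations
print(rate_single_look, rate_many_looks)
```

With a single look the rejection rate stays near the nominal level, whereas with repeated looks it is several times larger, so a $p$-value computed as if the sample size were fixed is no longer calibrated.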
It is here that one of the strongest contrasts arises between Bayesian and frequentist formulations. In the Bayesian approach, provided that the prior and model remain fixed throughout the investigation, the final inference, in particular the posterior probability of the null hypothesis, can depend on the data only through the likelihood function at the terminal stage. In the frequentist approach, the final interpretation depends on the data not only via that same likelihood function but also via the stopping criterion used, and so in particular on the information that stopping did not take place earlier. It should involve the specific earlier data only if issues of model adequacy are involved;
for example it might be suspected that the effect of an explanatory variable had been changing in time.