CHAPTER 3
Inference for Contingency Tables
In this chapter we introduce inferential methods for contingency tables.
Many of these methods also play a vital role in analyses of later chapters for which categorical data need not have contingency table form. The methods assume Poisson, multinomial, or independent binomial sampling.
In Section 3.1 we present confidence intervals for measures of association for 2×2 tables, such as the odds ratio. Section 3.2 covers chi-squared tests of the hypothesis of independence between two categorical variables. Like any significance test, these have limited usefulness. In Section 3.3 we show how to follow up the test using residuals or the partitioning property of chi-squared to extract components that describe the evidence about the association. In Section 3.4 we present more powerful inference applicable with ordered categories. The methods of Sections 3.1 through 3.4 assume large samples. In Sections 3.5 and 3.6 we introduce small-sample methods.
3.1 CONFIDENCE INTERVALS FOR ASSOCIATION PARAMETERS
showed that the amended estimators

$$\tilde{\theta} = \frac{(n_{11}+0.5)(n_{22}+0.5)}{(n_{12}+0.5)(n_{21}+0.5)}$$

and $\log\tilde{\theta}$ behave well (Problem 14.4).
The estimators $\hat{\theta}$ and $\tilde{\theta}$ have the same asymptotic normal distribution around $\theta$. Unless $n$ is quite large, however, their distributions are highly skewed. When $\theta = 1$, for instance, $\hat{\theta}$ cannot be much smaller than $\theta$ (since $\hat{\theta} \ge 0$), but it could be much larger with nonnegligible probability. The log transform, having an additive rather than multiplicative structure, converges more rapidly to normality. An estimated standard error for $\log\hat{\theta}$ is
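
$$\hat{\sigma}(\log\hat{\theta}) = \left(\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}\right)^{1/2}. \tag{3.1}$$

We derive this formula in Section 3.1.7. By the large-sample normality of $\log\hat{\theta}$,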

$$\log\hat{\theta} \pm z_{\alpha/2}\,\hat{\sigma}(\log\hat{\theta}) \tag{3.2}$$

is a Wald confidence interval for $\log\theta$. Exponentiating (taking antilogs of) its endpoints provides a confidence interval for $\theta$. Woolf (1955) proposed this interval. It works quite well, usually being a bit conservative (i.e., actual coverage probability higher than the nominal level).
When $\hat{\theta} = 0$ or $\infty$, Woolf's interval does not exist. When $\hat{\theta} = 0$, one should take 0 as the lower limit, and when $\hat{\theta} = \infty$, one should take $\infty$ as the upper limit. The other bound can use the Woolf formula following some adjustment, such as Gart's (1966), which replaces $\{n_{ij}\}$ by $\{n_{ij} + 0.5\}$ in the estimator and standard error. A less ad hoc approach forms the interval by inverting score tests (Cornfield 1956) or likelihood-ratio tests for $\theta$, as we discuss in Section 3.1.8.
3.1.2 Aspirin and Myocardial Infarction Example
We illustrate inference for the odds ratio with Table 3.1, based on a Swedish study of the association between aspirin use and myocardial infarction similar to that described in Section 2.2.5. The study randomly assigned 1360 patients who had already suffered a stroke to an aspirin treatment (one low-dose tablet a day) or to a placebo treatment. Table 3.1 reports the number of deaths due to myocardial infarction during a follow-up period of about 3 years.
The sample odds ratio $\hat{\theta} = 1.56$ is close to $\tilde{\theta} = 1.55$, since no cell count is especially small. The standard error (3.1) of $\log\hat{\theta} = 0.445$ is $\hat{\sigma}(\log\hat{\theta}) = 0.307$.
TABLE 3.1 Swedish Study on Aspirin Use and Myocardial Infarction

             Myocardial Infarction
             Yes      No       Total
Placebo      28       656      684
Aspirin      18       658      676

Source: Based on results described in Lancet 338: 1345-1349 (1991).
A 95% confidence interval for $\log\theta$ in the population this sample represents is $0.445 \pm 1.96(0.307)$, or $(-0.157, 1.047)$. The corresponding interval for $\theta$ is $[\exp(-0.157), \exp(1.047)]$, or $(0.85, 2.85)$. The estimate of the true odds ratio is rather imprecise.
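
The arithmetic above can be reproduced with a short Python sketch. It applies formula (3.1) and the Wald interval (3.2) to the counts in Table 3.1; the function names `odds_ratio_wald_ci` and `amended_odds_ratio` are illustrative choices, not names used in the text, and the normal quantile comes from the standard library rather than from a table.

```python
import math
from statistics import NormalDist

def odds_ratio_wald_ci(n11, n12, n21, n22, level=0.95):
    """Woolf (Wald) interval for the odds ratio of a 2x2 table of counts.

    Assumes every cell count is positive; the interval does not exist otherwise.
    """
    theta_hat = (n11 * n22) / (n12 * n21)
    # Estimated standard error of log(theta_hat), formula (3.1)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    log_theta = math.log(theta_hat)
    # Interval (3.2) on the log scale, then exponentiate the endpoints
    lower = math.exp(log_theta - z * se_log)
    upper = math.exp(log_theta + z * se_log)
    return theta_hat, se_log, (lower, upper)

def amended_odds_ratio(n11, n12, n21, n22):
    """Amended estimator: add 0.5 to every cell before forming the ratio."""
    return ((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5))

# Table 3.1: placebo (28 yes, 656 no), aspirin (18 yes, 658 no)
theta_hat, se_log, (lower, upper) = odds_ratio_wald_ci(28, 656, 18, 658)
theta_tilde = amended_odds_ratio(28, 656, 18, 658)
print(f"theta_hat={theta_hat:.2f}, theta_tilde={theta_tilde:.2f}, "
      f"se(log)={se_log:.3f}, CI=({lower:.2f}, {upper:.2f})")
```

Working on the log scale and exponentiating at the end keeps both endpoints positive, which a Wald interval built directly on $\hat{\theta}$ would not guarantee.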
Since the confidence interval for $\theta$ contains 1.0, it is plausible that the true odds of death due to myocardial infarction are equal for aspirin and placebo. If there truly is a beneficial effect of aspirin but the odds ratio is not large, it may require a large sample size to show that benefit because of the relatively small number of myocardial infarction cases (Problem 3.21).

3.1.3 Interval Estimation of Difference of Proportions
The difference of proportions and the relative risk compare conditional distributions of a response variable for two groups. For these measures, we treat the samples as independent binomials. For group $i$, $y_i$ has a binomial distribution with sample size $n_i$ and probability $\pi_i$ of a "success" response. The sample proportion $\hat{\pi}_i = y_i/n_i$ has expectation $\pi_i$ and variance $\pi_i(1-\pi_i)/n_i$. Since $\hat{\pi}_1$ and $\hat{\pi}_2$ are independent, their difference has $E(\hat{\pi}_1 - \hat{\pi}_2) = \pi_1 - \pi_2$ and standard error

$$\sigma(\hat{\pi}_1 - \hat{\pi}_2) = \left[\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right]^{1/2}. \tag{3.3}$$
The estimate $\hat{\sigma}(\hat{\pi}_1 - \hat{\pi}_2)$ uses formula (3.3) with $\pi_i$ replaced by $\hat{\pi}_i$. Then

$$(\hat{\pi}_1 - \hat{\pi}_2) \pm z_{\alpha/2}\,\hat{\sigma}(\hat{\pi}_1 - \hat{\pi}_2) \tag{3.4}$$

is a Wald confidence interval for $\pi_1 - \pi_2$. Like the Wald interval (1.13) for a single proportion, it usually has true coverage probability less than the nominal confidence coefficient, especially when $\pi_1$ and $\pi_2$ are near 0 or 1.
More complex but better methods are cited in Section 3.1.8, Note 3.2, and Problem 3.23.
3.1.4 Interval Estimation of Relative Risk
The sample relative risk is $r = \hat{\pi}_1/\hat{\pi}_2$. Like the odds ratio, it converges to normality faster on the log scale. The asymptotic standard error of $\log r$ is

$$\sigma(\log r) = \left(\frac{1-\pi_1}{\pi_1 n_1} + \frac{1-\pi_2}{\pi_2 n_2}\right)^{1/2}. \tag{3.5}$$

The Wald interval exponentiates endpoints of $\log r \pm z_{\alpha/2}\,\hat{\sigma}(\log r)$. It works well but can be somewhat conservative. We discuss an alternative method in Section 3.1.8.
For Table 3.1, the sample proportion of myocardial infarction deaths was 0.0409 for subjects taking placebo and 0.0266 for subjects taking aspirin. The sample relative risk is $0.0409/0.0266 = 1.54$. The 95% confidence interval for the log relative risk of $\log(1.54) \pm 1.96(0.297)$ translates to $(0.86, 2.75)$ for the relative risk. We infer that the death rate for those taking placebo was between 0.86 and 2.75 times that for those taking aspirin. The Wald 95% confidence interval for $\pi_1 - \pi_2$ is $0.014 \pm 1.96(0.0098)$, or $(-0.005, 0.033)$. According to either measure, substantial public health benefits could result from taking aspirin, but no effect or a slight negative effect are also plausible.
Results for the larger study described in Section 2.2.5 do show a benefit.
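
As a check on these figures, the following Python sketch evaluates the Wald intervals (3.4) and (3.5) for the Table 3.1 counts, with the unknown $\pi_i$ replaced by sample proportions. The function name `two_group_wald_intervals` and the dictionary keys are illustrative assumptions, not notation from the text.

```python
import math
from statistics import NormalDist

def two_group_wald_intervals(y1, n1, y2, n2, level=0.95):
    """Wald intervals for pi1 - pi2 and for the relative risk pi1 / pi2.

    Assumes independent binomial samples with 0 < y_i < n_i in both groups.
    """
    p1, p2 = y1 / n1, y2 / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Difference of proportions: formula (3.3) with estimates plugged in
    diff = p1 - p2
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff_ci = (diff - z * se_diff, diff + z * se_diff)

    # Relative risk: formula (3.5) on the log scale, then exponentiate
    rr = p1 / p2
    se_log_rr = math.sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    return {"diff": diff, "se_diff": se_diff, "diff_ci": diff_ci,
            "rr": rr, "se_log_rr": se_log_rr, "rr_ci": rr_ci}

# Group 1 = placebo (28 deaths of 684), group 2 = aspirin (18 deaths of 676)
res = two_group_wald_intervals(28, 684, 18, 676)
for name, value in res.items():
    print(name, value)
```

Up to rounding, the output should agree with the values quoted above: a difference near 0.014 with interval about $(-0.005, 0.033)$, and a relative risk near 1.54 with interval about $(0.86, 2.75)$.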
3.1.5 Deriving Standard Errors with the Delta Method*
A simple and useful method exists for deriving standard errors for large-sample inferences. Let $T_n$ denote a statistic that is asymptotically normally distributed about a parameter $\theta$, the subscript $n$ expressing its dependence on sample size. Suppose that an estimator is a function $g(T_n)$ of $T_n$. Then, under mild conditions, $g(T_n)$ itself has a large-sample normal distribution. The standard error depends on how fast $g(t)$ changes for $t$ near $\theta$.
Specifically, for large $n$, suppose that $T_n$ is normally distributed about $\theta$ with standard error $\sigma/\sqrt{n}$. That is, as $n \to \infty$, the cdf of $\sqrt{n}(T_n - \theta)$ converges to the cdf of a normal random variable with mean 0 and variance $\sigma^2$. This limiting behavior is an example of convergence in distribution, denoted by

$$\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2).$$

Let $g$ be a function that is at least twice differentiable at $\theta$. Using the Taylor series expansion for $g(t)$ in a neighborhood of $t = \theta$, in Section 14.1.2 we show

$$\sqrt{n}\,[g(T_n) - g(\theta)] \approx \sqrt{n}(T_n - \theta)\,g'(\theta)$$

for large $n$, where $g'(\theta) = \partial g/\partial t$ evaluated at $t = \theta$. Recall that if a variate $Y \sim N(0, \sigma^2)$, then $cY \sim N(0, c^2\sigma^2)$. Thus,

$$\sqrt{n}\,[g(T_n) - g(\theta)] \xrightarrow{d} N\!\left(0, [g'(\theta)]^2\sigma^2\right). \tag{3.6}$$

In other words, $g(T_n)$ is approximately normal around $g(\theta)$ with variance $[g'(\theta)]^2\sigma^2/n$.

FIGURE 3.1 Depiction of delta method.
Figure 3.1 portrays this result. Locally around $\theta$, $g(t)$ is approximately linear, with slope $g'(\theta)$. Then $g(T_n)$ is approximately normal, since linear transformations of normal random variables are themselves normal. The dispersion of $g(T_n)$ values about $g(\theta)$ is about $|g'(\theta)|$ times the dispersion of $T_n$ values about $\theta$. If the slope of $g$ at $\theta$ is $\tfrac{1}{2}$, then $g$ maps a region of $T_n$ values into a region of $g(T_n)$ values only about half as wide.
Result (3.6) is called the delta method. Since $g'(\theta)$ and $\sigma^2 = \sigma^2(\theta)$ usually depend on the unknown parameter $\theta$, the asymptotic variance is unknown. Confidence intervals and tests substitute $T_n$ for $\theta$ and use the result that $\sqrt{n}\,[g(T_n) - g(\theta)] / [\,|g'(T_n)|\,\sigma(T_n)]$ is asymptotically standard normal. For instance,

$$g(T_n) \pm 1.96\,|g'(T_n)|\,\sigma(T_n)/\sqrt{n}$$

is a large-sample Wald 95% confidence interval for $g(\theta)$.

3.1.6 Delta Method Applied to Sample Logit*
We illustrate the delta method for a function of the ML estimator $T_n = \hat{\pi} = y/n$ of the binomial parameter $\pi$, for $y$ successes in $n$ trials. Since $E(Y) = n\pi$ and $\mathrm{var}(Y) = n\pi(1-\pi)$, $E(\hat{\pi}) = \pi$ and $\mathrm{var}(\hat{\pi}) = \pi(1-\pi)/n$. Also, $\hat{\pi}$ has a large-sample normal distribution by the central limit theorem. So do many functions of $\hat{\pi}$.
The log odds function of $\hat{\pi}$,

$$g(\hat{\pi}) = \log[\hat{\pi}/(1-\hat{\pi})],$$

is called the sample logit. Evaluated at $\pi$, its derivative equals $1/[\pi(1-\pi)]$. By the delta method, the asymptotic variance of the sample logit is $\pi(1-\pi)/n$ (the variance of $\hat{\pi}$) multiplied by the square of $1/[\pi(1-\pi)]$. That is,

$$\sqrt{n}\left[\log\frac{\hat{\pi}}{1-\hat{\pi}} - \log\frac{\pi}{1-\pi}\right] \xrightarrow{d} N\!\left(0, \frac{1}{\pi(1-\pi)}\right).$$

The asymptotic normality of $\hat{\pi}$ propagates to asymptotic normality of $\log[\hat{\pi}/(1-\hat{\pi})]$.
The asymptotic variance is the variance of the normal distribution that approximates the true distribution, for large $n$. It is not an approximation for the variance of the true distribution. For $0 < \pi < 1$, the asymptotic variance $[n\pi(1-\pi)]^{-1}$ of the sample logit is finite. By contrast, the true variance does not exist: Since $\hat{\pi} = 0$ or 1 with positive probability, the logit can equal $-\infty$ or $\infty$ with positive probability. The probability of an infinite logit converges to zero rapidly as $n$ increases. For large $n$, the distribution of the sample logit looks essentially normal with mean $\log[\pi/(1-\pi)]$ and standard deviation $[n\pi(1-\pi)]^{-1/2}$. Thus, for the logit, the asymptotic variance actually has greater use than the true variance. Incidentally, related to this, the bootstrap is not helpful for approximating standard errors for many discrete measures, because it mimics the true rather than the more relevant asymptotic standard error.
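
The one-parameter delta method can be sketched numerically in Python. The helper `delta_method_se` below is a hypothetical illustration that approximates $g'$ by a central finite difference, so it can be applied to any smooth $g$; it is compared with the closed-form logit standard error $[n\hat{\pi}(1-\hat{\pi})]^{-1/2}$. The counts $y = 28$, $n = 684$ are borrowed from the placebo row of Table 3.1 purely as an example.

```python
import math

def delta_method_se(g, t, se_t, h=1e-6):
    """Approximate the standard error of g(T) as |g'(t)| * se(T).

    g'(t) is estimated with a central finite difference of step h.
    """
    slope = (g(t + h) - g(t - h)) / (2 * h)
    return abs(slope) * se_t

def logit(p):
    return math.log(p / (1 - p))

def prob_infinite_logit(pi, n):
    """P(pi_hat = 0 or 1) for a binomial(n, pi) sample: the logit is infinite."""
    return (1 - pi) ** n + pi ** n

y, n = 28, 684                       # illustrative counts (placebo row of Table 3.1)
pi_hat = y / n
se_pi = math.sqrt(pi_hat * (1 - pi_hat) / n)

se_numeric = delta_method_se(logit, pi_hat, se_pi)
se_closed = 1 / math.sqrt(n * pi_hat * (1 - pi_hat))
print(se_numeric, se_closed)

# The chance of an infinite sample logit shrinks quickly as n grows
for size in (5, 10, 50):
    print(size, prob_infinite_logit(0.5, size))
```

The two standard errors should agree to many decimal places, since the finite difference only replaces the analytic derivative $1/[\pi(1-\pi)]$.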
3.1.7 Delta Method for Log Odds Ratio*
Standard errors for the log odds ratio and the log relative risk result from a multiparameter version of the delta method. Suppose that $\{n_i, i = 1, \ldots, c\}$ have a multinomial $(n, \{\pi_i\})$ distribution. The sample proportion $\hat{\pi}_i = n_i/n$ has mean and variance

$$E(\hat{\pi}_i) = \pi_i \quad\text{and}\quad \mathrm{var}(\hat{\pi}_i) = \pi_i(1-\pi_i)/n. \tag{3.7}$$

In Section 14.1.4 we show that for $i \ne j$, $\hat{\pi}_i$ and $\hat{\pi}_j$ have covariance

$$\mathrm{cov}(\hat{\pi}_i, \hat{\pi}_j) = -\pi_i\pi_j/n. \tag{3.8}$$
The sample proportions $(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_{c-1})$ have a large-sample multivariate normal distribution. For functions of them, the delta method implies the following result, proved in Section 14.1.4:
Let $g(\pi)$ denote a differentiable function of $\{\pi_i\}$, with sample value $g(\hat{\pi})$ for a multinomial sample. Let

$$\phi_i = \frac{\partial g(\pi)}{\partial \pi_i}, \quad i = 1, \ldots, c.$$

Then as $n \to \infty$, the distribution of $\sqrt{n}\,[g(\hat{\pi}) - g(\pi)]/\sigma$ converges to standard normal, where

$$\sigma^2 = \sum_i \pi_i\phi_i^2 - \Bigl(\sum_i \pi_i\phi_i\Bigr)^2. \tag{3.9}$$
The asymptotic variance depends on $\{\pi_i\}$ and the partial derivatives of the measure with respect to $\{\pi_i\}$. In practice, replacing $\{\pi_i\}$ and $\{\phi_i\}$ in (3.9) by their sample values yields an ML estimate $\hat{\sigma}^2$ of $\sigma^2$. Then $\hat{\sigma}/\sqrt{n}$ is an estimated standard error for $g(\hat{\pi})$. A large-sample Wald confidence interval for $g(\pi)$ is

$$g(\hat{\pi}) \pm z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}.$$
With the substitution of $\hat{\sigma}$ for $\sigma$ in (3.9), the limiting distribution is still standard normal, but convergence is slower. The equivalence in the large-sample distribution is justified as follows: The sample proportions converge in probability to $\{\pi_i\}$, by the weak law of large numbers. Since $\hat{\sigma}$ is a continuous function of the sample proportions, it converges in probability to $\sigma$, and $\sigma/\hat{\sigma}$ converges in probability to 1. Now

$$\sqrt{n}\,\frac{g(\hat{\pi}) - g(\pi)}{\hat{\sigma}} = \sqrt{n}\,\frac{g(\hat{\pi}) - g(\pi)}{\sigma} \cdot \frac{\sigma}{\hat{\sigma}}.$$

The first term on the right-hand side converges in distribution to standard normal, by (3.9), and the second term converges in probability to 1. Thus, their product also has a limiting standard normal distribution.
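We now apply the delta method to the log odds ratio, taking $g(\pi) = \log\theta = \log\pi_{11} + \log\pi_{22} - \log\pi_{12} - \log\pi_{21}$. Since

$$\phi_{11} = \partial(\log\theta)/\partial\pi_{11} = 1/\pi_{11}, \quad \phi_{12} = -1/\pi_{12}, \quad \phi_{21} = -1/\pi_{21}, \quad \phi_{22} = 1/\pi_{22},$$

$\sum_i\sum_j \pi_{ij}\phi_{ij} = 0$ and $\sigma^2 = \sum_i\sum_j \pi_{ij}\phi_{ij}^2 = \sum_i\sum_j (1/\pi_{ij})$. The asymptotic standard error of $\log\hat{\theta}$ for a multinomial sample $\{n_{ij}\}$ is

$$\sigma(\log\hat{\theta}) = \sigma/\sqrt{n} = \Bigl(\sum_i\sum_j \frac{1}{n\pi_{ij}}\Bigr)^{1/2}.$$

Since $n\hat{\pi}_{ij} = n_{ij}$, the estimated standard error is (3.1).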
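
The derivation can be checked numerically. The Python sketch below plugs sample proportions from Table 3.1 into (3.9), using the partial derivatives $\phi_{ij}$ just listed, and compares $\hat{\sigma}/\sqrt{n}$ with formula (3.1). The helper name `delta_method_variance` is an illustrative assumption.

```python
import math

def delta_method_variance(pi, phi):
    """sigma^2 from (3.9): sum(pi_i * phi_i^2) - (sum(pi_i * phi_i))^2."""
    first = sum(p * f * f for p, f in zip(pi, phi))
    second = sum(p * f for p, f in zip(pi, phi))
    return first - second ** 2

counts = [28, 656, 18, 658]          # n11, n12, n21, n22 from Table 3.1
n = sum(counts)
pi_hat = [c / n for c in counts]

# Partial derivatives of log(theta) with respect to pi11, pi12, pi21, pi22
phi_hat = [1 / pi_hat[0], -1 / pi_hat[1], -1 / pi_hat[2], 1 / pi_hat[3]]

sigma2_hat = delta_method_variance(pi_hat, phi_hat)
se_delta = math.sqrt(sigma2_hat / n)               # sigma_hat / sqrt(n)
se_woolf = math.sqrt(sum(1 / c for c in counts))   # formula (3.1)
print(se_delta, se_woolf)
```

Because $\sum \hat{\pi}_{ij}\hat{\phi}_{ij} = 1 - 1 - 1 + 1 = 0$, the second term of (3.9) vanishes and the two standard errors coincide up to floating-point rounding.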
The delta method also applies directly with $\hat{\theta}$ to obtain $\hat{\sigma}(\hat{\theta})$ and a Wald confidence interval $\hat{\theta} \pm z_{\alpha/2}\,\hat{\sigma}(\hat{\theta})$. This is not recommended; $\hat{\theta}$ converges more slowly than $\log\hat{\theta}$ to normality, this interval could contain negative values, and it does not give results equivalent to those obtained with the Wald interval using $1/\hat{\theta}$ and its standard error.
3.1.8 Score and Profile Likelihood Confidence Intervals*
Standard errors obtained with the delta method appear in Wald confidence intervals. However, intervals based on inverting Wald tests sometimes work poorly for small to moderate n. Alternative intervals result from inverting likelihood-ratio or score tests. Although computationally more complex, these methods often perform better.
We illustrate first with the score method for the difference of proportions.
The score test (Mee 1984; Miettinen and Nurminen 1985) of $H_0$: $\pi_1 - \pi_2 = \Delta$ has the test statistic

$$z(\Delta) = \frac{(\hat{\pi}_1 - \hat{\pi}_2) - \Delta}{\sqrt{\hat{\pi}_1(\Delta)[1 - \hat{\pi}_1(\Delta)]/n_1 + \hat{\pi}_2(\Delta)[1 - \hat{\pi}_2(\Delta)]/n_2}},$$

where $\hat{\pi}_i(\Delta)$ denotes the ML estimate of $\pi_i$ subject to the constraint $\pi_1 - \pi_2 = \Delta$. That is, $\hat{\pi}_1(\Delta)$ and $\hat{\pi}_2(\Delta)$ are the values of $\pi_1$ and $\pi_2$ satisfying $\pi_1 - \pi_2 = \Delta$ that maximize the product of the two binomial probability mass functions. These values do not have closed-form expressions and are determined using numerical methods. The score confidence interval is the set of $\Delta$ such that $|z(\Delta)| < z_{\alpha/2}$. Computations for such intervals require iteration (Nurminen 1986).
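
One way to carry out this iteration is sketched below in Python. It is an illustration under stated assumptions, not the algorithm of the cited papers: the constrained ML estimates are found by bisection on the derivative of the log likelihood in $\pi_2$, the interval endpoints by bisection on $z(\Delta)$, all four cell counts are assumed positive, and $z(\Delta)$ is assumed to decrease in $\Delta$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def constrained_mle(y1, n1, y2, n2, delta):
    """ML estimates (pi1, pi2) subject to pi1 - pi2 = delta, by bisection.

    The derivative of the log likelihood in pi2 (with pi1 = pi2 + delta)
    is decreasing, so its root is bracketed by the feasible range of pi2.
    """
    lo, hi = max(0.0, -delta), min(1.0, 1.0 - delta)
    for _ in range(100):
        p2 = 0.5 * (lo + hi)
        p1 = p2 + delta
        score = y1 / p1 - (n1 - y1) / (1 - p1) + y2 / p2 - (n2 - y2) / (1 - p2)
        if score > 0:
            lo = p2
        else:
            hi = p2
    p2 = 0.5 * (lo + hi)
    return p2 + delta, p2

def score_z(y1, n1, y2, n2, delta):
    """Score statistic z(delta) with the null standard error."""
    p1, p2 = constrained_mle(y1, n1, y2, n2, delta)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (y1 / n1 - y2 / n2 - delta) / se

def score_interval(y1, n1, y2, n2, level=0.95):
    """Set of delta with |z(delta)| < z_{alpha/2}, found by bisection."""
    zc = NormalDist().inv_cdf(0.5 + level / 2)
    d_hat = y1 / n1 - y2 / n2

    def solve(a, b, target):
        # z(a) > target > z(b); shrink [a, b] around the crossing point
        for _ in range(100):
            mid = 0.5 * (a + b)
            if score_z(y1, n1, y2, n2, mid) > target:
                a = mid
            else:
                b = mid
        return 0.5 * (a + b)

    lower = solve(-1 + 1e-6, d_hat, zc)
    upper = solve(d_hat, 1 - 1e-6, -zc)
    return lower, upper

# Table 3.1: placebo 28/684, aspirin 18/676
score_lower, score_upper = score_interval(28, 684, 18, 676)
print(score_lower, score_upper)
```

With samples this large, the result should lie close to the Wald interval reported in Section 3.1.4; differences between the two methods matter more for small $n$ or proportions near 0 or 1.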
For the relative risk also, slightly better performance results with an interval using the score method (Bedrick 1987; Gart and Nam 1988; Koopman 1984; Miettinen and Nurminen 1985; Nurminen 1986). Cornfield (1956) and Miettinen and Nurminen (1985) showed the score interval for the odds ratio. We prefer not to use a continuity or finite-sampling correction with these intervals, as then performance is too conservative. The fact that the score intervals are computationally more complex than Wald intervals should not be an impediment to their use in this modern era of computing, as the principle behind them is simple. However, currently they are not available in standard software.
For a confidence interval based on the likelihood-ratio test, we illustrate with the odds ratio. The multinomial likelihood for a 2×2 table is a function of $\{\pi_{11}, \pi_{12}, \pi_{21}\}$. Equivalently, it can be expressed in terms of $\{\theta, \pi_{1+}, \pi_{+1}\}$ (recall Section 2.4.1). Thus, in inverting a likelihood-ratio test of $H_0$: $\theta = \theta_0$ to check whether $\theta_0$ belongs in the confidence interval, there are two nuisance parameters. Their null ML estimates $\hat{\pi}_{1+}(\theta_0)$ and $\hat{\pi}_{+1}(\theta_0)$ that maximize the likelihood under the null vary as $\theta_0$ does.
The profile log-likelihood function is $L(\theta_0, \hat{\pi}_{1+}(\theta_0), \hat{\pi}_{+1}(\theta_0))$, viewed as a function of $\theta_0$. For each $\theta_0$ this function gives the maximum of the ordinary log likelihood subject to the constraint $\theta = \theta_0$. Evaluated at $\theta_0 = \hat{\theta}$, this is the maximized log likelihood $L(\hat{\theta}, \hat{\pi}_{1+}, \hat{\pi}_{+1})$, which occurs at the sample proportions $\hat{\pi}_{1+} = n_{1+}/n$ and $\hat{\pi}_{+1} = n_{+1}/n$. The profile likelihood confidence interval for $\theta$ is the set of $\theta_0$ for which

$$-2\left[L(\theta_0, \hat{\pi}_{1+}(\theta_0), \hat{\pi}_{+1}(\theta_0)) - L(\hat{\theta}, \hat{\pi}_{1+}, \hat{\pi}_{+1})\right] < \chi^2_1(\alpha).$$

This contains all $\theta_0$ not rejected in likelihood-ratio tests of nominal size $\alpha$. The profile likelihood approach is available with some software (e.g., for SAS, see Table A.2 in Appendix A). A related approach, discussed in Section 6.7.1, uses a conditional likelihood function that eliminates the nuisance parameters by conditioning on their sufficient statistics. This is beneficial when there are many nuisance parameters. An advantage of score and likelihood-based intervals is that unlike the Wald, they are not adversely affected when the sample relative risk or odds ratio is 0 or $\infty$.
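
A profile likelihood interval for the odds ratio can be sketched in Python as follows. To keep the example short it assumes independent binomial rows (one of the sampling schemes this chapter allows), so there is a single nuisance parameter, the baseline logit $b$, rather than the two of the multinomial formulation above. The parameterization $\mathrm{logit}(\pi_1) = b + \log\theta_0$, $\mathrm{logit}(\pi_2) = b$, the bisection searches, and all function names are illustrative assumptions; all four cell counts are assumed positive.

```python
import math
from statistics import NormalDist

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def profile_loglik(y1, n1, y2, n2, log_theta0):
    """Maximize the two-binomial log likelihood over the nuisance logit b,
    holding the log odds ratio fixed at log_theta0."""
    lo, hi = -30.0, 30.0
    for _ in range(200):
        b = 0.5 * (lo + hi)
        # Likelihood equation in b: fitted successes equal observed successes
        fitted = n1 * expit(b + log_theta0) + n2 * expit(b)
        if fitted < y1 + y2:
            lo = b
        else:
            hi = b
    b = 0.5 * (lo + hi)
    # Binomial log likelihood, constants dropped: y*logit - n*log(1 + e^logit)
    return (y1 * (b + log_theta0) - n1 * math.log1p(math.exp(b + log_theta0))
            + y2 * b - n2 * math.log1p(math.exp(b)))

def profile_likelihood_ci(y1, n1, y2, n2, level=0.95):
    # Upper chi-squared(1) quantile equals the squared normal quantile
    crit = NormalDist().inv_cdf(0.5 + level / 2) ** 2
    log_theta_hat = math.log(y1 * (n2 - y2) / ((n1 - y1) * y2))
    max_loglik = profile_loglik(y1, n1, y2, n2, log_theta_hat)

    def deviance(log_theta0):
        return -2.0 * (profile_loglik(y1, n1, y2, n2, log_theta0) - max_loglik)

    def boundary(inside, outside):
        # inside is in the interval, outside is not; bisect to the edge
        for _ in range(100):
            mid = 0.5 * (inside + outside)
            if deviance(mid) < crit:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    lower = math.exp(boundary(log_theta_hat, log_theta_hat - 10.0))
    upper = math.exp(boundary(log_theta_hat, log_theta_hat + 10.0))
    return lower, upper

# Table 3.1: placebo 28/684, aspirin 18/676
pl_lower, pl_upper = profile_likelihood_ci(28, 684, 18, 676)
print(pl_lower, pl_upper)
```

For Table 3.1 the interval should resemble the Woolf interval of Section 3.1.2 and, like it, contain 1.0; unlike the Wald interval, it need not be symmetric on the log scale.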
In this section we have discussed interval estimation. Significance tests normally refer to a null hypothesis value of 0.0 for the log odds ratio, log relative risk, and difference of proportions. These are special cases of independence applied to 2×2 tables. In the next section we present tests of independence for two-way contingency tables.