COMPARING TWO PROPORTIONS - Categorical Data Analysis


Prospective studies usually condition on the totals $n_{i+} = \sum_j n_{ij}$ for categories of X and regard each row of J counts as an independent multinomial sample on Y. Retrospective studies usually treat the totals $n_{+j}$ for Y as fixed and regard each column of I counts as a multinomial sample on X. In cross-sectional studies, the total sample size is fixed but not the row or column totals, and the IJ cell counts are a multinomial sample.

Case–control, cohort, and cross-sectional studies are called observational studies. They simply observe who chooses each group and who has the outcome of interest. By contrast, a clinical trial is an experimental study, the investigator having the advantage of experimental control over which subjects receive each treatment. Such studies can use the power of randomization to make the groups balance roughly on other variables that may be associated with the response. Observational studies are common but have more potential for biases of various types.

DESCRIBING CONTINGENCY TABLES

treatments on the proportion of subjects who die, the difference between 0.010 and 0.001 may be more noteworthy than the difference between 0.410 and 0.401, even though both are 0.009. In such cases, the ratio of proportions is also informative.
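The contrast between the two pairs of proportions can be checked numerically. The following is a minimal sketch; the helper name `difference_and_ratio` is illustrative and not from the text.

```python
def difference_and_ratio(p1, p2):
    """Return the difference p1 - p2 and the ratio p1 / p2 of two proportions."""
    return p1 - p2, p1 / p2


# The two pairs of proportions quoted in the text.
for p1, p2 in [(0.010, 0.001), (0.410, 0.401)]:
    diff, ratio = difference_and_ratio(p1, p2)
    print(f"p1={p1:.3f}  p2={p2:.3f}  difference={diff:.3f}  ratio={ratio:.2f}")
```

Both pairs share the same difference, while the ratios differ by roughly a factor of ten, which is why the ratio adds information when both proportions are near zero.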

The relative risk is defined to be the ratio

$$\pi_1 / \pi_2. \tag{2.3}$$

It can be any nonnegative real number. A relative risk of 1.0 corresponds to independence. For the proportions just given, the relative risks are 0.010/0.001 = 10.0 and 0.410/0.401 = 1.02. Comparing the rows on the second response category gives a different relative risk, $(1-\pi_1)/(1-\pi_2)$.

2.2.3 Odds Ratio

For a probability $\pi$ of success, the odds are defined to be

$$\Omega = \pi / (1 - \pi).$$

The odds are nonnegative, with $\Omega > 1.0$ when a success is more likely than a failure. When $\pi = 0.75$, for instance, then $\Omega = 0.75/0.25 = 3.0$; a success is three times as likely as a failure, and we expect about three successes for every one failure. When $\Omega = 1/3$, a failure is three times as likely as a success.

Inversely,

$$\pi = \Omega / (\Omega + 1).$$

For instance, when $\Omega = 1/3$, then $\pi = 0.25$.
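As a small illustration of the two conversions, the sketch below maps a probability to odds and back. The function names `odds` and `probability` are chosen here for illustration only.

```python
def odds(pi):
    """Odds of success for a success probability pi, assuming 0 <= pi < 1."""
    return pi / (1.0 - pi)


def probability(omega):
    """Success probability corresponding to nonnegative odds omega."""
    return omega / (omega + 1.0)


print(round(odds(0.75), 4))            # odds when pi = 0.75
print(round(probability(1 / 3), 4))    # probability when the odds are 1/3
print(round(probability(odds(0.2)), 4))  # round trip returns the starting probability
```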

Refer again to a $2 \times 2$ table. Within row i, the odds of success instead of failure are $\Omega_i = \pi_i / (1 - \pi_i)$. The ratio of the odds $\Omega_1$ and $\Omega_2$ in the two rows,

$$\theta = \frac{\Omega_1}{\Omega_2} = \frac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}, \tag{2.4}$$

is called the odds ratio.

For joint distributions with cell probabilities $\pi_{ij}$, the equivalent definition for the odds in row i is $\Omega_i = \pi_{i1} / \pi_{i2}$, i = 1, 2. Then the odds ratio is

$$\theta = \frac{\pi_{11} / \pi_{12}}{\pi_{21} / \pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}. \tag{2.5}$$

An alternative name for $\theta$ is the cross-product ratio, since it equals the ratio of the products $\pi_{11}\pi_{22}$ and $\pi_{12}\pi_{21}$ of probabilities from diagonally opposite cells (Yule 1900, 1912).
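The agreement between the ratio-of-row-odds form and the cross-product form can be sketched in a few lines. The joint probabilities below are hypothetical values chosen only for illustration; they do not come from the text.

```python
# Hypothetical 2x2 joint distribution (rows = X, columns = Y); entries sum to 1.
joint = [[0.3, 0.2],
         [0.1, 0.4]]


def odds_ratio_from_row_odds(p):
    """Ratio of the row odds pi_i1 / pi_i2 for rows 1 and 2."""
    omega1 = p[0][0] / p[0][1]
    omega2 = p[1][0] / p[1][1]
    return omega1 / omega2


def cross_product_ratio(p):
    """Product of the main-diagonal cells over the product of the off-diagonal cells."""
    return (p[0][0] * p[1][1]) / (p[0][1] * p[1][0])


print(round(odds_ratio_from_row_odds(joint), 6))
print(round(cross_product_ratio(joint), 6))
```

Both calls return the same value (6 for these made-up probabilities), as the algebra requires.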

2.2.4 Properties of the Odds Ratio

The odds ratio can equal any nonnegative number. The condition $\Omega_1 = \Omega_2$ and hence (when all cell probabilities are positive) $\theta = 1$ corresponds to independence of X and Y. When $1 < \theta < \infty$, subjects in row 1 are more likely to have a success than are subjects in row 2; that is, $\pi_1 > \pi_2$. For instance, when $\theta = 4$, the odds of success in row 1 are four times the odds in row 2. This does not mean that the probability $\pi_1 = 4\pi_2$; that is the interpretation of a relative risk of 4.0. When $0 < \theta < 1$, $\pi_1 < \pi_2$. When one cell has zero probability, $\theta$ equals 0 or $\infty$.

Values of $\theta$ farther from 1.0 in a given direction represent stronger association. Two values represent the same association, but in opposite directions, when one is the inverse of the other. For instance, when $\theta = 0.25$, the odds of success in row 1 are 0.25 times the odds in row 2, or equivalently, the odds of success in row 2 are 1/0.25 = 4.0 times the odds in row 1. When the order of the rows is reversed or the order of the columns is reversed, the new value for $\theta$ is the inverse of the original value.

For inference, we shall see it is convenient to use $\log\theta$. Independence corresponds to $\log\theta = 0$. The log odds ratio is symmetric about this value: reversal of rows or of columns results in a change in its sign. Two values for $\log\theta$ that are the same except for sign, such as log 4 = 1.39 and log 0.25 = −1.39, represent the same strength of association.
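The inversion under row or column reversal, and the matching sign change on the log scale, can be illustrated with a short sketch. The table of probabilities is hypothetical, and natural logarithms are assumed, consistent with log 4 = 1.39.

```python
import math


def odds_ratio(table):
    """Cross-product ratio of a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical joint probabilities, used only for illustration.
table = [[0.3, 0.2],
         [0.1, 0.4]]
rows_reversed = [table[1], table[0]]
cols_reversed = [row[::-1] for row in table]

for name, t in [("original", table),
                ("rows reversed", rows_reversed),
                ("columns reversed", cols_reversed)]:
    theta = odds_ratio(t)
    print(f"{name:17s} theta={theta:.4f}  log theta={math.log(theta):+.4f}")

# The two values quoted in the text differ only in sign on the log scale.
print(round(math.log(4), 2), round(math.log(0.25), 2))
```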

The odds ratio does not change value when the orientation of the table reverses so that the rows become the columns and the columns become the rows. This is clear from the symmetric form of (2.5). It is unnecessary to identify one classification as the response variable in order to use $\theta$. In fact, although (2.4) defined it in terms of odds using $\pi_i = P(Y = 1 \mid X = i)$, one could just as well define it using reverse conditional probabilities. With a joint distribution, conditional distributions exist in each direction, and

$$\begin{aligned}
\theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}
&= \frac{P(Y = 1 \mid X = 1) / P(Y = 2 \mid X = 1)}{P(Y = 1 \mid X = 2) / P(Y = 2 \mid X = 2)} \\
&= \frac{P(X = 1 \mid Y = 1) / P(X = 2 \mid Y = 1)}{P(X = 1 \mid Y = 2) / P(X = 2 \mid Y = 2)}.
\end{aligned} \tag{2.6}$$

In fact, the odds ratio is equally valid for prospective, retrospective, or cross-sectional sampling designs. The sample odds ratio estimates the same parameter in each case.
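A brief numerical check of this direction-free property: the sketch below computes the odds ratio once from the conditional distributions of Y given X and once from those of X given Y. The joint probabilities are hypothetical and the function names are illustrative.

```python
# Hypothetical 2x2 joint distribution (rows = X, columns = Y).
joint = [[0.3, 0.2],
         [0.1, 0.4]]


def theta_from_y_given_x(p):
    """Odds ratio built from P(Y = j | X = i), conditioning on rows."""
    row_odds = []
    for row in p:
        total = sum(row)
        row_odds.append((row[0] / total) / (row[1] / total))
    return row_odds[0] / row_odds[1]


def theta_from_x_given_y(p):
    """Odds ratio built from P(X = i | Y = j), conditioning on columns."""
    col_odds = []
    for j in range(2):
        total = p[0][j] + p[1][j]
        col_odds.append((p[0][j] / total) / (p[1][j] / total))
    return col_odds[0] / col_odds[1]


print(round(theta_from_y_given_x(joint), 6))
print(round(theta_from_x_given_y(joint), 6))
```

The marginal totals cancel in each ratio, which is why both directions reduce to the same cross-product ratio.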

For cell counts $n_{ij}$, the sample odds ratio is

$$\hat{\theta} = \frac{n_{11} n_{22}}{n_{12} n_{21}}.$$

This does not change when both cell counts within any row are multiplied by a nonzero constant or when both cell counts within any column are multiplied by a nonzero constant. An implication is that the sample odds ratio estimates the same characteristic ($\theta$) even when the sample is disproportionately large or small from marginal categories of a variable. For a retrospective study of the association between vaccination and catching a certain strain of flu, the sample odds ratio estimates the same characteristic with a random sample of (1) 100 people who got the flu and 100 people who did not, or (2) 40 people who got the flu and 160 people who did not. The sample versions of the difference of proportions and relative risk (2.3) are invariant to multiplication of counts within rows by a constant, but they change with multiplication within columns or with row–column interchange.
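The invariance claims can be illustrated with a small sketch. The counts below are hypothetical (they are not the flu data, for which the text gives no cell counts), and the helper names are illustrative.

```python
def sample_odds_ratio(n):
    """Sample odds ratio n11 * n22 / (n12 * n21) for a 2x2 table of counts."""
    return (n[0][0] * n[1][1]) / (n[0][1] * n[1][0])


def sample_relative_risk(n):
    """Ratio of the row proportions falling in column 1."""
    p1 = n[0][0] / (n[0][0] + n[0][1])
    p2 = n[1][0] / (n[1][0] + n[1][1])
    return p1 / p2


# Hypothetical counts: rows are groups, column 1 is the outcome of interest.
counts = [[20, 80],
          [10, 90]]
row_scaled = [[3 * x for x in counts[0]], counts[1]]  # row 1 multiplied by 3
col_scaled = [[4 * row[0], row[1]] for row in counts]  # column 1 multiplied by 4

for name, n in [("original", counts),
                ("row 1 x 3", row_scaled),
                ("column 1 x 4", col_scaled)]:
    print(f"{name:13s} odds ratio={sample_odds_ratio(n):.3f}  "
          f"relative risk={sample_relative_risk(n):.3f}")
```

With these made-up counts the odds ratio stays fixed under both rescalings, while the relative risk survives the row rescaling but changes under the column rescaling.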

2.2.5 Aspirin and Heart Attacks Revisited

We illustrate the three association measures with Table 2.1 on aspirin use and heart attacks. The table differentiates between fatal and nonfatal heart attacks, but we combine these outcomes for now. Of the 11,034 physicians taking placebo, 189 suffered heart attacks, a proportion of 189/11,034 = 0.0171. Of the 11,037 taking aspirin, 104 had heart attacks, a proportion of 0.0094. The sample difference of proportions is 0.0171 − 0.0094 = 0.0077.

The relative risk is 0.0171/0.0094 = 1.82. The proportion suffering heart attacks of those taking placebo was 1.82 times the proportion suffering heart attacks of those taking aspirin. The sample odds ratio is (189 × 10,933)/(10,845 × 104) = 1.83. The odds of heart attack for those taking placebo were 1.83 times the odds for those taking aspirin.
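The three measures can be reproduced from the counts quoted above. This is a minimal sketch; the variable names are illustrative, and the counts of physicians without a heart attack are obtained by subtraction from the group totals.

```python
# Counts quoted in the text (fatal and nonfatal heart attacks combined).
placebo_attack, placebo_total = 189, 11034
aspirin_attack, aspirin_total = 104, 11037

placebo_none = placebo_total - placebo_attack  # 10,845
aspirin_none = aspirin_total - aspirin_attack  # 10,933

p_placebo = placebo_attack / placebo_total
p_aspirin = aspirin_attack / aspirin_total

difference = p_placebo - p_aspirin
relative_risk = p_placebo / p_aspirin
odds_ratio = (placebo_attack * aspirin_none) / (placebo_none * aspirin_attack)

print(f"proportions:    {p_placebo:.4f} (placebo), {p_aspirin:.4f} (aspirin)")
print(f"difference:     {difference:.4f}")
print(f"relative risk:  {relative_risk:.2f}")
print(f"odds ratio:     {odds_ratio:.2f}")
```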

2.2.6 Case–Control Studies and the Odds Ratio

With retrospective sampling designs, such as case–control studies, it is possible to estimate conditional probabilities of form $P(X = i \mid Y = j)$. It is usually not possible to estimate the probability $P(Y = j \mid X = i)$ of an outcome of interest or the difference of proportions or relative risk for that outcome. It is possible to estimate the odds ratio, however, since by (2.6) it is determined by conditional probabilities in either direction.

To illustrate, we revisit Table 2.5 on X = smoking behavior and Y = lung cancer. The data were two binomial samples on X at fixed levels of Y. Thus, we can estimate the probability a subject was a smoker, given the outcome on whether the subject had lung cancer; this was 688/709 for the cases and 650/709 for the controls. We cannot estimate the probability of lung cancer, given whether one smoked, which is more relevant. Thus, we cannot estimate differences or ratios of probabilities of lung cancer. The difference of proportions and relative risk are limited to comparisons of the probabilities of being a smoker. However, we can compute the odds ratio using the sample analog of (2.6),

$$\frac{(688/709) / (21/709)}{(650/709) / (59/709)} = \frac{688 \times 59}{650 \times 21} = 3.0.$$

Moreover, by (2.6), interpretations can use the direction of interest, even though the study was retrospective: The estimated odds of lung cancer for smokers were 3.0 times the estimated odds for nonsmokers.
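The case–control calculation can be sketched as follows, using the smoker and nonsmoker counts implied by the fractions quoted for Table 2.5 (709 cases and 709 controls). Variable names are illustrative.

```python
# Smokers and nonsmokers among lung cancer cases and among controls.
cases_smokers, cases_nonsmokers = 688, 21
controls_smokers, controls_nonsmokers = 650, 59

n_cases = cases_smokers + cases_nonsmokers          # 709
n_controls = controls_smokers + controls_nonsmokers  # 709

# Odds of being a smoker within each outcome group; the only direction
# that the retrospective design lets us estimate directly.
odds_smoker_cases = (cases_smokers / n_cases) / (cases_nonsmokers / n_cases)
odds_smoker_controls = (controls_smokers / n_controls) / (controls_nonsmokers / n_controls)

odds_ratio = odds_smoker_cases / odds_smoker_controls
cross_product = (cases_smokers * controls_nonsmokers) / (controls_smokers * cases_nonsmokers)

print(round(odds_ratio, 2), round(cross_product, 2))
```

The group sizes cancel, so the ratio of conditional odds and the cross-product of counts agree, both rounding to the 3.0 reported in the text.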

2.2.7 Relationship between Odds Ratio and Relative Risk

From definitions (2.3) and (2.4),

$$\text{odds ratio} = \text{relative risk} \times \left( \frac{1 - \pi_2}{1 - \pi_1} \right).$$

Their magnitudes are similar whenever the probability $\pi_i$ of the outcome of interest is close to zero for both groups. We saw this similarity in Section 2.2.5 for the aspirin study, where the heart attack proportion was less than 0.02 for each group. The relative risk was 1.82 and the odds ratio was 1.83.
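The size of the correction factor $(1 - \pi_2)/(1 - \pi_1)$ can be examined directly. The sketch below uses the aspirin proportions from Section 2.2.5 and, for contrast, a hypothetical pair of large probabilities (0.8 and 0.4) that does not come from the text.

```python
def relative_risk(pi1, pi2):
    """Ratio of the two success probabilities."""
    return pi1 / pi2


def odds_ratio(pi1, pi2):
    """Ratio of the odds pi / (1 - pi) in the two groups."""
    return (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))


def correction_factor(pi1, pi2):
    """Factor that converts the relative risk into the odds ratio."""
    return (1 - pi2) / (1 - pi1)


examples = {
    "aspirin study (placebo vs. aspirin)": (189 / 11034, 104 / 11037),
    "hypothetical large probabilities": (0.8, 0.4),
}

for label, (pi1, pi2) in examples.items():
    rr = relative_risk(pi1, pi2)
    factor = correction_factor(pi1, pi2)
    print(f"{label}: RR={rr:.2f}  factor={factor:.3f}  "
          f"RR*factor={rr * factor:.2f}  OR={odds_ratio(pi1, pi2):.2f}")
```

For the aspirin proportions the factor is close to 1, so the two measures nearly coincide; for the hypothetical large probabilities the factor is 3 and the odds ratio is far from the relative risk.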

Because of this similarity, when each $\pi_i$ is small, the odds ratio provides a rough indication of the relative risk when it is not directly estimable, such as in case–control studies (Cornfield 1951). For instance, for Table 2.5, if the probability of lung cancer is small regardless of smoking behavior, 3.0 is also a rough estimate of the relative risk; that is, smokers had about 3.0 times the relative frequency of lung cancer as nonsmokers.
