
A SIMPLE DISTANCE MINIMIZATION ALGORITHM FOR ECOLOGICAL INFERENCE: METHOD I


5 Ecological Regression and Ecological Inference ∗


Unfortunately, even when there is a unique intersection point of the tomographic plot lines, that intersection need not be within the unit square, i.e., need not be a feasible value.⁶ Indeed, we might anticipate that, even in the absence of a unique intersection of the line segment bounds in the tomographic plot, when Goodman’s ecological regression method yields a feasible estimate of mean $(B^b, B^w)$ values, it is likely that the results of Goodman’s approach and those of King’s approach to ecological inference will not be far apart. The differences between the two approaches appear likely to arise when Goodman’s ecological regression yields out-of-bounds estimates for one or more of the mean (or precinct-specific) parameters. We will return to this issue, i.e., the circumstances under which different methods are likely to give rise to different answers, later in the chapter.

5.3 A SIMPLE DISTANCE MINIMIZATION ALGORITHM FOR ECOLOGICAL INFERENCE: METHOD I

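Bernard Grofman and Samuel Merrill

[...] be computed to the nearer endpoint of the segment. The specifications of these endpoints $P_1$ and $P_2$ follow simple rules:⁷

$$
P_1 =
\begin{cases}
\left(0,\ \dfrac{T_i}{1-X_i}\right) & \text{if } T_i \le 1-X_i, \\[2ex]
\left(\dfrac{T_i-(1-X_i)}{X_i},\ 1\right) & \text{otherwise;}
\end{cases}
\tag{5.6a}
$$

$$
P_2 =
\begin{cases}
\left(1,\ \dfrac{T_i-X_i}{1-X_i}\right) & \text{if } T_i \ge X_i, \\[2ex]
\left(\dfrac{T_i}{X_i},\ 0\right) & \text{otherwise.}
\end{cases}
\tag{5.6b}
$$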
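The endpoint rules in Equations 5.6a and 5.6b can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `segment_endpoints` and the sample values are hypothetical, and the degenerate cases $X_i = 0$ and $X_i = 1$ (see footnote 7) are deliberately excluded.

```python
def segment_endpoints(x, t):
    """Endpoints (P1, P2) of the precinct line t = x*bb + (1 - x)*bw
    clipped to the unit square, following Equations 5.6a and 5.6b."""
    if not 0.0 < x < 1.0:
        # Degenerate precincts (x = 0 or x = 1) are left out of this sketch.
        raise ValueError("expected 0 < x < 1")
    # P1 is the end with the smaller bb: on the left edge or the top edge.
    if t <= 1.0 - x:
        p1 = (0.0, t / (1.0 - x))
    else:
        p1 = ((t - (1.0 - x)) / x, 1.0)
    # P2 is the end with the larger bb: on the right edge or the bottom edge.
    if t >= x:
        p2 = (1.0, (t - x) / (1.0 - x))
    else:
        p2 = (t / x, 0.0)
    return p1, p2


# Hypothetical precinct with X_i = 0.4 and T_i = 0.5.
p1, p2 = segment_endpoints(0.4, 0.5)
print(p1, p2)
```

Both returned points lie on the precinct constraint line, so substituting either into $X_i \beta_i^b + (1-X_i)\beta_i^w$ should reproduce $T_i$ up to rounding.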

To implement this plan, it remains only to determine formulas for the points of intersection (to be used when they lie in the feasible region). As noted above, we have

$$
\beta_i^w = -\frac{X_i}{1-X_i}\,\beta_i^b + \frac{T_i}{1-X_i}
\tag{5.7}
$$

as the equation for each precinct constraint line. If $(B^b, B^w)$ lies on the aggregate constraint line, the line through this point and perpendicular to the precinct constraint line of Equation 5.7 is given by

$$
\beta_i^w = \frac{1-X_i}{X_i}\,\beta_i^b + B^w - \frac{1-X_i}{X_i}\,B^b.
\tag{5.8}
$$

The point of intersection of the precinct constraint line and this perpendicular is given by

$$
\beta_i^b = \frac{X_i T_i - B^w X_i (1-X_i) + B^b (1-X_i)^2}{X_i^2 + (1-X_i)^2},
\tag{5.9}
$$

and $\beta_i^w$ can then be obtained from Equation 5.8.
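As a quick numerical check of Equations 5.8 and 5.9, the following sketch computes the point of intersection and confirms that it lies on the precinct constraint line and that the connecting vector is perpendicular to that line. The function name `perpendicular_foot` and the input values are hypothetical, and $0 < X_i < 1$ is assumed.

```python
def perpendicular_foot(x, t, Bb, Bw):
    """Point where the perpendicular from (Bb, Bw) meets the precinct
    constraint line (Equation 5.9 for bb, Equation 5.8 for bw)."""
    if not 0.0 < x < 1.0:
        raise ValueError("expected 0 < x < 1")
    denom = x**2 + (1.0 - x)**2
    bb = (x * t - Bw * x * (1.0 - x) + Bb * (1.0 - x)**2) / denom
    slope = (1.0 - x) / x  # slope of the perpendicular line
    bw = slope * bb + Bw - slope * Bb
    return bb, bw


# Hypothetical values for one precinct and a trial district point.
x, t, Bb, Bw = 0.4, 0.5, 0.6, 0.3
bb, bw = perpendicular_foot(x, t, Bb, Bw)

# The foot should satisfy the precinct constraint (Equation 5.7) ...
on_line = abs(x * bb + (1.0 - x) * bw - t) < 1e-12
# ... and the connecting vector should be orthogonal to the line's
# direction vector (1 - x, -x).
orthogonal = abs((bb - Bb) * (1.0 - x) - (bw - Bw) * x) < 1e-12
print(bb, bw, on_line, orthogonal)
```

Nothing in this computation keeps the result inside the unit square; feasibility has to be checked separately.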

In general, what we want to do is find the point on the district-level tomographic line that minimizes the sum of the squared distances from that point to all the line segments that define the precinct-specific joint bounds on the $\beta_i^b$ and $\beta_i^w$ values. First note that, from Equation 5.8,

$$
\beta_i^w - B^w = \frac{1-X_i}{X_i}\left(\beta_i^b - B^b\right),
$$

so that the square of the distance from $(B^b, B^w)$ to the precinct constraint line, i.e., to the point of intersection given by Equation 5.9, is

$$
d_i^2 = \left(\beta_i^b - B^b\right)^2 + \left(\beta_i^w - B^w\right)^2
= \left(\beta_i^b - B^b\right)^2\,\frac{X_i^2 + (1-X_i)^2}{X_i^2}.
\tag{5.10}
$$

However, using Equation 5.9, we obtain

$$
\beta_i^b - B^b = \frac{X_i T_i - B^w X_i (1-X_i) - X_i^2 B^b}{X_i^2 + (1-X_i)^2}.
$$

Together with Equation 5.10, this implies that

$$
d_i^2 = \frac{\left[T_i - X_i B^b - (1-X_i) B^w\right]^2}{X_i^2 + (1-X_i)^2}
= w_i^2 \left[T_i - X_i B^b - (1-X_i) B^w\right]^2,
\tag{5.11}
$$

where the weights $w_i$ are given by

$$
w_i = \frac{1}{\sqrt{X_i^2 + (1-X_i)^2}}.
$$

⁷ Note that the conditions on $T_i$ in Equations 5.6a and 5.6b need not be complementary; it is the two conditions within 5.6a and within 5.6b that are complementary. In the degenerate case for which $X_i = 1$, if $T_i \le 1 - X_i$ then $P_1 = (0, 1)$; if $T_i \ge X_i$ then $P_2 = (1, 0)$.

Note that the distance $d_i$ can be interpreted as the weighted difference between the proportion of voters for the black candidate in the $i$th precinct and what that proportion would be if the proportions voting for the black candidate broken down by race were given by $B^b$ and $B^w$, that is, the same as in the district as a whole. Hence, it makes sense to seek values of $B^b$ and $B^w$ that would minimize the squares of these differences. In fact, the numerator in Equation 5.11 is $(T_i - \hat{T}_i)^2$, where $\hat{T}_i$ is the $i$th fitted value under Goodman regression.
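The equivalence between the geometric distance and the weighted residual can be checked numerically. The sketch below, with hypothetical function names and input values and assuming $0 < X_i < 1$, computes $d_i^2$ once from the weighted residual of Equation 5.11 and once from the point of intersection of Equation 5.9.

```python
def d_squared(x, t, Bb, Bw):
    """Squared distance via Equation 5.11: w_i^2 times the squared residual."""
    w_sq = 1.0 / (x**2 + (1.0 - x)**2)
    residual = t - x * Bb - (1.0 - x) * Bw  # T_i minus its fitted value
    return w_sq * residual**2


def d_squared_geometric(x, t, Bb, Bw):
    """Same quantity from the point of intersection (Equations 5.9 and 5.7)."""
    denom = x**2 + (1.0 - x)**2
    bb = (x * t - Bw * x * (1.0 - x) + Bb * (1.0 - x)**2) / denom
    bw = (t - x * bb) / (1.0 - x)
    return (bb - Bb)**2 + (bw - Bw)**2


# Hypothetical precinct and trial district point.
x, t, Bb, Bw = 0.4, 0.5, 0.6, 0.3
print(d_squared(x, t, Bb, Bw), d_squared_geometric(x, t, Bb, Bw))
```

The two printed values should agree up to floating-point rounding.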

If all points of intersection are in the feasible region, we simply minimize $\sum_i d_i^2$ subject to the constraint that $B^b$ and $B^w$ are feasible (lie on the district constraint line), i.e., that

$$
X B^b + (1-X) B^w = T.
\tag{5.12}
$$

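Solving this constrained optimization problem by Lagrange multipliers, we obtain two linear equations in $B^b$ and $B^w$:

$$
\begin{aligned}
B^b \sum_i w_i^2 X_i (X_i - X) + B^w \sum_i w_i^2 (1-X_i)(X_i - X) &= \sum_i w_i^2 T_i (X_i - X), \\
B^b X + B^w (1-X) &= T,
\end{aligned}
$$

which yield the solutions

$$
B^b = \frac{\sum_i w_i^2 (X_i - X)\left[(1-X) T_i - (1-X_i) T\right]}{\sum_i w_i^2 (X_i - X)^2},
\tag{5.13a}
$$

$$
B^w = \frac{\sum_i w_i^2 (X_i - X)\left[X_i T - X T_i\right]}{\sum_i w_i^2 (X_i - X)^2}.
\tag{5.13b}
$$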
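For readers who want the intermediate step, the first of the two linear equations can be recovered along the following lines. This is a sketch of the standard Lagrange-multiplier argument; the multiplier $\lambda$ and the residual shorthand $r_i$ are introduced here for illustration and are not part of the original notation.

$$
\begin{aligned}
&\mathcal{L} = \sum_i w_i^2 r_i^2 + \lambda\left[X B^b + (1-X) B^w - T\right],
\qquad r_i = T_i - X_i B^b - (1-X_i) B^w, \\
&\frac{\partial \mathcal{L}}{\partial B^b} = 0 \;\Rightarrow\; 2\sum_i w_i^2 X_i r_i = \lambda X,
\qquad
\frac{\partial \mathcal{L}}{\partial B^w} = 0 \;\Rightarrow\; 2\sum_i w_i^2 (1-X_i) r_i = \lambda (1-X), \\
&(1-X)\sum_i w_i^2 X_i r_i - X\sum_i w_i^2 (1-X_i) r_i = 0
\;\Rightarrow\; \sum_i w_i^2 (X_i - X)\, r_i = 0.
\end{aligned}
$$

Expanding $r_i$ in the last expression gives the first linear equation; the second is the constraint of Equation 5.12 itself.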

Thus, in the special case in which all intersection points are in the feasible region, we have obtained closed-form solutions for $B^b$ and $B^w$. These solutions are simple to compute on a spreadsheet and closely resemble the form of solutions to an ordinary least squares regression problem.⁸ However, in solving our optimization problem, we are only interested in points of intersection $(\beta_i^b, \beta_i^w)$ that specify feasible values for the respective precincts.

Accordingly, if the point of intersection is outside the feasible region, we modify $d_i^2$ to be the squared distance to the nearer endpoint of the precinct line segment where it intersects the boundary of the feasible region. We then choose those values of $B^b$ and $B^w$ that lie on the district tomographic line and that minimize $\sum_i d_i^2$.
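When some points of intersection are infeasible, no closed form is given, so the minimization has to be done numerically. The sketch below is one possible approach and not the authors' spreadsheet implementation: it parametrizes the district line by $B^b$, restricts the search to the part of that line inside the unit square (an assumption), and uses a ternary search. All function names and the sample data are hypothetical, and equal-size precincts with $0 < X_i < 1$ are assumed.

```python
def segment_endpoints(x, t):
    """Endpoints of a constraint segment (Equations 5.6a, 5.6b), 0 < x < 1."""
    p1 = (0.0, t / (1.0 - x)) if t <= 1.0 - x else ((t - (1.0 - x)) / x, 1.0)
    p2 = (1.0, (t - x) / (1.0 - x)) if t >= x else (t / x, 0.0)
    return p1, p2


def nearest_on_segment(p, a, b):
    """Closest point to p on segment ab: the perpendicular foot when it lies
    on the segment, otherwise the nearer endpoint."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a
    s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    s = min(1.0, max(0.0, s))  # clamp the foot onto the segment
    return (a[0] + s * dx, a[1] + s * dy)


def total_sq_distance(Bb, X, T, precincts):
    """Sum of the modified d_i^2 for the district point with abscissa Bb."""
    Bw = (T - X * Bb) / (1.0 - X)  # stay on the district line (Equation 5.12)
    total = 0.0
    for x, t in precincts:
        a, b = segment_endpoints(x, t)
        q = nearest_on_segment((Bb, Bw), a, b)
        total += (q[0] - Bb)**2 + (q[1] - Bw)**2
    return total


def minimize_method_one(X, T, precincts, iterations=200):
    """Ternary search for Bb over the feasible part of the district line."""
    (lo, _), (hi, _) = segment_endpoints(X, T)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = total_sq_distance(m1, X, T, precincts)
        f2 = total_sq_distance(m2, X, T, precincts)
        if f1 <= f2:
            hi = m2
        else:
            lo = m1
    Bb = 0.5 * (lo + hi)
    return Bb, (T - X * Bb) / (1.0 - X)


# Hypothetical data generated exactly from (Bb, Bw) = (0.8, 0.2).
precincts = [(0.1, 0.26), (0.3, 0.38), (0.5, 0.50), (0.9, 0.74)]
print(minimize_method_one(0.45, 0.47, precincts))
```

A ternary search is adequate here because the squared distance from a point to a line segment is a convex function of the point, so the objective is convex along the district line. Any one-dimensional minimizer could be substituted.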

Standard errors and confidence intervals can be computed by a bootstrap method. This is done by repeated sampling with replacement from the data set, recomputing the parameter estimates, and determining the standard deviation of these estimates (see Efron and Tibshirani, 1993).
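The bootstrap procedure just described might be sketched as follows. The names `closed_form` and `bootstrap_se`, the number of replications, and the sample data are all hypothetical; the estimator shown is the closed form of Equations 5.13a and 5.13b for equal-size precincts, with $X$ and $T$ recomputed as simple means of each resample (an assumption of this sketch).

```python
import random
import statistics


def closed_form(sample):
    """Equations 5.13a and 5.13b for equal-size precincts given as (x, t)
    pairs; X and T are taken as simple means of the sample."""
    n = len(sample)
    X = sum(x for x, _ in sample) / n
    T = sum(t for _, t in sample) / n
    num_b = num_w = den = 0.0
    for x, t in sample:
        w_sq = 1.0 / (x**2 + (1.0 - x)**2)
        dx = x - X
        num_b += w_sq * dx * ((1.0 - X) * t - (1.0 - x) * T)
        num_w += w_sq * dx * (x * T - X * t)
        den += w_sq * dx**2
    return num_b / den, num_w / den


def bootstrap_se(precincts, estimator, n_boot=500, seed=0):
    """Standard deviations of the estimates over resampled data sets."""
    if len({x for x, _ in precincts}) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(precincts)
    bb_draws, bw_draws = [], []
    while len(bb_draws) < n_boot:
        # Resample precincts with replacement.
        sample = [precincts[rng.randrange(n)] for _ in range(n)]
        if len({x for x, _ in sample}) < 2:
            continue  # no variation in x: the estimator is undefined
        Bb, Bw = estimator(sample)
        bb_draws.append(Bb)
        bw_draws.append(Bw)
    return statistics.stdev(bb_draws), statistics.stdev(bw_draws)


# Hypothetical noisy precinct data.
precincts = [(0.1, 0.25), (0.2, 0.33), (0.4, 0.45),
             (0.6, 0.55), (0.8, 0.70), (0.9, 0.73)]
print(bootstrap_se(precincts, closed_form))
```

Passing the estimator as an argument lets the same resampling loop be reused with a numerical minimizer when some points of intersection are infeasible.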

Each precinct-level estimate is the pair $(\beta_i^b, \beta_i^w)$ that minimizes the expression $(\beta_i^b - B^b)^2 + (\beta_i^w - B^w)^2$. It is the intersection point of the perpendicular to the precinct tomographic line if this value is feasible, and otherwise is the nearest endpoint of the precinct tomographic line segment to the district solution point $(B^b, B^w)$. These computations can be implemented in an Excel spreadsheet and are available on the websites http://www.cbrss.harvard.edu/events/eic/book.htm and http://course.wilkes.edu/Merrill/ through Internet Explorer.
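The precinct-level rule can be written out directly: take the point of intersection from Equation 5.9 when it is feasible, and otherwise the nearer of the two endpoints from Equations 5.6a and 5.6b. The sketch below is an illustration in Python and not the spreadsheet referred to above; the function name and the input values are hypothetical, and $0 < X_i < 1$ is assumed.

```python
def precinct_estimate(x, t, Bb, Bw):
    """Precinct-level (bb, bw) for a district solution point (Bb, Bw)."""
    if not 0.0 < x < 1.0:
        raise ValueError("expected 0 < x < 1")
    # Point of intersection (Equation 5.9), with bw from Equation 5.7.
    denom = x**2 + (1.0 - x)**2
    bb = (x * t - Bw * x * (1.0 - x) + Bb * (1.0 - x)**2) / denom
    bw = (t - x * bb) / (1.0 - x)
    if 0.0 <= bb <= 1.0 and 0.0 <= bw <= 1.0:
        return bb, bw
    # Otherwise use the nearer segment endpoint (Equations 5.6a and 5.6b).
    p1 = (0.0, t / (1.0 - x)) if t <= 1.0 - x else ((t - (1.0 - x)) / x, 1.0)
    p2 = (1.0, (t - x) / (1.0 - x)) if t >= x else (t / x, 0.0)
    return min((p1, p2), key=lambda p: (p[0] - Bb)**2 + (p[1] - Bw)**2)


# Hypothetical district solution (0.8, 0.2) and two hypothetical precincts.
print(precinct_estimate(0.4, 0.5, 0.8, 0.2))   # intersection point is feasible
print(precinct_estimate(0.9, 0.95, 0.8, 0.2))  # infeasible, endpoint is used
```

In the second call the point of intersection has $\beta_i^b > 1$, so the function falls back to the endpoint on the right edge of the unit square.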

District parameter estimates for Method I are presented later for several artificial and real data sets in Tables 2–4; precinct-level estimates are given for one real data set in Table 3.

These results are discussed in Section 5.5.

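If not all precincts are of equal size, we weight the $d_i^2$ by the number $N_i$ of voters in precinct $i$, i.e., we minimize $\sum_i N_i d_i^2$. Equations 5.13a and 5.13b are replaced by

$$
B^b = \frac{\sum_i w_i^2 N_i (X_i - X)\left[(1-X) T_i - (1-X_i) T\right]}{\sum_i w_i^2 N_i (X_i - X)^2},
\tag{5.14a}
$$

$$
B^w = \frac{\sum_i w_i^2 N_i (X_i - X)\left[X_i T - X T_i\right]}{\sum_i w_i^2 N_i (X_i - X)^2}.
\tag{5.14b}
$$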
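Equations 5.14a and 5.14b translate almost line for line into code. The following sketch uses a hypothetical function name and hypothetical data, and it assumes that the district values $X$ and $T$ are the size-weighted means of the precinct values. Setting all $N_i$ equal reduces it to Equations 5.13a and 5.13b.

```python
def method_one_weighted(precincts):
    """Equations 5.14a and 5.14b for precincts given as (x, t, n) triples.

    Assumption of this sketch: the district values X and T are the
    size-weighted means of the precinct values.
    """
    total_n = sum(n for _, _, n in precincts)
    X = sum(n * x for x, _, n in precincts) / total_n
    T = sum(n * t for _, t, n in precincts) / total_n
    num_b = num_w = den = 0.0
    for x, t, n in precincts:
        w_sq = 1.0 / (x**2 + (1.0 - x)**2)  # w_i^2
        dx = x - X
        num_b += w_sq * n * dx * ((1.0 - X) * t - (1.0 - x) * T)
        num_w += w_sq * n * dx * (x * T - X * t)
        den += w_sq * n * dx**2
    if den == 0.0:
        raise ValueError("all precincts share the same x")
    return num_b / den, num_w / den


# Hypothetical data generated exactly from (Bb, Bw) = (0.8, 0.2).
precincts = [(0.1, 0.26, 500), (0.3, 0.38, 1200),
             (0.5, 0.50, 800), (0.9, 0.74, 300)]
print(method_one_weighted(precincts))
```

These closed forms apply only when every point of intersection is feasible, and the returned values are not themselves guaranteed to lie in the unit interval.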

⁸ In this special case, the solution would be identical to the ordinary least squares solution if the weights $w_i$ in Equation 5.13 were all identical.


5.4 EXTENDING THE DUNCAN–DAVIS METHOD OF BOUNDS TO DEVELOP TWO NEW FORMS
