5 Ecological Regression and Ecological Inference ∗
5.3 A SIMPLE DISTANCE MINIMIZATION ALGORITHM FOR ECOLOGICAL INFERENCE: METHOD I
Unfortunately, even when there is a unique intersection point of the tomographic plot lines, that intersection need not be within the unit square, i.e., need not be a feasible value.⁶ Indeed, we might anticipate that, even in the absence of a unique intersection of the line segment bounds in the tomographic plot, when Goodman’s ecological regression method yields a feasible estimate of mean $(B^b, B^w)$ values, it is likely that the results of Goodman’s approach and that of King’s approach to ecological inference will not be far apart. The differences between the two approaches appear likely to arise when Goodman’s ecological regression yields out-of-bounds estimates for one or more of the mean (or precinct-specific) parameters. We will return to this issue, i.e., the circumstances under which different methods are likely to give rise to different answers, later in the chapter.
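130 Bernard Grofman and Samuel Merrill
be computed to the nearer endpoint of the segment. The specifications of these endpoints $P_1$ and $P_2$ follow simple rules:⁷

$$P_1 = \begin{cases} \left(0,\ \dfrac{T_i}{1-X_i}\right) & \text{if } T_i \le 1-X_i, \\[2ex] \left(\dfrac{T_i-(1-X_i)}{X_i},\ 1\right) & \text{otherwise;} \end{cases} \tag{5.6a}$$

$$P_2 = \begin{cases} \left(1,\ \dfrac{T_i-X_i}{1-X_i}\right) & \text{if } T_i \ge X_i, \\[2ex] \left(\dfrac{T_i}{X_i},\ 0\right) & \text{otherwise.} \end{cases} \tag{5.6b}$$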
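As an illustration only, the endpoint rules in Equations 5.6a and 5.6b might be coded as follows. This is a minimal Python sketch rather than the authors' spreadsheet; the function name `segment_endpoints` is hypothetical, and the sketch assumes $0 < X_i < 1$, so the degenerate cases mentioned in footnote 7 are not handled.

```python
def segment_endpoints(x, t):
    """Endpoints (P1, P2) of the feasible part of the precinct line
    t = x*beta_b + (1 - x)*beta_w inside the unit square.

    Follows Equations 5.6a and 5.6b. Assumption: 0 < x < 1 and 0 <= t <= 1.
    Each point is returned as (beta_b, beta_w).
    """
    # Equation 5.6a: the end of the segment with the smaller beta_b.
    if t <= 1 - x:
        p1 = (0.0, t / (1 - x))          # line meets the left edge beta_b = 0
    else:
        p1 = ((t - (1 - x)) / x, 1.0)    # line meets the top edge beta_w = 1

    # Equation 5.6b: the end of the segment with the larger beta_b.
    if t >= x:
        p2 = (1.0, (t - x) / (1 - x))    # line meets the right edge beta_b = 1
    else:
        p2 = (t / x, 0.0)                # line meets the bottom edge beta_w = 0
    return p1, p2
```

Both returned points satisfy the precinct constraint $T_i = X_i\beta_i^b + (1-X_i)\beta_i^w$, which gives a quick sanity check on any input.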
To implement this plan, it remains only to determine formulas for the points of intersection (to be used when they lie in the feasible region). As noted above, we have
$$\beta_i^w = -\frac{X_i}{1-X_i}\,\beta_i^b + \frac{T_i}{1-X_i} \tag{5.7}$$
as the equation for each precinct constraint line. If $(B^b, B^w)$ lies on the aggregate constraint line, the line through this point and perpendicular to a precinct constraint line given by Equation 5.7 is given by
$$\beta_i^w = \frac{1-X_i}{X_i}\,\beta_i^b + B^w - \frac{1-X_i}{X_i}\,B^b. \tag{5.8}$$
The point of intersection of the precinct constraint line and this perpendicular is given by
$$\beta_i^b = \frac{X_i T_i - B^w X_i(1-X_i) + B^b(1-X_i)^2}{X_i^2+(1-X_i)^2} \tag{5.9}$$
and $\beta_i^w$ can then be obtained from Equation 5.8.
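The intersection point can be checked numerically. The following Python sketch applies Equation 5.9 and then Equation 5.8; the function name `foot_of_perpendicular` is hypothetical, and the sketch assumes $0 < X_i < 1$ so that both slopes are finite.

```python
def foot_of_perpendicular(x, t, bb, bw):
    """Intersection of the precinct line t = x*beta_b + (1 - x)*beta_w with
    the perpendicular to it through the point (bb, bw).

    Assumption: 0 < x < 1. The result may lie outside the unit square.
    """
    denom = x ** 2 + (1 - x) ** 2
    # Equation 5.9
    beta_b = (x * t - bw * x * (1 - x) + bb * (1 - x) ** 2) / denom
    # Equation 5.8: the perpendicular has slope (1 - x) / x
    slope = (1 - x) / x
    beta_w = slope * beta_b + bw - slope * bb
    return beta_b, beta_w
```

For example, with $X_i = 0.4$, $T_i = 0.5$, and $(B^b, B^w) = (0.6, 0.3)$, the sketch returns approximately $(0.6615, 0.3923)$, a point that satisfies the precinct constraint $0.4\,\beta_i^b + 0.6\,\beta_i^w = 0.5$.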
⁷ Note that the conditions on $T_i$ in Equations 5.6a and 5.6b need not be complementary; it is the two conditions within 5.6a and within 5.6b that are complementary. In the degenerate case for which $X_i = 1$, if $T_i \le 1 - X_i$ then $P_1 = (0, 1)$; if $T_i \ge X_i$ then $P_2 = (1, 0)$.

In general, what we want to do is find the point on the district-level tomographic line that minimizes the sum of the squared distances from that point to all the line segments that define the precinct-specific joint bounds on the $\beta_i^b$ and $\beta_i^w$ values. First note that, from Equation 5.8,
$$\beta_i^w - B^w = \frac{1-X_i}{X_i}\left(\beta_i^b - B^b\right),$$
so that the square of the distance from $(B^b, B^w)$ to the precinct constraint line, i.e., to the point of intersection given by Equation 5.9, is
$$d_i^2 = \left(\beta_i^b - B^b\right)^2 + \left(\beta_i^w - B^w\right)^2 = \left(\beta_i^b - B^b\right)^2\,\frac{X_i^2+(1-X_i)^2}{X_i^2}. \tag{5.10}$$
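However, using Equation 5.9, we obtain
$$\beta_i^b - B^b = \frac{X_i T_i - B^w X_i(1-X_i) - X_i^2 B^b}{X_i^2+(1-X_i)^2}.$$
Together with Equation 5.10, this implies that
$$d_i^2 = \frac{\left[T_i - X_i B^b - (1-X_i)B^w\right]^2}{X_i^2+(1-X_i)^2} = w_i^2\left[T_i - X_i B^b - (1-X_i)B^w\right]^2, \tag{5.11}$$
where the weights $w_i$ are given by
$$w_i = \frac{1}{\sqrt{X_i^2+(1-X_i)^2}}.$$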
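Equation 5.11 has the form of the familiar point-to-line distance formula applied to the line $X_i\beta^b + (1-X_i)\beta^w = T_i$. A minimal Python sketch (the function name `squared_distance` is hypothetical) makes the computation explicit:

```python
def squared_distance(x, t, bb, bw):
    """Squared distance from (bb, bw) to the full precinct line
    t = x*beta_b + (1 - x)*beta_w, as in Equation 5.11.

    Note: this is the distance to the unbounded line, so it is the relevant
    quantity only when the foot of the perpendicular is feasible.
    """
    residual = t - x * bb - (1 - x) * bw        # T_i minus the fitted value
    w_squared = 1.0 / (x ** 2 + (1 - x) ** 2)   # w_i^2
    return w_squared * residual ** 2
```

With $X_i = 0.4$, $T_i = 0.5$, and $(B^b, B^w) = (0.6, 0.3)$, the residual is $0.08$ and $d_i^2 = 0.0064/0.52 \approx 0.0123$, which agrees with the squared Euclidean distance to the intersection point from Equation 5.9.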
Note that the distance $d_i$ can be interpreted as the weighted difference between the proportion of voters for the black candidate in the $i$th precinct and what that proportion would be if the proportions voting for the black candidate broken down by race were given by $B^b$ and $B^w$, that is, the same as in the district as a whole. Hence, it makes sense to seek values of $B^b$ and $B^w$ that would minimize the squares of these differences. In fact, the numerator in Equation 5.11 is $(T_i - \hat{T}_i)^2$, where $\hat{T}_i$ is the $i$th fitted value under Goodman regression.
If all points of intersection are in the feasible region, we simply minimize $\sum_i d_i^2$ subject to the constraint that $B^b$ and $B^w$ are feasible (lie on the district constraint line), i.e., that
$$X B^b + (1-X) B^w = T. \tag{5.12}$$
Solving this constrained optimization problem by Lagrange multipliers, we obtain two linear equations in $B^b$ and $B^w$:
$$B^b \sum_i w_i^2 X_i (X_i - X) + B^w \sum_i w_i^2 (1-X_i)(X_i - X) = \sum_i w_i^2 T_i (X_i - X),$$
$$B^b X + B^w (1-X) = T,$$
which yield the solutions
$$B^b = \frac{\sum_i w_i^2 (X_i - X)\left[(1-X)T_i - (1-X_i)T\right]}{\sum_i w_i^2 (X_i - X)^2}, \tag{5.13a}$$
$$B^w = \frac{\sum_i w_i^2 (X_i - X)\left[X_i T - X T_i\right]}{\sum_i w_i^2 (X_i - X)^2}. \tag{5.13b}$$
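The closed-form solutions in Equations 5.13a and 5.13b translate directly into a few lines of code. The sketch below is illustrative only: the function name `method1_closed_form` is hypothetical, and when the district values $X$ and $T$ are not supplied it assumes equal-sized precincts and uses the simple means of the $X_i$ and $T_i$.

```python
def method1_closed_form(xs, ts, x_bar=None, t_bar=None):
    """District estimates (B^b, B^w) from Equations 5.13a and 5.13b.

    Valid only in the special case where every intersection point is feasible.
    Assumption: if x_bar or t_bar is omitted, precincts are equal-sized and
    the district value is the plain mean of the precinct values.
    """
    n = len(xs)
    x_bar = sum(xs) / n if x_bar is None else x_bar
    t_bar = sum(ts) / n if t_bar is None else t_bar

    num_b = num_w = denom = 0.0
    for x, t in zip(xs, ts):
        # c = w_i^2 (X_i - X), with w_i^2 = 1 / (X_i^2 + (1 - X_i)^2)
        c = (x - x_bar) / (x ** 2 + (1 - x) ** 2)
        num_b += c * ((1 - x_bar) * t - (1 - x) * t_bar)   # numerator of 5.13a
        num_w += c * (x * t_bar - x_bar * t)               # numerator of 5.13b
        denom += c * (x - x_bar)                           # common denominator
    return num_b / denom, num_w / denom
```

As a check, if every precinct satisfies $T_i = 0.8\,X_i + 0.2\,(1-X_i)$ exactly, all precinct lines pass through $(0.8, 0.2)$ and the formulas return that point. The denominator vanishes when all $X_i$ are equal, in which case the estimates are undefined.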
Thus, in the special case in which all intersection points are in the feasible region, we have obtained closed-form solutions for $B^b$ and $B^w$. These solutions are simple to compute on a spreadsheet and closely resemble the form of solutions to an ordinary least squares regression problem.⁸ However, in solving our optimization problem, we are only interested in points of intersection $(\beta_i^b, \beta_i^w)$ that specify feasible values for the respective precincts.
Accordingly, if the point of intersection is outside the feasible region, we modify $d_i^2$ to be the squared distance to the nearer endpoint of the precinct line segment where it intersects the boundary of the feasible region. We then choose those values of $B^b$ and $B^w$ that lie on the district tomographic line and that minimize $\sum_i d_i^2$.
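When some intersection points are infeasible, no closed form is available and the minimization must be done numerically. The Python sketch below is one possible approach, not the authors' spreadsheet implementation: it measures the distance from a candidate point to each precinct segment (which equals the distance to the nearer endpoint whenever the perpendicular foot falls outside the segment) and runs a ternary search along the feasible part of the district line. The search is our own choice, justified because the summed squared distance to line segments is convex along a line. All function names are hypothetical, equal-sized precincts are assumed when the district values are not supplied, and $0 < X_i < 1$ is assumed throughout.

```python
def endpoints(x, t):
    # Equations 5.6a and 5.6b; assumes 0 < x < 1.
    p1 = (0.0, t / (1 - x)) if t <= 1 - x else ((t - (1 - x)) / x, 1.0)
    p2 = (1.0, (t - x) / (1 - x)) if t >= x else (t / x, 0.0)
    return p1, p2


def sq_dist_to_segment(p, a, b):
    # Squared distance from p to segment ab: project p onto the line through
    # a and b, then clamp the projection to the segment.
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    s = 0.0 if length2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    s = min(1.0, max(0.0, s))
    qx, qy = a[0] + s * dx, a[1] + s * dy
    return (p[0] - qx) ** 2 + (p[1] - qy) ** 2


def method1(xs, ts, x_bar=None, t_bar=None, iters=200):
    """Numerically minimize the sum of squared distances from a point on the
    district line to the precinct segments. Returns (B^b, B^w).

    Assumption: district X and T default to simple means (equal-sized precincts).
    """
    n = len(xs)
    x_bar = sum(xs) / n if x_bar is None else x_bar
    t_bar = sum(ts) / n if t_bar is None else t_bar
    segments = [endpoints(x, t) for x, t in zip(xs, ts)]
    a, b = endpoints(x_bar, t_bar)   # feasible part of the district line

    def point(s):
        return (a[0] + s * (b[0] - a[0]), a[1] + s * (b[1] - a[1]))

    def objective(s):
        p = point(s)
        return sum(sq_dist_to_segment(p, q1, q2) for q1, q2 in segments)

    lo, hi = 0.0, 1.0
    for _ in range(iters):           # ternary search on a convex function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return point((lo + hi) / 2)
```

When every precinct line passes through a common feasible point, the search returns that point, matching the closed-form result for the special case.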
Standard errors and confidence intervals can be computed by a bootstrap method. This is done by repeated sampling with replacement from the data set, recomputing the parameter estimates, and determining the standard deviation of these estimates (see Efron and Tibshirani, 1993).
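A bootstrap of this kind can be sketched as follows. The code is illustrative only: `bootstrap_se` and `closed_form_estimate` are hypothetical names, the estimator shown is the closed-form special case of Equations 5.13a and 5.13b used as a stand-in for whichever estimator is actually in use, and resamples with fewer than two distinct $X_i$ values are skipped (an assumption of this sketch) because the estimates are undefined for them.

```python
import random
import statistics


def closed_form_estimate(xs, ts):
    # Stand-in estimator: Equations 5.13a-b with district X, T taken as
    # simple means (assumes equal-sized precincts, feasible intersections).
    n = len(xs)
    x_bar, t_bar = sum(xs) / n, sum(ts) / n
    num_b = num_w = denom = 0.0
    for x, t in zip(xs, ts):
        c = (x - x_bar) / (x ** 2 + (1 - x) ** 2)
        num_b += c * ((1 - x_bar) * t - (1 - x) * t_bar)
        num_w += c * (x * t_bar - x_bar * t)
        denom += c * (x - x_bar)
    return num_b / denom, num_w / denom


def bootstrap_se(xs, ts, estimator, reps=1000, seed=0):
    """Bootstrap standard errors of (B^b, B^w): resample precincts with
    replacement, re-estimate, and take the standard deviation of the draws."""
    rng = random.Random(seed)
    n = len(xs)
    draws_b, draws_w = [], []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # no variation in X: estimates undefined, skip resample
        bb, bw = estimator([xs[i] for i in idx], [ts[i] for i in idx])
        draws_b.append(bb)
        draws_w.append(bw)
    return statistics.stdev(draws_b), statistics.stdev(draws_w)
```

A call such as `bootstrap_se(xs, ts, closed_form_estimate, reps=500)` returns a pair of standard errors; percentile confidence intervals could be read off the same draws if the function were extended to return them.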
Each precinct-level estimate is the pair $(\beta_i^b, \beta_i^w)$ that minimizes the expression $(\beta_i^b - B^b)^2 + (\beta_i^w - B^w)^2$. It is the intersection point of the perpendicular to the precinct tomographic line if this value is feasible, and otherwise is the nearest endpoint of the precinct tomographic line segment to the district solution point $(B^b, B^w)$. These computations can be implemented in an Excel spreadsheet and are available on the websites http://www.cbrss.harvard.edu/events/eic/book.htm and http://course.wilkes.edu/Merrill/ through Internet Explorer.
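The precinct-level rule just described can be written compactly. The following Python sketch is a hedged illustration, not the spreadsheet referred to above: the function name `precinct_estimate` is hypothetical, and $0 < X_i < 1$ is assumed.

```python
def precinct_estimate(x, t, bb, bw):
    """Precinct-level estimate (beta_b, beta_w) given the district solution
    (bb, bw): the foot of the perpendicular if it is feasible, otherwise the
    nearer endpoint of the precinct segment. Assumption: 0 < x < 1.
    """
    denom = x ** 2 + (1 - x) ** 2
    beta_b = (x * t - bw * x * (1 - x) + bb * (1 - x) ** 2) / denom  # Eq. 5.9
    beta_w = (t - x * beta_b) / (1 - x)      # point lies on the precinct line
    if 0.0 <= beta_b <= 1.0 and 0.0 <= beta_w <= 1.0:
        return beta_b, beta_w

    # Infeasible foot: fall back to the segment endpoints (Eqs. 5.6a, 5.6b).
    p1 = (0.0, t / (1 - x)) if t <= 1 - x else ((t - (1 - x)) / x, 1.0)
    p2 = (1.0, (t - x) / (1 - x)) if t >= x else (t / x, 0.0)
    return min((p1, p2), key=lambda p: (p[0] - bb) ** 2 + (p[1] - bw) ** 2)
```

For instance, with $X_i = 0.5$, $T_i = 0.9$, and district solution $(0.9, 0.1)$, Equation 5.9 gives $\beta_i^b = 1.3$, which is infeasible; the segment endpoints are $(0.8, 1)$ and $(1, 0.8)$, and the nearer one, $(1, 0.8)$, is returned.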
District parameter estimates for Method I are presented later for several artificial and real data sets in Tables 2–4; precinct-level estimates are given for one real data set in Table 3.
These results are discussed in Section 5.5.
If not all precincts are of equal size, we weight the $d_i^2$ by the number $N_i$ of voters in precinct $i$, i.e., we minimize $\sum_i N_i d_i^2$. Equations 5.13a and 5.13b are replaced by
$$B^b = \frac{\sum_i w_i^2 N_i (X_i - X)\left[(1-X)T_i - (1-X_i)T\right]}{\sum_i w_i^2 N_i (X_i - X)^2}, \tag{5.14a}$$
$$B^w = \frac{\sum_i w_i^2 N_i (X_i - X)\left[X_i T - X T_i\right]}{\sum_i w_i^2 N_i (X_i - X)^2}. \tag{5.14b}$$
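The size-weighted formulas differ from the unweighted ones only by the factor $N_i$ in each sum. A minimal Python sketch follows; the function name `method1_weighted` is hypothetical, and taking the district values $X$ and $T$ to be voter-weighted means of the precinct values is an assumption of this sketch that the caller can override.

```python
def method1_weighted(xs, ts, ns, x_bar=None, t_bar=None):
    """District estimates (B^b, B^w) from Equations 5.14a and 5.14b.

    Valid only when every intersection point is feasible.
    Assumption: if omitted, district X and T are the N_i-weighted means of
    the precinct X_i and T_i.
    """
    total = sum(ns)
    if x_bar is None:
        x_bar = sum(n * x for n, x in zip(ns, xs)) / total
    if t_bar is None:
        t_bar = sum(n * t for n, t in zip(ns, ts)) / total

    num_b = num_w = denom = 0.0
    for x, t, n in zip(xs, ts, ns):
        # c = w_i^2 N_i (X_i - X), with w_i^2 = 1 / (X_i^2 + (1 - X_i)^2)
        c = n * (x - x_bar) / (x ** 2 + (1 - x) ** 2)
        num_b += c * ((1 - x_bar) * t - (1 - x) * t_bar)   # numerator of 5.14a
        num_w += c * (x * t_bar - x_bar * t)               # numerator of 5.14b
        denom += c * (x - x_bar)                           # common denominator
    return num_b / denom, num_w / denom
```

Multiplying every $N_i$ by the same constant leaves the estimates unchanged, so only relative precinct sizes matter.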
⁸ In this special case, the solution would be identical to the ordinary least squares solution if the weights $w_i$ in Equation 5.13 were all identical.
5.4 EXTENDING THE DUNCAN–DAVIS METHOD OF BOUNDS TO DEVELOP TWO NEW FORMS