A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys.

(1)

A New Extension of the Binomial Error Model

for Responses to Items of Varying Difficulty

in Educational Testing and Attitude Surveys

James A. Wiley1, John Levi Martin2*, Stephen J. Herschkorn3, Jason Bond4

1Department of Family and Community Medicine and Institute for Health Policy Studies, School of Medicine, University of California San Francisco, San Francisco, California, United States of America,2Sociology Department, University of Chicago, Chicago, Illinois, United States of America,3Department of Mathematics, College of Staten Island, New York City, New York, United States of America,4Alcohol Research Group, Emeryville, California, United States of America

*[email protected]

Abstract

We put forward a new item response model which is an extension of the binomial error model first introduced by Keats and Lord. Like the binomial error model, the basic latent var-iable can be interpreted as a probability of responding in a certain way to an arbitrarily speci-fied item. For a set of dichotomous items, this model gives predictions that are similar to other single parameter IRT models (such as the Rasch model) but has certain advantages in more complex cases. The first is that in specifying a flexible two-parameter Beta distribu-tion for the latent variable, it is easy to formulate models for randomized experiments in which there is no reason to believe that either the latent variable or its distribution vary over randomly composed experimental groups. Second, the elementary response function is such that extensions to more complex cases (e.g., polychotomous responses, unfolding scales) are straightforward. Third, the probability metric of the latent trait allows tractable extensions to cover a wide variety of stochastic response processes.

Introduction

In this paper we introduce a class of item response models for the analysis of response distribu-tions derived from survey data. The simplest item response function in this class is a generaliza-tion of the binomial error model for ability testing of Keats and Lord [1]. Similar item response functions were considered briefly by Lazarsfeld [2] and Coleman [3] but not implemented as models for response data. The distinguishing characteristic of the approach taken here is that the probability of an item response is written as a function of a latent variable that can also be interpreted as a probability. The choice of a probability metric suggests the Beta density as a natural choice to model the distribution of the latent variable. Furthermore, a response func-tion formulated in this way is easy to modify to accommodate variafunc-tions in the nature and complexity of response tasks.

OPEN ACCESS

Citation:Wiley JA, Martin JL, Herschkorn SJ, Bond J (2015) A New Extension of the Binomial Error Model for Responses to Items of Varying Difficulty in Educational Testing and Attitude Surveys. PLoS ONE 10(11): e0141981. doi:10.1371/journal.pone.0141981

Editor:Fabio Rapallo, University of East Piedmont, ITALY

Received:June 8, 2015

Accepted:October 15, 2015

Published:November 6, 2015

Copyright:© 2015 Wiley et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement:All relevant data are within the paper.

Funding:The authors have no support or funding to report.

(2)

We first introduce the extended binomial error (EBE) model as a generalization of the bino-mial error model. We discuss issues of identifiability, estimation and fit, and give two examples, one for a single sample, and another for data from two independent samples, noting the simi-larities in fit between the extended binomial error and loglinear Rasch models. Finally, we discuss extensions for polychotomous data and for modeling responses produced by non-cumulative response functions.

The resulting class of models have special utility for the investigation of substantively important questions about the nature of response (as opposed to scoring long tests), Further, in contrast to the loglinear Rasch models that have been most influential in sociology due to the work of Duncan [4] in particular, this approach allows us to make strong assertions about the socialdistributionof the latent trait.

The Extended Binomial Error Model

Introducing item difficulty into the binomial error model for dichotomous

responses

The binomial error model for test scores was introduced by Keats and Lord [1] [5] as a strong true score theory for dichotomous test items of more or less equal difficulty. This model is based on the assumption that the conditional distribution of the observed score given the true score (conceived as a theoretical“proportion correct”measure) is binomial, with the true score playing the role of the constant probability of a correct answer inMindependent trials where Mis the number of items. With no additional specifications, Keats and Lord were able to show that the firstMmoments of the true score distribution can be determined from the moments of the observed score distribution. Furthermore, they showed [6] that a linear regression of true on observed score implies and is implied by a negative hypergeometric distribution of the observed scores. They also proved that a two-parameter Beta distribution for the true scores implies a linear regression of true score on observed score and a negative hypergeometric dis-tribution for the observed scores.

Of course, it is infrequent that we analyze a set of dichotomous items with the same levels of difficulty (see also [7]); hence Keats and Lord indirectly incorporated variations in item diffi-culties in their compound binomial model, which accordingly lost many of the most attractive features of the binomial error model. Some other modifications of the binomial error assump-tions have been proposed [8], but these are not often implemented as IRT models.

We propose to modify the binomial error model by direct incorporation of item-difficulty parameters and by assuming a bounded proportion-like latent trait. The model assumptions allow a compact expression for the probability of any observed response pattern in terms of item-difficulty parameters and the parameters of the distribution of the latent trait, assumed to be two-parameter Beta. This expression leads in turn to an algorithm for simultaneous ML esti-mation of both sets of parameters. In contrast to a Rasch model, therefore, our estiesti-mation of item parameters is not separable from the estimation of person parameters; the advantage of specifying a flexible family of distributions like the Beta comes in the capacity to easily analyze factorial experiments in which either the trait, or item hardnesses, or both may be affected by treatments, as we show below.

A Generalization

Letx_{= [}_x₁_,_x₂_,..,_x_M_{] be a response vector for a set of}_M_{dichotomous items, coded so that}_x_i_{= 1}

(3)

0 coding of responses is arbitrary but useful in writing down the probabilities associated with complete response patterns.) Consider a latent variabley, where 0<y<1, and let the difficulty

of the ithitem beki(ki>0 for all i). We propose the simple response function

Pr½xi¼1jy ¼y

k_i _ð₁_Þ

The conditional probability of any set ofMresponses under conditional independence is

Pr½xjy ¼Y

n

i¼1

ðyki_Þxi_ð₁ _yki_Þð1 xiÞ _ð₂_Þ

and the unconditional probability is

Pr½x ¼

ð

Pr½xjyφðyÞdy ð3Þ

whereϕ(y) is the density ofy.

There are a number of advantages to constructing the model using the form ofEq 1, as the latent trait may be interpreted as the probability of answering standardized (ki= 1) item in a

positive direction. However, a complication arises for the interpretation of standard errors of thekparameters given that they must be strictly non-negative. As is the case for other models with necessarily non-negative parameters (such as models for variances), the standard errors cannot be interpreted symmetrically; indeed, in cases of very poor fit, standard errors may imply that the true population value might quite plausibly be negative. When the model fits well, these issues are minor, but it makes sense to carry out statistical tests on the equivalence ofkparameters not by comparison of standard errors, but by the comparison of chi-squares of nested models constraining or not constraining parameters to be equal. If the use of confidence intervals is required, one can rewrite the item response function as a double-exponential whose argument is the difference between an unbounded latent variableξand an unbounded item dif-ficulty parameterχi, whereξ= -log[-log(y)] andχi= log(ki),ξ, -1<ξ,χi<+1. Thus we can

write

Pr½x_i¼ 1_j_x_¼ _{exp½ ðexpð ð}_x _w

iÞÞ: ð4Þ

We note, however, that in contrast to a Rasch model,Eq 4does not imply equiprobability whenξ=χi; instead, Pr[xi= 1|ξ] = .5 when (ξ-χi) = -.3665. Given thatEq 1has the more

intui-tive relation to a probability statement, we prefer this for the development of a family of models for stochastic response.

The choice of a probability metric suggests the Beta density as a natural choice to model the distribution of the latent variable, as it, like a probability, is defined for the interval [0,1]. Thus we propose:

φðyÞ ¼ GðaþbÞ G_ðaÞG_ðbÞy

ða 1Þ_ð₁ _yÞðb 1Þ

ð5Þ

wherea,b>0, 0<y<1, andΓis the complete gamma function. (We note that such a speciﬁ ca-tion was suggested but not implemented by Lazarsfeld [2].) Note that E[y] =a/(a+b); V[y] = ab/[(a+b)2(a+b+1)]. Forki=kfor all i, this leads to the binomial error model with a Beta

den-sity for the latent trait, sometimes called the beta-binomial model [7]. It can be shown that given this density, for anyk>0,

E½yk_¼GðaþbÞGðaþkÞ

(4)

Thus, for example, the unconditional probability of the unit response vectorx_{= [1,1,1,}. . ._{,1] is}

simply

Pr½1_;1_;. . ._;1_¼_E½_Pr_½1_;1_;. . ._;1_j_{y ¼}_{E y}

XM

i¼1

ki 2

6 6 4

3

7 7 5

¼

G_ða_þ_bÞG_ða_þX

M

i¼1

kiÞ

G_ðaÞG_ða_þ_b_þX

M

i¼1

kiÞ

ð7Þ

More generally, any Pr[x_{] may be written as a function of gamma functions whose arguments}

are the sums of the distribution parametersaandband/or item difﬁcultieski.

Representation of the Unconditional Response Probabilities

Let Pr[x_{] represent the unconditional probability of a response. For any}x_{, let}_I₍x_{) = {i:}_x_i_{= 0}}

andC(x_{) = {i:}_x_i_{= 1}. Let}_A_{be any subset of}_I₍x_{), including the null set Ø and let |}_A_{| be the}

number of elements inA. Given these definitions, we show in Appendix A that

Pr½x ¼Pr½x1;x2;. . ._;_x

M ¼ X

AIðxÞ

ð 1_ÞjAj_E Y

i2A[CðxÞ

Pr½xi¼1jy

" #

ð8Þ

with

E Y

i2A[CðxÞ

Pr½x_i¼1_jy

" #

¼

G_ða_þ_bÞG_ða_þ X

i2A[CðxÞ kiÞ

G_ðaÞG_ða_þ_b_þ X

i2A[CðxÞ kiÞ

ð9Þ

for the EBE model. These equations are useful for constructing joint item response functions in programming routines for ML estimation.

Initial estimates for the model parameters may be obtained from bivariate cross-classifica-tions. Given the resulting table from any such cross-classification of the ithand jthitems, and fixing one of the two relevant item parameters (sayki) = 1, the other three parameterskj,aand

bcan be estimated from the three degrees of freedom in the 2×2 table. Repeating thisM-1 times for all cross classifications of the ithitem with all other items produces a complete vector of starting values. (We give the details inAppendix B.) We then use the Polak and Ribière [9] version of the Fletcher and Reeves conjugate gradient method [10] to find maximum likelihood estimates; we also use the Nelder and Mead [11] simplex methods to check that there are no preferable solutions in the area of the start values. In no case did we find that our maximum was local only.

Scores and Score Variances from the Posterior Distribution of

y

given x

The posterior density ofygivenx_{is defined as}

h½yjx ¼ Pr½xjyφðyÞ

ð

1

0

Pr½xjyφðyÞdy

¼Pr½xjyφðyÞ

(5)

Accordingly, the expectation of the trait score associated with a manifest response vector is just

E½yjx ¼

ð

1

0

h½yjxydy¼

ð

1

0

Pr½xjyyφðyÞdy

Pr½x : ð11Þ

We shall regard E[y|x_{] as an estimate of the person score corresponding to the response vector}x_.

The variance of the posterior distribution serves as a measure of the precision of E[y|x_{] as a}

repre-sentation of the person score and can be calculated from the relation V[y|x_{] = E[}_y2_|x_]-[E[_y_|x_]]2_.

As discussed in Appendix A, there is a form for the score of anyx_{that is similar to Eqs}₈_and₉_.

Local Identifiability

Eq 8is locally identifiable if the Jacobian matrix of the transformation from model parameters to response probabilities has full column rank for a given vector of parameter values [12]. Our numerical explorations suggest that this model, without the imposition of any additional con-straints, is locally identifiable for plausible regions of the parameter space. Nevertheless, the imposition of normalizations of the formki= 1 for some i, providing an item-specific metric of

the latent variable, produces more stable full-information maximum likelihood estimates than the unconstrained model. This is the result of the following interesting circumstance: the power of a Beta-distributed random variable is a random variable that is nearly—but not quite —distributed as Beta variable. The choice of i to fix is arbitrary in the following senses: a) the fit of the model to data is virtually independent of this choice and b) ML estimates of item parameters for all normalizations are related in a simple way. This last point deserves a brief elaboration.

Suppose for some set of data the model is normalized with respect to the hthitem and that ML estimates are written askih, andah,bhwithkih= 1 if i = h. Letkihandah,bhbe the ML

esti-mates for normalization with respect to item h so thatkih= 1 if i = h. To a close approximation,

kij=kih/kjh. We also note that the special case in which thebparameter is seta priorito 1 is

underidentified and requires an arbitrary constraint on one item parameter or, equivalently, on theadistribution parameter. With this constraint, there are simple closed form solutions for the ratios of item parameters to the single distributional parameter. For this case,Eq 6 simpli-fies to

Pr½x_i ¼1_¼_E½yki_¼ a aþki

ð12Þ

and hence

ki

a ¼

½1 _Pr_½x

i¼1

Pr½x_i¼1 : ð13Þ

Thus the ratio of any item hardness to the single distribution parameter can be recreated sim-ply as a ratio of failures to successes at the marginals.

Illustrative Analyses of Dichotomous Responses

Issues of Fit

(6)

shall introduce below. But for comparison of non-nested models, we use the model selection criterion of Raftery’s BIC [14], which is equal toL2—df_ln(_N_{), where}_L2_{is the likelihood ratio}

chi-square, df represents the degrees of freedom, andNthe number of persons in the sample. (The saturated model has a BIC of 0, any model with a negative BIC is preferred to the satu-rated model, and the model with the lowest BIC is preferred.) As recently emphasized by Weakliem [15], BIC is not without its drawbacks, and more rigorous implementations of Bayesian logic are now tractable [16]. However, the chief practical drawback of BIC seems to be its overly conservative nature, as it prefers more parsimonious models, and given the temp-tation to over-fit data sets, we think that BIC serves well as a general criterion for model com-parison in which many models are fit to the same data set.

The Single Group Case

We begin by re-analyzing the classic Army data presented by Stouffer [17] and analyzed using a Rasch model by Duncan [4] and Kelderman [18]. The data are to be found inTable 1. We present the model that results when we constraink4= 1; the likelihood chi-square is 17.42,

with 10 degrees of freedom (p = .066). Raftery’s BIC for this model is–51.66. By contrast, the Rasch model implemented by Duncan had a likelihood ratio chi-square of 10.93 at 8 degrees of freedom, (p = .206) leading to a BIC of–44.33. In sum, the loglinear Rasch model has a closer fit but uses up more degrees of freedom; while deviations from the extended binomial error are marginally significant according classical criteria, the more parsimonious model is preferred according to the Bayesian criterion.

Most importantly, the two models agree closely as to the positions of the different items. We find a correlation>.99 between the natural log of the extended binomial error model_’s item

parameters and those of the Rasch model. (Here we use results from a conditional maximum likelihood fit. Duncan’s item parameters are quite different, which we assume to be a mistake, since the order of parameter hardnesses differs not only from our Rasch analyses, but from the marginal distribution of the items. Kelderman did not report item parameters.) Inspection of the estimated scores (Table 2) from the extended binomial error (Eq 11) model shows that

Table 1. Stouffer’s Army Data.

ItemA ItemB ItemC ItemD Frequency

0 0 0 0 229

0 0 0 1 199

0 0 1 0 52

0 0 1 1 96

0 1 0 0 25

0 1 0 1 60

0 1 1 0 16

0 1 1 1 69

1 0 0 0 16

1 0 0 1 45

1 0 1 0 8

1 0 1 1 55

1 1 0 0 10

1 1 0 1 42

1 1 1 0 3

1 1 1 1 75

(7)

while there are clear score differences between patterns with the same Rasch raw score (the number of“positive”responses), the ranking of respondents with respect to posterior scores is roughly the same as the ranking with respect to raw scores. The differences between posterior expected value scores over response patterns are small compared to the posterior standard deviations for each pattern; this is to be expected given that the number of items is small.

In this case and most others we have investigated the log linear version of the Rasch and extended binomial error models lead to similar conclusions, though latter tends to be more parsimonious and the former to fit somewhat better, as it hasM-2 more free parameters. Nev-ertheless, the two need not agree. Numerical investigations demonstrate the existence of cases in which response distributions generated by the extended binomial error model parameters fit the Rasch model well (as judged by goodness-of-fit measures) but produced estimates of log-linear trait distribution parameters that violate the moment inequalities conditions of Cressie and Holland [19].

In sum, the EBE behaves similarly to the Rasch model; given an implementation involving greater distributional flexibility (the loglinear version) the Rasch model will naturally fit some-what better at the cost of more parameters. There are, however, data that are fit by one model and not by the other. Most importantly, the probability-like metric of the latent trait allows for extensions that may be of great interest when we wish to examine the mechanisms of response processes (as opposed to scoring long tests). We go on to examine several such extensions, starting with models for independent groups under experimental conditions.

The Multiple Group Case

The multiple group extension of the extended binomial error model permits investigation of issues related to item bias and, under certain conditions, study of group differences in the dis-tribution of the latent trait. Let g = 1,. . ._,_G_{index groups which may be formed by partitioning a}

single random sample, sampling from diverse populations, or by random assignment of differ-ent item formats. To allow for differences in item and distribution parameters over these groups, we may rewrite Eqs4and5as follows:

Pr½x_ig ¼1_jy_;_k

ig ¼y

kig _ð₁₄_Þ

φ_gðyÞ ¼ GðagþbgÞ G_ða

gÞGðbgÞ

yðag 1Þ_ð1 _yÞðbg 1Þ _ð₁₅_Þ

To illustrate, we take data from a national telephone survey of Italian adults 18–69 years old that was conducted in April and May, 1994 [20]. The survey was part of a study of regional and ethnic prejudice in Italy and included a series of items dealing with stereotypical beliefs about Table 2. Results of Fit to Army Data.

Parameter Value Standard Error

k1 4.0879 .3124

k2 3.4005 .2534

k3 2.5674 .1875

k4 1.0000*

---a 3.0223 .3350

b 1.7173 .1623

*=ﬁxed; Log-likelihood chi-square = 17.422; df = 10; p = .066.

(8)

Africans and East Europeans living in Italy. Each respondent was assigned at random to a set of questions targeted at one of three immigrant groups: a) North Africans, such as Moroccans, Tunisians, or Algerians (probability = .25); b) Africans from regions of Central Africa, such as Senegal and Somalia (probability = .25); and c) Eastern Europeans, such as Poles, Albanians, or Slavs (probability = .5). For each target group, the respondent was presented with a statement incorporating a positive or negative adjective pertaining to the group, e.g.:“do you agree that most of them are complainers? (they try to make others feel sorry for them)”. The interviewer then asked respondents“Do you agree strongly, agree somewhat, disagree somewhat, or dis-agree strongly with this description?”

For our example, we selected responses to items incorporating four negative adjectives which roughly translate into“selfish”,“slackers”,“violent”, and“complainers”, combined the North and Central African target groups into a single target group (based on similarity of the marginal distributions), and dichotomized the responses into“Agree”(coded 1) and“ Dis-agree”(coded 0). The response distribution forN= 2001 respondents in the 1994 survey is shown inTable 3.

As an exercise, we can use these data to determine whether the responses are consistent with the hypothesis of an underlying trait of prejudice or hostility to minorities, and if so, if Africans and Eastern Europeans are perceived identically. If there are differences in the percep-tion of Africans and Eastern Europeans, we can determine whether there is still a single trait of “prejudice”with perhaps different thresholds for attributing negative characteristics to Afri-cans and Eastern Europeans, or whether there seem to be different latent traits involved when it comes to judging members of the two target groups.

Table 4presents fit statistics for selected models for these data. The first model corresponds to Eqs14and15, only adding the constraintk3g= 1 for g = 1,2 (the reason the third item is

cho-sen will become clear below). This model fits quite well, generating a likelihood ratio chi-square of 15.93 with 20 degrees of freedom (p = .721). Models 2 and 3 impose substantively important constraints. Model 2 setskig=kifor all i; it is a test of identical item meanings (semantic

invari-ance) that allows the choice of target group to evoke different traits (e.g., degree of hostility to Africans or degree of hostility to Eastern Europeans). This sort of model might be used to examine item bias across two non-experimental groups; if this model failed to fit one could attempt to see if there were particular items that had different hardnesses across groups.

Given that our groups are the results of experimental treatments, however, we may instead begin by assuming identity of the distribution of the latent trait. Model 3 setsag=aandbg=b;

given the normalizationk3g= 1 for g = 1,2 this is equivalent to saying that there is only one

trait of overall prejudice, but the items (except for the third) can have different hardnesses depending on the target group.

Given model 1, model 2 must clearly be rejected as the difference in chi-square is significant (10.31 at 3 df, p = .016). Given model 1, however, loss of fit due to the constraints associated with model 3 is insignificant (chi-square of .62 at 2 df, p = .733). (The results here are not inde-pendent of the choice of item that is fixed; settingk3g= 1 for g = 1, 2 resulted in the lowest

(9)

Other Extensions

Generalization To Ordered Polychotomies

The generalization of the extended binomial error model to ordered polychotomies relies on a standard threshold parameterization for ordered categories that was developed in an IRT con-text by Edwards and Thurstone [21] and also for regression analysis of ordered categories (see, for example, [22–24]). Let j = 1,. . ._,_J_{represent the}_{a priori}_{order of the response categories for}

the ithitem where j = 1 is“low”and j =Jis“high”. We denote the hardness of each of these response categories aski,j, whereki,1<ki,2<. . ..<ki,J-1.By analogy withEq 1for dichotomous

responses, we write the probability of that a response fall into category jor higheras

Pr½x_i jjy ¼yki;j_: _ð₁₆_Þ

For afixedyvalue, the probabilities thus defined diminish as j increases. Given this definition, the probabilities of the intermediate response categories are calculated as differences between the probabilities associated with adjacent dichotomizations. This then implies that the model Table 3. 1994 Italian Survey: Stereotype Data (Sniderman, et al., 1995).

Item 1 Item 2 Item 3 Item 4 Immigrants are African Immigrants are East European

0 0 0 0 342 361

0 0 0 1 164 129

0 0 1 0 35 32

0 0 1 1 44 47

0 1 0 0 53 41

0 1 0 1 60 51

0 1 1 0 18 20

0 1 1 1 51 38

1 0 0 0 35 45

1 0 0 1 39 47

1 0 1 0 9 20

1 0 1 1 28 27

1 1 0 0 11 10

1 1 0 1 39 36

1 1 1 0 13 17

1 1 1 1 66 73

Totals 1007 994

doi:10.1371/journal.pone.0141981.t003

Table 4. Results of Fitting Models to Italian Data.

Model Parameters Constrained To Be Identical Across Groups Likelihood Ratio Chi-Sq df Probability

1 k3* 15.93 20 .721

2 k1,k2,k3*,k4 26.24 23 .290

3 k3_*,a,b 16.55 22 .788

4 k2,k3_*,a,b 18.56 23 .726

5 k2,k3*,k4,a,b 21.49 24 .610

6 k1,k2,k3*,k4,a,b 27.61 25 .326

*=ﬁxed parameter

(10)

for ordered categories can be written as follows:

Pr½xi¼jjy ¼y

ki;j _yki;jþ1 _ð₁₇_Þ

where we set the second term to zero forj=J.

The model given asEq 17has a number of welcome features. First, the probability function for any category j is single-peaked; if we denote its maximumyjthen (j<k),(yj<yk);y1

= 0 andyJ= 1. Finally, because this is a threshold model, it satisfies the two principles Jansen

and Roskam [25] called thejoining assumption(the probability of an aggregated response cate-gory is the sum of the probabilities of its constituent, usually adjacent, response categories) and

ξ-equivalence(the latent traits embedded in aggregated and disaggregated versions of the item

response functions are identical or are related by an admissable transformation). In contrast, the most methodologically tractable generalizations to the polychotomous case in the Rasch model (the rating scale and partial credit models and their relatives [26]) do not satisfy these criteria which means that a model that fits the polychotomous data may not fit dichotomous or otherwise collapsed data [27–30]. (The graded response model [31] does satisfy the collapsing conditions but is somewhat more complex.)

An Illustration of the Model for Ordered Polychotomies

To illustrate the application of the polychotomous model to social data, we use a small example drawn from the 2000 U.S. National Alcohol Survey, a cross-sectional national probability household survey on alcohol use and problems [32].Table 5shows the response distribution pertaining to a cross-classification of two 4-category items dealing with reasons for abstention from alcohol in a sample of 547 women aged 18–29. They were asked“how important to you are each of the following reasons for abstaining from alcohol beverages or being careful about how much you drink.”The two reasons represented inTable 6are“drinking is bad for your

Table 5. National Alcohol Survey Data.

Item 1 Item 2 Frequency

1 1 9

1 2 3

1 3 7

1 4 1

2 1 4

2 2 3

2 3 7

2 4 6

3 1 15

3 2 19

3 3 65

3 4 51

4 1 15

4 2 14

4 3 80

4 4 248

Totals 547

Response categories: 1‘Not a reason’, 2‘Not an Important Reason’, 3‘Somewhat Important Reason’, 4

(11)

health”and“drinking can get you sick”and the response categories for both reasons are“not a reason at all,” “not an important reason,” “somewhat important”and“very important.”In this illustration, the category response functions are written so that high values are associated with responses indicating the reasons stated are important for decisions to abstain or to be careful about drinking. Thusyk1,1is the conditional probability, giveny, that a respondent will regard the“drinking is bad for your health”as“an important reason”for abstaining or being careful about drinking.

Table 6shows the parameter estimates and standard errors for fitting the extended binomial error model for ordered polychotomies to these data. As judged by the likelihood ratio chi-squared value (12.78, 8df), the fit of the model to these data is acceptable. The beta distribution generated bya= 1.112 andb= 0.598 is skewed toward high values ofyimplying that the majority of respondents consider both reasons for abstinence to be important. A comparison of the category threshold parameters indicates that“bad for your health”is considered a more compelling reason for abstinence than“getting sick”among the young adult women in the national sample.

Single-Peaked Response Functions

This threshold formulation can also be adapted to model response processes in which subjects are more likely to give a positive response to some item if that item’s location is near their own on the latent trait. Following the method of Andrich and Luo [33–36], each item is turned into a trichotomy, with two failure regions (the item is too far above the subject for her to answer in a positive direction and the item is too far below her), with the intermediate leading to a posi-tive response. However, there is an alternaposi-tive approach that is sufficiently flexible to handle a variety of empirical cases.

Here we approximate the response function in question for some set of such dichotomous items as follows:

Pr½x_i¼1_jy_;_k

i ¼aiy

ki_ð1 _yki_Þ _ð₁₈_Þ

which is a unimodal function achieving its maximum whenyki= .5, which is equivalent toξ=

χiin an unbounded metric if we again deﬁneχɩ= log(ki), andξ= -log[-2log(y)]. The leadingα_i

may be considered akin to a discriminating ability parameter for cumulative models, or it may be considered an overall normalization factor if constrainedα_i=αfor all i; it may also beﬁxed

in advance, such that (for example) the predicted probability of a positive response is 1 whenξ Table 6. Results of Fitting Models to Data inTable 5.

Parameter Estimate Standard Error

k1,1 1.000* –

k1,2 0.132 0.021

k1,3 0.062 0.014

k2,1 1.565 0.177

k2,2 0.301 0.042

k2,3 0.141 0.025

a 1.112 0.194

b 0.598 0.097

*ﬁxed; Log-likelihood ratio chi-squared value = 12.78; df = 8; p>_0.173

(12)

=χi. Again, there is a parsimonious representation of the unconditional probability of any

vec-torx_{for any set of parameter values}h_{= (}_a_,_b_,α₁,. . .α_M,k₁,. . .,k_M) (seeAppendix A).

Conclusion

In sociology the loglinear Rasch model has become the most widely known and used IRT model in sociology [37–41]. There are two reasons for this popularity. First, Duncan [4] argued that its indifference to the distribution of the latent trait made it a better scientific instrument than the covariance-based methods that dominate social modeling. Second, the Rasch model’s parameters turned out to be estimable with a very simple loglinear approach [18,42]. This approach treats the distribution of the latent variable as a set of nuisance parameters—the total score preserves all useful information in grading respondents, and the item hardnesses can be estimated without further investigation of the distribution of the latent trait.

But preserving a metric representation of the trait aids the investigation of the response pro-cess. While other IRT models, including the Rasch model, can be adapted in this way, the model presented here allows for a wide-range of substantively important extensions to be modeled rather simply, due to the combination of a) a latent trait that can be interpreted as a probability, b) a simple family of response functions that can model dichotomous, ordered polychotomous, and unfolding-type responses, and c) a flexible two-parameter Beta distribution for the latent trait. Such a family of models can be particularly useful in the analysis of experiments that involve changes in wording, response formats, and item order.

Appendix A

A compact representation of the predicted response probabilities for any value of the parame-ters can be derived from the exclusion-inclusion theorem (see, e.g., [43], 72f). Let the vectorh

= (a,b,k1,. . .,kM) represent the model parameters and Pr[x|h] the unconditional probability of

a response for a givenh_{. For any}x_{, let}_I₍x_{) = {i:}_x_i_{= 0} and}_C₍x_{) = {i:}_x_i_{= 1). Let}_A_{be any subset}

ofI(x_{), including the null set}Ø_{and let |}_A_{| be the number of elements in}_A_{. Now since the}

responses to each item are assumed to be independent conditional on the value of the latent trait, we can write Pr[x_|h_{] in product form, where the expectation is taken with respect to the}

distribution of the latent trait:

Pr½xjh ¼Pr½x1;x2;. . .;x_N ¼E½Pr½x1;x2;. . .;x_Njy ¼E

YN

i¼1

Pr½x_ijy

" #

¼E Y

i2CðxÞ

Pr½x_i¼1_jyY

i2IðxÞ

ð1 _Pr_½x

i ¼1jyÞ

" #

¼E Y

i2CðxÞ

Pr½x_i¼1_jyX

AIðxÞ

ð 1_ÞjAjY

i2A

Pr½x_i ¼1_jy

" #

¼E X

AIðxÞ

ð 1_ÞjAj Y

i2A[CðxÞ

Pr½xi ¼1jy

" #

¼ X

AIðxÞ

ð 1_ÞjAj_E Y

i2A[CðxÞ

Pr½xi¼1jy

" #

ðA:1Þ

(13)

substitution:

E Y

i2A[CðxÞ

Pr½xi ¼1jy

" #

¼

i2A[CðxÞ kiÞ

ðA:2Þ

The response pattern score and score variance can be expressed similarly. Written in terms of the model parameters, the score E[y|x_{] is given as:}

E½yjx ¼

X

AIðxÞ ð 1_ÞjAj

G_ða_þ_bÞG_ða_þ1_þ X

i2A[CðxÞ kiÞ

G_ðaÞG_ða_þ_b_þ1_þ X

i2A[CðxÞ kiÞ

X

AIðxÞ ð 1_ÞjAj

i2A[CðxÞ kiÞ

: ðA:3Þ

Finally, there is a parsimonious representation of the unconditional probability of any vec-torx_{for any set of parameter values in the unfolding-type model of Eq 19. With}_I₍x_{) and}_C₍x₎

defined as above, andAany subset ofI(x_{), let}_B_{be any subset of the union of this}_A_with_C₍x_),

and let |B| be the number of elements inB. Then it can be shown that

P½xjh ¼ X

AIðxÞ

ð 1_ÞjAj Y

i2A[CðxÞ

a_i

! X

BfA[CðxÞg

ð 1_ÞjBj_{E y}

X

i2A[CðxÞ kiþ

X

i2B

ki ! 2 6 6 6 4 3 7 7 7 5 8 > > > < > > > : 9 > > > = > > > ;

ðA:4Þ

For the expectation involvingywe make the following substitution

E y

X

i2A[CðxÞ kiþ

X

i2B

ki ! 2 6 6 6 4 3 7 7 7 5 ¼

i2A[CðxÞ

k_iþX

i2B

k_iÞ

i2A[CðxÞ kiþ

X

i2B

kiÞ

ðA:5Þ

and algorithms related to those discussed above can be used to obtain simultaneous estimates of score and distribution parameters.

Appendix B

(14)

and5, these have following structure:

Pr xi¼1; xj ¼1

h i

¼E y kiþkj_¼

G_ða_þ_bÞG_ða_þ_k

iþkjÞ

G_ðaÞG_ða_þ_b_þ_k

iþkjÞ

ðB:1Þ

Pr½xi¼1 ¼E½y ki_¼

G_ða_þ_bÞG_ða_þ_k

iÞ

G_ðaÞG_ða_þ_b_þ_k_i_Þ ðB:2Þ

Pr½xj¼1 ¼E½y kj_¼

G_ða_þ_bÞG_ða_þ_k_j_Þ

G_ðaÞG_ða_þ_b_þ_k

jÞ

ðB:3Þ

These equations cannot be solved uniquely to obtaina,b,ki, andkj. However, if we setki = 1 (in effect, normalizing the model by settingy= Pr[xi = 1|y]) and useΓ(s+1) = sΓ(s), we get

Pr xi ¼1; xj¼1

h i

¼ GðaþbÞðaþkjÞGðaþkjÞ G_ðaÞða_þ_b_þ_k

jÞGðaþbþkjÞ

ðB:4Þ

Pr½xi ¼1 ¼

G_ða_þ_bÞaG_ðaÞ

G_ðaÞða_þ_bÞG_ða_þ_bÞ¼ a

ðaþbÞ ðB:5Þ

It follows that

Pr xi¼1;xj ¼1

h i

Pr xj ¼1

h i ¼

aþkj

aþbþkj

ðB:6Þ

and with manipulation we get

kj

aþb¼Sj ðB:7Þ

where

Sj ¼

Pr xi ¼1; xj¼1

h i

Pr½xi¼1Pr½xj¼1

Pr½xj¼1 Pr xi¼1;xj ¼1

h i ðB:8Þ

for all i and j. Thus, for each item other than the ith, its cross-classiﬁcation withxigives us its

ratio toa+b.

To construct an initial estimate ofa+b, which we will denoteΘfor brevity, we take advan-tage of the fact that fromEq B.3,

Pr xj¼1

h i

¼

G_ðY_ÞG Y _Pr_½_x

i ¼1 þSj

G

ð

Y

ð

_Pr_½_x_i_¼1

ÞÞ

G Y _ðS_j_þ1_Þ

ðB:9Þ

and hence

Pr½xj¼1

G_ðY_ÞG

ð

Y

ð

_Pr_½_x

i ¼1 þSjÞÞ

G

ð

Y

ð

_Pr_½_x

i¼1

ÞÞ

G

ð

Y

ð

Sjþ1

Þ

¼0 _ð_B_:₁₀_Þ

(15)

When the model describes the data perfectly, theM-1 equations corresponding to each j (j6¼i) should yield the same root. For actual data, we average our derived values ofΘ. We then substi-tute this value inEq B.7to get initial values ofkj, j6¼i. Our starting estimate of the parametera can be recovered from the product of (a+b) and Pr[xi = 1] (given that, by construction, ki = 1), and that ofbfromb=Θ—a.

Author Contributions

Analyzed the data: JAW JLM JB. Contributed reagents/materials/analysis tools: JAW. Wrote the paper: JAW JLM. Mathematical solutions: SJH. Identifiability checks: SJH. Algorithm: SJH JLM JAW. Programming: JLM JB.

References

1. Keats JA, Lord FM. A theoretical distribution for mental test scores. Psychometrika 1962; 27:59–62. 2. Lazarsfeld PF. Latent structure analysis and test theory. In: Lazarsfeld PF Henry NW, editors. Readings

in mathematical and social science. Cambridge, Mass .: MIT Press; 1966. p. 78–88. 3. Coleman JS. Introduction to mathematical sociology. Glencoe, Ill.: Free Press; 1964.

4. Duncan OD. Rasch measurement in survey research: Further examples and discussion. In: Turner CF and Martin E, editors. Surveying subjective phenomena, Vol. II. New York: Russell Sage Foundation; 1984. p. 367–404.

5. Keats JA. Some generalizations of a theoretical distribution of mental test scores. Psychometrika 1964; 29:215–231.

6. Lord RM, Novick MR. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley; 1968.

7. Huynh H. Error rates in competency testing when test retaking is permitted. J of Ed Stats 1990; 15: 39–

52.

8. Wilcox RR. A review of the beta-binomial model and its extensions. J of Ed Stats 1981; 6:3–32. 9. Polak E, Ribière G. Note sur la convergence de directions conjuguée. Rev. Francaise Informat

Recher-che Operationelle 1969; 16: 35–43.

10. Fletcher R, Reeves CM. Function minimization by conjugate gradients. Comput J., 1964; 7: 149–154. 11. Nelder JA, Mead R. A simplex method for function minimization. Comput J 1965; 7: 308–313. 12. Wald A. Note on the identification of economic relations. In: Koopmans TC, editor. Statistical inference

in dynamic economic models. New York: Wiley; 1950. p. 305–310.

13. Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen: The Danish institute of educational research; 1960.

14. Raftery A. A note on Bayes factors for log-linear contingency table models with vague prior information. JRSS Ser. B 1985; 48: 249–250.

15. Weakliem DL. A critique of the Bayesian information criterion for model selection. Sociological Methods and Research 1999; 27:359–397.

16. Raftery A. Bayes factors and BIC. Sociological Methods and Research 1999; 27:411–427.

17. Stouffer SA, Guttman L, Suchman EA, Lazarsfeld PF, Star SA, Clausen JA. Measurement and predic-tion. Princeton: Princeton University Press; 1950.

18. Kelderman H. Loglinear Rasch model tests. Psychometrika 1984; 49: 223–245.

19. Cressie N, Holland PW. Characterizing the manifest probabilities of latent trait models.”Psychometrika 1983; 48:129–141.

20. Sniderman P, Piazza T, Peri P, Schizzerotto A. Codebook for the 1994 survey on regional and ethnic prejudice in Italy. Berkeley: Survey Research Center, University of California; 1994.

21. Edwards AL, Thurstone LL. An internal consistency check for scale values determined by the method of successive integers.”Psychometrika 1952; 17:169–180.

22. Bock RD. Multivariate statistical methods in behavioral research. New York: McGraw-Hill; 1975. 23. Cox DR Analysis of binary data. London: Chapman and Hall Ltd; 1970.

(16)

25. Jansen PGW, Roskam EE. Latent trait models and dichotomization of graded responses. Psychome-trika 1986; 51:69–91.

26. Thissen D, Steinberg L. A taxonomy of item response models. Psychometrika 1986; 51:567–577. 27. Rasch G. An individualistic approach to item analysis. In: Lazarsfeld PF Henry NW, editors. Readings

in mathematical and social science. Cambridge, Mass.: MIT Press; 1966. p. 89–107.

28. Andrich D. Models for measurement, precision, and the nondichotomization of graded responses. Psy-chometrika 1995; 60:7–26.

29. Andrich D. Further remarks on nondichotomization of graded responses. Psychometrika 1995; 60: 37–

46.

30. Andrich D. Distinctive and incompatible properties of two common classes of IRT models for graded responses. Applied Psychological Measurement 1995; 19: 101–119.

31. Roskam EE. Graded responses and joining categories: A rejoinder to Andrich’s‘Models for measure-ment, precision, and the nondichotomization of graded responses.’Psychometrika 1995; 60, 27–35. 32. Greenfield TK, Nayak MB, Bond J, Ye Y, Midanik LT. Maximum quantity consumed and alcohol-related

problems: assessing the most alcohol drunk with two measures. Alcoholism: clinical and experimental research 2006; 30, 1576–1582.

33. Andrich D, Luo G. A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Appl Psych Meas 1993; 17: 253–176.

34. Andrich D. A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thur-stone and Likert methodologies. Brit J of Math and Stat Psych 1996; 49: 347–365.

35. Luo G. A general formulation for unidimensional unfolding and pairwise preference models. J Math Psych 1998; 42:400–417.

36. Luo G. A class of probabilistic unfolding models for polytomous responses. J Math Psych 2001; 45:224–248.

37. MacIntosh R. Global attitude measurement: An assessment of the world values survey postmaterialism scale. Am Soc Rev 1998; 63: 452–464.

38. Hagan J, Foster H. Youth violence and the end of adolescence. Am Soc Rev 2001; 66: 874–899. 39. Hout M, Greeley AM. The center doesn't hold: Church attendance in the United States, 1940–1984. Am

Soc Rev 1987; 52: 325–345.

40. Browning CR, Leventhal T, Brooks-Gunn J. Sexual initiation in early adolescence: The nexus of paren-tal and community control. Am Soc Rev 2005; 70: 758–778.

41. Oegema D, Klandermans B. Why social movement sympathizers don't participate: Erosion and non-conversion of support. Am Soc Rev 1994; 59: 703–722.