Beta-binomial/gamma-Poisson regression models for repeated counts with random parameters


2011, Vol. 25, No. 2, 218–235 DOI:10.1214/10-BJPS118

©Brazilian Statistical Association, 2011


Mayra Ivanoff Loraᵃ and Julio M. Singerᵇ

ᵃ Escola de Economia de São Paulo – Fundação Getulio Vargas
ᵇ Universidade de São Paulo

Abstract. Beta-binomial/Poisson models have been used by many authors to model multivariate count data. Lora and Singer [Stat. Med. 27 (2008) 3366–3381] extended such models to accommodate repeated multivariate count data with overdispersion in the binomial component. To overcome some of the limitations of that model, we consider a beta-binomial/gamma-Poisson alternative that also allows for both overdispersion and different covariances between the Poisson counts. We obtain maximum likelihood estimates for the parameters using a Newton–Raphson algorithm and compare both models in a practical example.

1 Introduction

Beta-binomial models have been used by many authors to model binomial count data with different probabilities of success among units from the same group of study. Williams (1975) used such distributions to compare the number of fetal abnormalities of pregnant female rats on a chemical diet during pregnancy to a control group, both with fixed litter size. Gange et al. (1996) analyzed the quality of health services (classified as appropriate or not) during patient stay in a hospital using a similar approach. To analyze mortality data in mouse litters with a fixed number of implanted fetuses, Brooks et al. (1997) used such models not only to allow for different probabilities of success among units from the same group of study, but also to consider overdispersion among them. Given that in many studies the number of trials may not be fixed, Comulada and Weiss (2007) considered a multivariate Poisson distribution to model the number of successes and failures in a random number of attempts, illustrating their proposal with data from an HIV transmission study. Multivariate Poisson distributions have also been used to model correlated count data, as in Karlis and Ntzoufras (2003), who used such a distribution to model the number of goals of two competing teams.

In a study where the number of successes in a random number of trials was observed repeatedly, so that the counts are possibly correlated, Lora and Singer (2008) considered multivariate binomial/Poisson models. In their proposal, the beta-binomial component also accounts for overdispersion across units with the same levels of covariates. The multivariate Poisson component accommodates both the random number of trials and the repeated measures nature of the data. The effect of possible covariates is taken into account via the regression approach suggested by Ho and Singer (1997, 2001). The Lora and Singer (2008) model, however, requires a constant covariance term between the repeated numbers of trials and does not allow for overdispersion in these counts. Since, as suggested by Cox (1983), the precision of parameter estimates may be seriously affected when overdispersion is not accounted for in the models considered for analysis, we propose a beta-binomial/gamma-Poisson model that not only incorporates such characteristics but is also easier to implement computationally. The model, along with maximum likelihood methods for estimation and testing purposes, is presented in Section 2. An illustration using data previously analyzed by Lora and Singer (2008) is presented in Section 3. A brief discussion and suggestions for future research are outlined in Section 4.

Key words and phrases. Bivariate counts, longitudinal data, overdispersion, random effects, regression models.

Received October 2009; accepted March 2010.

2 The beta-binomial/gamma-Poisson model for repeated measurements

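We denote the vector of responses for the $g$th sample unit ($g = 1, \ldots, M$) by

$$Y_g = (X_{g1}, N_{g1}, \ldots, X_{gp}, N_{gp}),$$

with $X_{gh}$ corresponding to the number of successes in $N_{gh}$ trials performed under the $h$th ($h = 1, \ldots, p$) observation condition. We assume that for all $g$ and $h$,

$$X_{gh} \mid N_{gh}, \pi_{gh} \ \text{follow independent binomial}(N_{gh}, \pi_{gh}) \ \text{distributions}, \tag{2.1}$$

$$\pi_{gh} \ \text{follow independent Beta}\left\{\frac{\mu(z_{\mu gh})}{\theta(z_{\theta gh})}, \frac{1 - \mu(z_{\mu gh})}{\theta(z_{\theta gh})}\right\} \ \text{distributions}, \tag{2.2}$$

$$N_{gh} \mid \tau_g \ \text{follow independent Poisson}\{\lambda(z_{\lambda gh})\tau_g\} \ \text{distributions}, \tag{2.3}$$

$$\tau_g \ \text{follow independent gamma}\left\{\frac{\alpha(z_{\alpha g})}{\delta(z_{\delta g})}, \frac{1}{\delta(z_{\delta g})}\right\} \ \text{distributions}, \tag{2.4}$$

where $z_{\mu gh}$, $z_{\theta gh}$, $z_{\lambda gh}$, $z_{\alpha g}$ and $z_{\delta g}$ are vectors of fixed covariates.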
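To make the hierarchy (2.1)–(2.4) concrete, the following is a minimal simulation sketch using only the Python standard library. It is an illustration, not the authors' implementation: the function names (`poisson`, `simulate_unit`), the seed and the parameter values are assumptions chosen for the example, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for modest rates.

```python
import math
import random


def poisson(rate, rng):
    """Draw one Poisson(rate) variate (Knuth's multiplication method)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_unit(mu, theta, lam, alpha, delta, rng):
    """Simulate the pairs (X_gh, N_gh), h = 1..p, for one unit.

    mu, theta and lam are sequences of length p; alpha and delta are scalars.
    All values here are hypothetical inputs, not fitted quantities.
    """
    # (2.4): gamma with shape alpha/delta and rate 1/delta, i.e. scale delta
    tau = rng.gammavariate(alpha / delta, delta)
    pairs = []
    for m, t, l in zip(mu, theta, lam):
        n = poisson(l * tau, rng)                      # (2.3)
        pi = rng.betavariate(m / t, (1.0 - m) / t)     # (2.2)
        x = sum(rng.random() < pi for _ in range(n))   # (2.1)
        pairs.append((x, n))
    return pairs


rng = random.Random(2011)
sample = [
    simulate_unit([0.87, 0.87], [0.34, 0.34], [5.4, 5.4], 3.67, 0.27, rng)
    for _ in range(2000)
]
# Average number of trials under the first observation condition
mean_n = sum(unit[0][1] for unit in sample) / len(sample)
```

Because the same $\tau_g$ multiplies every Poisson rate of a unit, the simulated numbers of trials within a unit are positively correlated, which is the mechanism the model relies on.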

According to (2.1) and (2.2), the success probabilities may be different across units, but they are generated by beta distributions that may depend on covariates. In (2.3) and (2.4), we follow Nelson (1985) to specify that the expected numbers of trials may also be different across units through the multiplicative random effects $\tau_g$, which are generated by gamma distributions that may also depend on covariates.

The parametrizations ($0 < \mu < 1$, $\theta > 0$) adopted in (2.2) and ($\alpha > 0$, $\delta > 0$) adopted in (2.4) are used to facilitate maximum likelihood estimation, as suggested by Gange et al. (1996); their relation to the usual beta($a$, $b$) parametrization, as in Johnson and Kotz (1970), and the usual gamma($c$, $d$) parametrization, as in Mood et al. (1974), is given by

$$\mu = \frac{a}{a + b}, \qquad \theta = \frac{1}{a + b}, \qquad \alpha = \frac{c}{d} \qquad \text{and} \qquad \delta = \frac{1}{d}.$$

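The mean and variance of $\tau_g$ in (2.4) are

$$E(\tau_g) = \alpha(z_{\alpha g}), \tag{2.5}$$

$$\mathrm{Var}(\tau_g) = \alpha(z_{\alpha g})\delta(z_{\delta g}). \tag{2.6}$$

From (2.3) and (2.4), the means, variances and covariances of the numbers of trials are

$$E(N_{gh}) = \lambda(z_{\lambda gh})\alpha(z_{\alpha g}), \tag{2.7}$$

$$\mathrm{Var}(N_{gh}) = \lambda(z_{\lambda gh})\alpha(z_{\alpha g})\{1 + \lambda(z_{\lambda gh})\delta(z_{\delta g})\}, \tag{2.8}$$

$$\mathrm{Cov}(N_{gh}, N_{gh'}) = \lambda(z_{\lambda gh})\lambda(z_{\lambda gh'})\alpha(z_{\alpha g})\delta(z_{\delta g}) \tag{2.9}$$

for all $g$, $h$, $h'$, $h \neq h'$. Similarly, the mean and variance of $\pi_{gh}$ in (2.2) are

$$E(\pi_{gh}) = \mu(z_{\mu gh}), \tag{2.10}$$

$$\mathrm{Var}(\pi_{gh}) = \mu(z_{\mu gh})[1 - \mu(z_{\mu gh})]\theta(z_{\theta gh})[1 + \theta(z_{\theta gh})]^{-1}. \tag{2.11}$$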

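Also, from (2.1) and (2.2), we may conclude that, for all $g$ and $h$,

$$X_{gh} \mid N_{gh} \sim \text{beta-binomial}[N_{gh}, \mu(z_{\mu gh}), \theta(z_{\theta gh})]$$

with

$$E(X_{gh}) = \mu(z_{\mu gh})\lambda(z_{\lambda gh})\alpha(z_{\alpha g}), \tag{2.12}$$

$$
\begin{aligned}
\mathrm{Var}(X_{gh}) &= \mu(z_{\mu gh})[1 - \mu(z_{\mu gh})]\frac{\theta(z_{\theta gh})}{1 + \theta(z_{\theta gh})} \times \lambda^2(z_{\lambda gh})\alpha(z_{\alpha g})[\alpha(z_{\alpha g}) + \delta(z_{\delta g})] \\
&\quad + \mu(z_{\mu gh})\lambda(z_{\lambda gh})\alpha(z_{\alpha g})[1 + \mu(z_{\mu gh})\lambda(z_{\lambda gh})\delta(z_{\delta g})],
\end{aligned} \tag{2.13}
$$

$$\mathrm{Cov}(X_{gh}, X_{gh'}) = \mu(z_{\mu gh})\mu(z_{\mu gh'})\lambda(z_{\lambda gh})\lambda(z_{\lambda gh'})\alpha(z_{\alpha g})\delta(z_{\delta g}) \tag{2.14}$$

for all $g$, $h$, $h'$, $h \neq h'$. The covariance between the numbers of successes and trials is

$$\mathrm{Cov}(X_{gh}, N_{gh}) = \mu(z_{\mu gh})\lambda(z_{\lambda gh})\alpha(z_{\alpha g})\{1 + \lambda(z_{\lambda gh})\delta(z_{\delta g})\}. \tag{2.15}$$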
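The moments of the numbers of trials are stated without derivation; a brief sketch of how (2.8), (2.9) and (2.15) follow from the laws of total variance and covariance is given below. The abbreviations $\lambda_h = \lambda(z_{\lambda gh})$, $\mu_h = \mu(z_{\mu gh})$, $\alpha = \alpha(z_{\alpha g})$ and $\delta = \delta(z_{\delta g})$ are introduced here only for brevity.

```latex
\begin{aligned}
\mathrm{Var}(N_{gh})
  &= E[\mathrm{Var}(N_{gh} \mid \tau_g)] + \mathrm{Var}[E(N_{gh} \mid \tau_g)]
   = \lambda_h E(\tau_g) + \lambda_h^2 \mathrm{Var}(\tau_g) \\
  &= \lambda_h \alpha + \lambda_h^2 \alpha\delta
   = \lambda_h \alpha \{1 + \lambda_h \delta\}, \\
\mathrm{Cov}(N_{gh}, N_{gh'})
  &= E[\mathrm{Cov}(N_{gh}, N_{gh'} \mid \tau_g)]
     + \mathrm{Cov}[E(N_{gh} \mid \tau_g), E(N_{gh'} \mid \tau_g)] \\
  &= 0 + \lambda_h \lambda_{h'} \mathrm{Var}(\tau_g)
   = \lambda_h \lambda_{h'} \alpha\delta, \qquad h \neq h', \\
\mathrm{Cov}(X_{gh}, N_{gh})
  &= \mathrm{Cov}[E(X_{gh} \mid N_{gh}), N_{gh}]
   = \mu_h \mathrm{Var}(N_{gh})
   = \mu_h \lambda_h \alpha \{1 + \lambda_h \delta\}.
\end{aligned}
```

The first term in the covariance of the trial counts vanishes because the counts are conditionally independent given $\tau_g$, so all of the dependence between repeated counts comes from the shared random effect.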

The parameters $\theta(z_{\theta gh})$ govern both the variability of the success probabilities and the overdispersion of the number of successes, which may also depend on the parameter $\delta(z_{\delta g})$. When $\theta(z_{\theta gh})$ and $\delta(z_{\delta g})$ are equal to zero, there is no overdispersion for the number of successes. The parameters $\delta(z_{\delta g})$ are also related to the variability and overdispersion of the number of trials and to the covariance between the numbers of trials and numbers of successes. When $\delta(z_{\delta g}) = 0$, the repeated counts are independent.
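The role of $\theta$ and $\delta$ can be checked numerically. The sketch below evaluates (2.7), (2.8), (2.12), (2.13) and (2.15) for a single observation condition; the function name `count_moments` and the input values are assumptions made for illustration (the inputs are rounded values of the kind reported in the data analysis of Section 3).

```python
def count_moments(mu, theta, lam, alpha, delta):
    """Means, variances and covariance of (X_gh, N_gh) for one condition."""
    mean_n = lam * alpha                                   # (2.7)
    var_n = lam * alpha * (1.0 + lam * delta)              # (2.8)
    mean_x = mu * lam * alpha                              # (2.12)
    var_x = (
        mu * (1.0 - mu) * theta / (1.0 + theta)
        * lam ** 2 * alpha * (alpha + delta)
        + mu * lam * alpha * (1.0 + mu * lam * delta)
    )                                                      # (2.13)
    cov_xn = mu * var_n                                    # (2.15)
    return {
        "mean_n": mean_n, "var_n": var_n,
        "mean_x": mean_x, "var_x": var_x, "cov_xn": cov_xn,
    }


# Illustrative (rounded) inputs
overdispersed = count_moments(mu=0.87, theta=0.34, lam=5.4, alpha=3.67, delta=0.27)

# With theta = delta = 0 the variances collapse to the means (no overdispersion)
baseline = count_moments(mu=0.87, theta=0.0, lam=5.4, alpha=3.67, delta=0.0)
```

With $\theta = \delta = 0$ the variance of each count equals its mean, as under a Poisson model, whereas positive values inflate the variances of both counts.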

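To investigate the effects of covariates, we adopt log-linear models of the form

$$\mu(z_{\mu gh}) = \frac{\exp(z_{\mu gh}^{\top}\beta_\mu)}{1 + \exp(z_{\mu gh}^{\top}\beta_\mu)}, \tag{2.16}$$

$$\theta(z_{\theta gh}) = \exp(z_{\theta gh}^{\top}\beta_\theta), \tag{2.17}$$

$$\lambda(z_{\lambda gh}) = \exp(z_{\lambda gh}^{\top}\beta_\lambda), \tag{2.18}$$

$$\alpha(z_{\alpha g}) = \exp(z_{\alpha g}^{\top}\beta_\alpha), \tag{2.19}$$

$$\delta(z_{\delta g}) = \exp(z_{\delta g}^{\top}\beta_\delta), \tag{2.20}$$

where $\beta_\mu$, $\beta_\theta$, $\beta_\lambda$, $\beta_\alpha$ and $\beta_\delta$ are vectors of parameters to be estimated.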

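From (2.1), (2.2), (2.3) and (2.4) it follows that the joint probability mass function for the number of trials and successes for the $g$th unit is

$$
\begin{aligned}
P(X_{g1}, N_{g1}, \ldots, X_{gp}, N_{gp})
&= \prod_{h=1}^{p} P(X_{gh} \mid N_{gh}) \, P(N_{g1}, \ldots, N_{gp}) \\
&= \prod_{h=1}^{p} P(X_{gh} \mid N_{gh}) \int_0^{\infty} \prod_{h=1}^{p} P(N_{gh} \mid \tau_g) f(\tau_g) \, d\tau_g
\end{aligned}
$$

with $f$ denoting the density of (2.4). Since the logarithm of the likelihood is given by

$$
\log L(\beta_\mu, \beta_\theta, \beta_\lambda, \beta_\alpha, \beta_\delta)
= \sum_{g=1}^{M} \sum_{h=1}^{p} \log P(X_{gh} \mid N_{gh}, \beta_\mu, \beta_\theta)
+ \sum_{g=1}^{M} \log P(N_{g1}, \ldots, N_{gp} \mid \beta_\lambda, \beta_\alpha, \beta_\delta),
$$

the parameters of the beta-binomial distribution ($\beta_\mu$, $\beta_\theta$) can be estimated separately from those of the gamma-Poisson distribution ($\beta_\lambda$, $\beta_\alpha$, $\beta_\delta$). The beta-binomial probability mass function can be written as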

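$$
\begin{aligned}
P(X_{gh} = x_{gh} \mid N_{gh} = n_{gh}, \beta_\mu, \beta_\theta)
&= \binom{n_{gh}}{x_{gh}}
\left[\frac{\Gamma\{1/\theta(z_{\theta gh})\}}{\Gamma\{1/\theta(z_{\theta gh}) + n_{gh}\}}\right]
\times \left[\frac{\Gamma\{\mu(z_{\mu gh})/\theta(z_{\theta gh}) + x_{gh}\}}{\Gamma\{\mu(z_{\mu gh})/\theta(z_{\theta gh})\}}\right] \\
&\quad \times \left[\frac{\Gamma\{[1 - \mu(z_{\mu gh})]/\theta(z_{\theta gh}) + n_{gh} - x_{gh}\}}{\Gamma\{[1 - \mu(z_{\mu gh})]/\theta(z_{\theta gh})\}}\right] \\
&= \binom{n_{gh}}{x_{gh}} \prod_{u=0}^{n_{gh}-1} [1 + u\theta(z_{\theta gh})]^{-1} \prod_{v=0}^{x_{gh}-1} [\mu(z_{\mu gh}) + v\theta(z_{\theta gh})] \\
&\quad \times \prod_{w=0}^{n_{gh}-x_{gh}-1} [1 - \mu(z_{\mu gh}) + w\theta(z_{\theta gh})],
\end{aligned} \tag{2.21}
$$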

where $\Gamma(r) = \int_0^{\infty} t^{r-1} e^{-t} \, dt$. The simplifications of the ratios between two gamma functions (presented within brackets) make sense when $n_{gh} \neq 0$ (in the first ratio), $x_{gh} \neq 0$ (in the second ratio) and $x_{gh} \neq n_{gh}$ (in the third ratio). When these conditions are not satisfied, the ratios between the gamma functions may be set equal to one, and do not affect the conditional probability of $X_{gh}$ given $N_{gh}$, $\beta_\mu$, $\beta_\theta$.
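The product form of (2.21) is straightforward to evaluate directly. The following sketch (the function name `beta_binomial_pmf` and the numerical inputs are illustrative assumptions) computes the probability mass function and checks that it sums to one; the empty products that arise when $n_{gh} = 0$, $x_{gh} = 0$ or $x_{gh} = n_{gh}$ are automatically equal to one because the corresponding loops do not execute.

```python
import math


def beta_binomial_pmf(x, n, mu, theta):
    """P(X = x | N = n) using the product form of (2.21)."""
    value = float(math.comb(n, x))
    for u in range(n):            # empty when n = 0
        value /= 1.0 + u * theta
    for v in range(x):            # empty when x = 0
        value *= mu + v * theta
    for w in range(n - x):        # empty when x = n
        value *= 1.0 - mu + w * theta
    return value


# Hypothetical inputs: 18 trials, mu = 0.87, theta = 0.34
probs = [beta_binomial_pmf(x, 18, 0.87, 0.34) for x in range(19)]
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
```

Setting `theta` to zero recovers the ordinary binomial probabilities, which gives a simple sanity check on any implementation.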

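The kernel of the beta-binomial log-likelihood function is

$$
\begin{aligned}
L(\beta_\mu, \beta_\theta) = \sum_{g=1}^{M} \sum_{h=1}^{p} \Biggl\{
&\sum_{v=0}^{x_{gh}-1} \log[\mu(z_{\mu gh}) + v\theta(z_{\theta gh})]
+ \sum_{w=0}^{n_{gh}-x_{gh}-1} \log[1 - \mu(z_{\mu gh}) + w\theta(z_{\theta gh})] \\
&- \sum_{u=0}^{n_{gh}-1} \log[1 + u\theta(z_{\theta gh})] \Biggr\}
\end{aligned} \tag{2.22}
$$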

and we may use maximum likelihood methods adopting a Newton–Raphson iterative process to estimate $\beta_\mu$ and $\beta_\theta$. The first and second derivatives of (2.22) are shown in Lora and Singer (2008). Method of moments estimates based on the beta-binomial distribution may be used as initial values for $\mu(z_{\mu gh})$ and $\theta(z_{\theta gh})$, as suggested by Griffiths (1973). Likelihood ratio tests may be employed for model reduction purposes, that is, for constructing a parsimonious model that captures the explainable variability in the data. For example, to verify whether the $q$-parameter vector $\beta^*$ is null, we may employ the test statistic $LR = 2(L - L^*)$, with $L^*$ denoting the log-likelihood under $H_0$ and $L$ the log-likelihood under the alternative hypothesis. Asymptotically, $LR$ follows a chi-squared distribution with $q$ degrees of freedom under the null hypothesis.
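As a small illustration of the likelihood ratio test, the sketch below computes $LR = 2(L - L^*)$ and its asymptotic p-value using only the standard library. The chi-squared upper tail is obtained from the closed forms for 1 and 2 degrees of freedom together with the usual recurrence in steps of 2 degrees of freedom; the function names and the log-likelihood values in the example are hypothetical and are not taken from the paper.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if stat <= 0.0:
        return 1.0
    half = stat / 2.0
    if df % 2 == 1:
        tail, nu = math.erfc(math.sqrt(half)), 1   # df = 1
    else:
        tail, nu = math.exp(-half), 2              # df = 2
    while nu < df:
        # Q(nu + 2) = Q(nu) + (x/2)^(nu/2) exp(-x/2) / Gamma(nu/2 + 1)
        tail += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return tail


def likelihood_ratio_test(loglik_alt, loglik_null, q):
    """Return the LR statistic and its asymptotic p-value with q degrees of freedom."""
    lr = 2.0 * (loglik_alt - loglik_null)
    return lr, chi2_sf(lr, q)


# Hypothetical log-likelihoods for a q = 3 parameter reduction
lr_stat, p_value = likelihood_ratio_test(-927.0, -930.5, 3)
```

A large p-value supports dropping the $q$ parameters under test, which is how the model reduction steps in Section 3 are justified.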


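The probability function for the repeated number of trials based on (2.3) and (2.4) is

$$
\begin{aligned}
P(N_{g1} = n_{g1}, \ldots, N_{gp} = n_{gp} \mid \beta_\lambda, \beta_\alpha, \beta_\delta)
&= \prod_{h=1}^{p} \left\{\frac{[\lambda(z_{\lambda gh})]^{n_{gh}}}{n_{gh}!}\right\}
\left[\frac{1}{\delta(z_{\delta g})}\right]^{\alpha(z_{\alpha g})/\delta(z_{\delta g})} \\
&\quad \times \left[\frac{\Gamma\left\{\sum_{h=1}^{p} n_{gh} + \alpha(z_{\alpha g})/\delta(z_{\delta g})\right\}}{\Gamma\{\alpha(z_{\alpha g})/\delta(z_{\delta g})\}}\right] \\
&\quad \div \left[\sum_{h=1}^{p} \lambda(z_{\lambda gh}) + \frac{1}{\delta(z_{\delta g})}\right]^{\sum_{h=1}^{p} n_{gh} + \alpha(z_{\alpha g})/\delta(z_{\delta g})} \\
&= \prod_{h=1}^{p} \left\{\frac{[\lambda(z_{\lambda gh})]^{n_{gh}}}{n_{gh}!}\right\}
\prod_{u=0}^{\sum_{h=1}^{p} n_{gh} - 1} [\alpha(z_{\alpha g}) + u\delta(z_{\delta g})] \\
&\quad \div \left[\delta(z_{\delta g}) \sum_{h=1}^{p} \lambda(z_{\lambda gh}) + 1\right]^{\sum_{h=1}^{p} n_{gh} + \alpha(z_{\alpha g})/\delta(z_{\delta g})}.
\end{aligned} \tag{2.23}
$$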

In (2.23), the simplification of the ratio between two gamma functions makes sense when $\sum_{h=1}^{p} n_{gh} \neq 0$. When this condition is not satisfied, the ratio is also set equal to one, and it does not affect the probability value. The kernel of the gamma-Poisson log-likelihood function is

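$$
\begin{aligned}
L(\beta_\lambda, \beta_\alpha, \beta_\delta) = \sum_{g=1}^{M} \Biggl\{
&\sum_{h=1}^{p} [n_{gh} \log \lambda(z_{\lambda gh})]
+ \sum_{u=0}^{\sum_{h=1}^{p} n_{gh} - 1} \log[\alpha(z_{\alpha g}) + u\delta(z_{\delta g})] \\
&- \left[\sum_{h=1}^{p} n_{gh} + \frac{\alpha(z_{\alpha g})}{\delta(z_{\delta g})}\right]
\log\left[\delta(z_{\delta g}) \sum_{h=1}^{p} \lambda(z_{\lambda gh}) + 1\right] \Biggr\}
\end{aligned} \tag{2.24}
$$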

and we adopt the same methods used with the beta-binomial model to estimate $\beta_\lambda$, $\beta_\alpha$ and $\beta_\delta$. The first and second derivatives of (2.24) are shown in the Appendix. Method of moments estimates may be used as the initial values for $\lambda(z_{\lambda gh})$, $\alpha(z_{\alpha g})$ and $\delta(z_{\delta g})$ here, too. Likelihood ratio tests may be employed for model reduction purposes, along similar lines as those considered for the beta-binomial model.

Both iterative processes are implemented in the R software and the corresponding code can be downloaded from http://www.ime.usp.br/~jmsinger.
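For readers who want to check the algebra of (2.23), the following Python sketch evaluates the logarithm of the product form for one unit and verifies numerically that the probabilities sum to one for $p = 2$. It is independent of the authors' R code; the function name `gamma_poisson_log_pmf` and the parameter values are assumptions chosen so that the truncated sum converges quickly.

```python
import math


def gamma_poisson_log_pmf(counts, lam, alpha, delta):
    """Log of the product form of (2.23) for the trial counts of one unit."""
    total_n = sum(counts)
    # sum over h of n_gh log(lambda_gh) - log(n_gh!)
    value = sum(n * math.log(l) - math.lgamma(n + 1) for n, l in zip(counts, lam))
    # product over u of [alpha + u delta]; empty when all counts are zero
    value += sum(math.log(alpha + u * delta) for u in range(total_n))
    # division by [delta * sum(lambda) + 1]^(sum(n) + alpha/delta)
    value -= (total_n + alpha / delta) * math.log(delta * sum(lam) + 1.0)
    return value


# Hypothetical values for p = 2 observation conditions
lam, alpha, delta = (0.5, 0.8), 1.2, 0.5
grid = [(n1, n2) for n1 in range(60) for n2 in range(60)]
probs = [math.exp(gamma_poisson_log_pmf(c, lam, alpha, delta)) for c in grid]
total = sum(probs)
mean_n1 = sum(c[0] * p for c, p in zip(grid, probs))
```

The marginal mean of the first count recovered from the grid agrees with $\lambda\alpha$ from (2.7), which is a convenient check when porting the likelihood to another language.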


3 Data analysis

To compare the beta-binomial/gamma-Poisson model to the multivariate beta-binomial/Poisson model, we consider the same data presented in Lora and Singer (2008) from a study conducted at the Learning Laboratory of the Department of Physiotherapy, Phonotherapy and Occupational Therapy of the University of São Paulo, Brazil, to evaluate the performance of some motor activities of Parkinson’s disease patients. For the sake of completeness, we repeat the description of the study here. Twenty-five patients with confirmed clinical diagnosis of Parkinson’s disease and 21 normal (without any preceding neurologic alterations) subjects repeated two sequences of specified opposed finger movements (touching one of the other four fingers with the thumb) during one-minute periods, with both hands. This was done both before and after a four-week experimental period in which only one of the sequences was trained (active sequence) with one of the hands; the other sequence was not trained (control sequence). Half of the subjects in each group trained the preferred hand (right for the right-handed and left for the left-handed in the normal group or the less affected by the disease in the experimental group) and the other half trained the nonpreferred hand. Information on the numbers of attempted and successful trials was recorded with a special device attached to a computer.

Six subgroups may be characterized by the combination of disease stage (normal, initial or advanced) and use of the preferred hand (yes or no). The repeated measures are characterized by the cross-classification of the levels of sequence (control or active) and evaluation session (baseline or final). The specific objective of the study was to evaluate whether training is associated with increases in the expected number of attempted trials per minute (agility) and/or in the probability of successful trials (ability). Note that the treatment could improve agility without improving ability, so an evaluation of its effect on both characteristics is important. The means and variances of the numbers of attempted and successful trials at the baseline and final evaluations with the active and control sequences for patients at the different disease stages using the preferred or nonpreferred hands are presented in Table 1. Variances, instead of standard deviations, are displayed to facilitate identification of overdispersion in the sense referred to by Nelder and McCullagh (1989), that is, cases where variances are greater than expected under Poisson or binomial distributions. Overdispersion in the number of attempts under a Poisson distribution is clearly identified by comparing the observed mean and variance; for the number of successes, on the other hand, it is necessary to compare the observed variance with the variance expected under the binomial distribution, $np(1 - p)$. For example, considering normal subjects performing the control sequence at the baseline session using the preferred hand, the expected variance under the binomial model is 1.4, while the observed variance is 49.0, highlighting the overdispersion for these counts too.
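The binomial benchmark $np(1 - p)$ can be reproduced from the means in Table 1. The sketch below plugs in the mean number of attempts for $n$ and the ratio of mean successes to mean attempts for $p$; this plug-in choice and the function name `binomial_variance` are assumptions made for illustration, applied to normal subjects using the preferred hand at the baseline session.

```python
def binomial_variance(mean_successes, mean_attempts):
    """Variance n p (1 - p) with n = mean attempts and p = mean successes / mean attempts."""
    p = mean_successes / mean_attempts
    return mean_attempts * p * (1.0 - p)


# Control sequence: means 17.1 and 18.6; observed variance of successes is 49.0
control = binomial_variance(17.1, 18.6)

# Active sequence: means 17.1 and 17.9; observed variance of successes is 72.3
active = binomial_variance(17.1, 17.9)
```

In both cases the observed variance of the number of successes exceeds the binomial benchmark by more than an order of magnitude.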

Correlation coefficients for the within-subject responses for the normal subjects using the preferred hand are displayed in Table 2. For this subgroup, only 3 out of the 28 observed correlations are smaller than 0.60; this suggests that the counts are probably related and it is sensible to use a model that can accommodate this relationship. The correlation patterns for the other subgroups are similar and are not presented.

Table 1 Mean and variance (within parentheses) of the number of attempted and successful trials

| Disease stage | Evaluation session | Intervention hand | Sequence | Successes | Attempts |
|---|---|---|---|---|---|
| Normal | Baseline | Preferred | Control | 17.1 (49.0) | 18.6 (46.2) |
| Normal | Baseline | Preferred | Active | 17.1 (72.3) | 17.9 (79.2) |
| Normal | Baseline | Nonpreferred | Control | 18.1 (27.0) | 20.9 (47.6) |
| Normal | Baseline | Nonpreferred | Active | 17.1 (37.2) | 19.5 (53.3) |
| Normal | Final | Preferred | Control | 20.9 (90.3) | 26.1 (44.9) |
| Normal | Final | Preferred | Active | 32.7 (139.2) | 33.1 (132.3) |
| Normal | Final | Nonpreferred | Control | 24.2 (25.0) | 28.6 (38.4) |
| Normal | Final | Nonpreferred | Active | 32.8 (74.0) | 34.4 (72.3) |
| Initial | Baseline | Preferred | Control | 13.7 (24.0) | 16.3 (44.9) |
| Initial | Baseline | Preferred | Active | 12.0 (23.0) | 13.5 (23.0) |
| Initial | Baseline | Nonpreferred | Control | 12.0 (17.6) | 14.6 (9.0) |
| Initial | Baseline | Nonpreferred | Active | 10.7 (20.3) | 13.6 (10.9) |
| Initial | Final | Preferred | Control | 13.2 (30.3) | 16.8 (43.6) |
| Initial | Final | Preferred | Active | 20.2 (9.6) | 21.8 (2.9) |
| Initial | Final | Nonpreferred | Control | 15.3 (112.4) | 20.3 (116.6) |
| Initial | Final | Nonpreferred | Active | 20.1 (33.6) | 20.4 (39.7) |
| Advanced | Baseline | Preferred | Control | 4.8 (22.1) | 7.1 (11.6) |
| Advanced | Baseline | Preferred | Active | 4.6 (11.6) | 7.9 (14.4) |
| Advanced | Baseline | Nonpreferred | Control | 8.3 (72.3) | 12.5 (15.2) |
| Advanced | Baseline | Nonpreferred | Active | 13.5 (92.2) | 15.5 (57.8) |
| Advanced | Final | Preferred | Control | 7.4 (75.7) | 11.9 (67.2) |
| Advanced | Final | Preferred | Active | 13.5 (90.3) | 14.9 (77.4) |
| Advanced | Final | Nonpreferred | Control | 5.8 (31.4) | 12.8 (12.3) |
| Advanced | Final | Nonpreferred | Active | 22.5 (75.7) | 23.8 (75.7) |

Table 2 Correlation coefficients for the within-subject responses for the normal subjects using the preferred hand

| Session, sequence, response | Baseline, active, Suc. | Baseline, active, Att. | Baseline, control, Suc. | Baseline, control, Att. | Final, active, Suc. | Final, active, Att. | Final, control, Suc. | Final, control, Att. |
|---|---|---|---|---|---|---|---|---|
| Baseline, active, Suc. | 1 | | | | | | | |
| Baseline, active, Att. | 0.99 | 1 | | | | | | |
| Baseline, control, Suc. | 0.85 | 0.84 | 1 | | | | | |
| Baseline, control, Att. | 0.78 | 0.80 | 0.96 | 1 | | | | |
| Final, active, Suc. | 0.76 | 0.76 | 0.61 | 0.61 | 1 | | | |
| Final, active, Att. | 0.74 | 0.74 | 0.61 | 0.63 | 0.99 | 1 | | |
| Final, control, Suc. | 0.53 | 0.49 | 0.59 | 0.63 | 0.60 | 0.61 | 1 | |
| Final, control, Att. | 0.81 | 0.82 | 0.70 | 0.69 | 0.93 | 0.92 | 0.50 | 1 |

Codes: Suc. = Successes, Att. = Attempts.

The analysis strategy consisted in fitting initial models of the form (2.16)–(2.20) with all main effects and first order interactions, and trying to reduce them by sequentially eliminating the nonsignificant terms. The parameters are indexed by disease stage (0 = normal, 1 = initial, 2 = advanced), intervention hand (P = preferred, N = nonpreferred), evaluation session (B = baseline, F = final) and sequence (C = control, A = active). We adopted a reference cell parameterization with the reference cell corresponding to the normal group (0), performing the active sequence (A) with the preferred hand (P) at the baseline evaluation (B).

3.1 Modelling the expected probability and dispersion of successful attempts

For both beta-binomial/gamma-Poisson and multivariate beta-binomial/Poisson models, the parameters of the beta-binomial components can be estimated separately from those of the gamma-Poisson or the multivariate Poisson distributions. Therefore, modelling the expected probabilities and dispersion parameters of the successful attempts is exactly the same as in Lora and Singer (2008) and it is not shown here; we present only the estimates and standard errors computed under the final beta-binomial model (Table 3) for comparison with the results obtained under the beta-binomial/gamma-Poisson model. Under this final model, estimates of the expected probabilities of successful attempts [$E(\pi_{gh}) = \mu(z_{\mu gh})$] and dispersion parameters $\theta(z_{\theta gh})$ (that govern the variability of the probabilities of successful attempts), along with their standard errors, are presented in Table 4.

Table 3 Parameter estimates and standard errors under the final beta-binomial model

| Parameter | Related to | Estimate | Standard error |
|---|---|---|---|
| $\beta_{\mu 0}$ | Normal group, preferred hand, baseline session and active sequence | 1.86 | 0.15 |
| $\beta_{\mu 2}$ | Effect of advanced stage | −1.35 | 0.25 |
| $\beta_{\mu F}$ | Effect of final session | 1.38 | 0.30 |
| $\beta_{\mu(F*C)}$ | Effect of final session and control sequence | −1.79 | 0.30 |
| $\beta_{\theta 0}$ | Normal group, preferred hand, baseline session and active sequence | −1.07 | 0.27 |
| $\beta_{\theta 1}$ | Effect of initial stage | −2.98 | 1.05 |
| $\beta_{\theta 2}$ | Effect of advanced stage | 1.31 | 0.37 |
| $\beta_{\theta(1*F)}$ | Effect of initial stage and final session | 1.66 | 0.82 |
| $\beta_{\theta(1*N)}$ | Effect of initial stage and nonpreferred hand | 2.78 | 0.91 |
| $\beta_{\theta(F*N)}$ | Effect of final session and nonpreferred hand | −1.49 | 0.44 |

Table 4 Estimates of expected probabilities of successful attempts, dispersion parameters and standard errors under the final beta-binomial model

Expected probabilities of successful attempts

| Disease stage | Evaluation session | Intervention hand | Sequence | Expected value | Standard error |
|---|---|---|---|---|---|
| Normal or initial | Baseline | Either | Either | 0.87 | 0.02 |
| Normal or initial | Final | Either | Control | 0.81 | 0.03 |
| Normal or initial | Final | Either | Active | 0.96 | 0.01 |
| Advanced | Baseline | Either | Either | 0.62 | 0.06 |
| Advanced | Final | Either | Control | 0.52 | 0.06 |
| Advanced | Final | Either | Active | 0.87 | 0.04 |

Dispersion parameters

| Disease stage | Evaluation session | Intervention hand | Sequence | Expected value | Standard error |
|---|---|---|---|---|---|
| Normal | Baseline | Either | Either | 0.34 | 0.09 |
| Normal | Final | Preferred | Either | 0.34 | 0.09 |
| Normal | Final | Nonpreferred | Either | 0.08 | 0.03 |
| Initial | Baseline | Preferred | Either | 0.02 | 0.02 |
| Initial | Baseline | Nonpreferred | Either | 0.28 | 0.14 |
| Initial | Final | Preferred | Either | 0.09 | 0.06 |
| Initial | Final | Nonpreferred | Either | 0.33 | 0.19 |
| Advanced | Final | Preferred | Either | 1.27 | 0.37 |
| Advanced | Final | Nonpreferred | Either | 0.29 | 0.13 |

The results suggest no evidence of difference between the expected probabilities of successful attempts for patients using the preferred or nonpreferred hand ($\beta_{\mu N} = 0$), nor between active and control sequences in the baseline session ($\beta_{\mu C} = 0$). Patients in the normal group or with the disease in the initial stage have similar expected probabilities of successful attempts ($\beta_{\mu 1} = 0$), but those with the disease in an advanced stage have smaller expected probabilities of successful attempts ($\beta_{\mu 2} < 0$). Moreover, an intervention effect is detected since the expected probabilities of successful attempts in the final session are greater than those for the baseline session ($\beta_{\mu F} > 0$). These values are smaller for the control sequence than for the active sequence ($\beta_{\mu F} + \beta_{\mu(F*C)} < 0$), suggesting that training is effective with respect to ability.

We may also infer that there is no difference between the expected dispersion parameters for subjects performing the active and control sequences ($\beta_{\theta C} = 0$). For the normal subjects, the expected dispersion parameters are the same ($\beta_{\theta C}, \beta_{\theta N}, \beta_{\theta F} = 0$), except in the final evaluation using the nonpreferred hand, for which the expected value is smaller than the others ($\beta_{\theta(F*N)} < 0$). For patients in the initial stage of the disease, the expected dispersion parameters are smaller than for those in the normal group ($\beta_{\theta 1} < 0$); however, they change for each combination of session and intervention hand ($\beta_{\theta(1*F)}, \beta_{\theta(1*N)}, \beta_{\theta(F*N)} \neq 0$). Finally, for patients in the advanced stage of the disease, the expected dispersion parameter is larger than for those in the normal group ($\beta_{\theta 2} > 0$), but this changes for the final session when the nonpreferred hand is used ($\beta_{\theta(F*N)} \neq 0$).
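The expected values in Table 4 follow from the estimates in Table 3 through the link functions (2.16) and (2.17). The sketch below illustrates the calculation under the reference cell parameterization described above; the dictionary layout and the function names `mu_hat` and `theta_hat` are assumptions made for the example, and only point estimates (not standard errors) are reproduced.

```python
import math

# Estimates copied from Table 3 (logit scale for mu, log scale for theta)
BETA_MU = {"0": 1.86, "2": -1.35, "F": 1.38, "F*C": -1.79}
BETA_THETA = {"0": -1.07, "1": -2.98, "2": 1.31, "1*F": 1.66, "1*N": 2.78, "F*N": -1.49}


def mu_hat(stage, final, control):
    """Expected success probability via the logistic link (2.16)."""
    eta = BETA_MU["0"]
    eta += BETA_MU["2"] * (stage == 2)
    eta += BETA_MU["F"] * final
    eta += BETA_MU["F*C"] * (final and control)
    return 1.0 / (1.0 + math.exp(-eta))


def theta_hat(stage, final, nonpreferred):
    """Dispersion parameter via the log link (2.17)."""
    eta = BETA_THETA["0"]
    eta += BETA_THETA["1"] * (stage == 1) + BETA_THETA["2"] * (stage == 2)
    eta += BETA_THETA["1*F"] * (stage == 1 and final)
    eta += BETA_THETA["1*N"] * (stage == 1 and nonpreferred)
    eta += BETA_THETA["F*N"] * (final and nonpreferred)
    return math.exp(eta)


# Normal group, final session, control sequence (Table 4 reports 0.81)
p_final_control = mu_hat(stage=0, final=True, control=True)
# Initial stage, final session, nonpreferred hand (Table 4 reports 0.33)
theta_initial = theta_hat(stage=1, final=True, nonpreferred=True)
```

Stage is coded 0, 1 or 2 as in the text; the remaining arguments are booleans for the final session, the control sequence and the nonpreferred hand.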


3.2 Modelling the expected number of attempts

The initial model parameter vector, with all main effects and first order interactions, is $\beta = (\beta_\lambda, \beta_\alpha, \beta_\delta)$, where

$$\beta_\lambda = (\beta_{\lambda 0}, \beta_{\lambda 1}, \beta_{\lambda 2}, \beta_{\lambda N}, \beta_{\lambda F}, \beta_{\lambda C}, \beta_{\lambda(1*F)}, \beta_{\lambda(1*N)}, \beta_{\lambda(1*C)}, \beta_{\lambda(2*F)}, \beta_{\lambda(2*N)}, \beta_{\lambda(2*C)}, \beta_{\lambda(F*N)}, \beta_{\lambda(F*C)}, \beta_{\lambda(N*C)}),$$

$$\beta_m = (\beta_{m0}, \beta_{m1}, \beta_{m2}, \beta_{mN}, \beta_{m(1*N)}, \beta_{m(2*N)})$$

with $m = \alpha, \delta$. We may interpret $\beta_{\lambda 0}$ as the logarithm of $\lambda$ for normal individuals, using the preferred hand, performing the active sequence at the baseline evaluation; $\beta_{\lambda N}$ corresponds to the variation in the logarithm of $\lambda$ due to the effect of the nonpreferred hand compared to the preferred one; $\beta_{\lambda(1*N)}$ corresponds to an additional variation in the logarithm of $\lambda$ due to the interaction between the initial stage of the disease (1) and the use of the nonpreferred hand (N). The elements of the vector $\beta_\lambda$ related to different evaluation sessions (represented by F and C) allow for different numbers of attempts in these different evaluation sessions. On the other hand, $\alpha(z_{\alpha g})$ and $\delta(z_{\delta g})$ do not vary in different evaluation sessions; therefore the vectors $\beta_\alpha$ and $\beta_\delta$ do not have elements to distinguish between sessions, but have elements to compare subgroups.

As noticed in Lora and Singer (2008) for the beta-binomial model, the iterative process was very sensitive to initial values, especially for the interactions. To overcome this difficulty, we started with a simpler model containing only the main effects and used the resulting estimates as initial values for fitting other models, obtained by including the interactions one by one. The estimates of the interaction parameters obtained in this preliminary process were used as the initial values in our modelling strategy.

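The nonsignificant interactions were identified and their simultaneous elimination from the initial model was supported ($p = 0.211$) via a test of the hypothesis

$$H_0\colon \beta_{\lambda(1*F)}, \beta_{\lambda(1*N)}, \beta_{\lambda(1*C)}, \beta_{\lambda(2*F)}, \beta_{\lambda(2*N)}, \beta_{\lambda(2*C)}, \beta_{\lambda(F*N)}, \beta_{\lambda(N*C)}, \beta_{\alpha(1*N)}, \beta_{\alpha(2*N)}, \beta_{\delta(1*N)}, \beta_{\delta(2*N)} = 0.$$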

Under the resulting reduced model, the nonsignificant main effects were identified; their simultaneous elimination was corroborated ($p = 0.493$) via a test of the hypothesis

$$H_0\colon \beta_{\lambda N}, \beta_{\lambda C}, \beta_{\alpha 1}, \beta_{\alpha 2}, \beta_{\alpha N}, \beta_{\delta 1}, \beta_{\delta 2}, \beta_{\delta N} = 0.$$

We considered other hypotheses where some of these parameters are equal to zero and they were all rejected ($p < 0.150$). Goodness of fit of the resulting reduced model was confirmed by a likelihood ratio test in which it was compared to the initial model ($p = 0.289$).

For this final model, the corresponding parameter estimates along with their standard errors are presented in Table 5. Based on these, we estimated expected values for $\lambda(z_{\lambda gh})$; the results are presented in Table 6. Additionally, since only the parameters $\beta_{\alpha 0}$ and $\beta_{\delta 0}$ were included in the final model, we have $\alpha(z_{\alpha g}) = 3.67$, with standard error of 0.18, and $\delta(z_{\delta g}) = 0.27$, with standard error of 0.07, for all disease stages and both hands. The nonzero estimate of $\delta$ suggests that the total attempts are overdispersed and that the correlations among the counts across the different instants of evaluation are nonnull.

Table 5 Parameter estimates and standard errors for the final gamma-Poisson model

| Parameter | Related to | Estimate | Standard error |
|---|---|---|---|
| $\beta_{\lambda 0}$ | Normal group, preferred hand, initial evaluation and active sequence | 1.68 | 0.03 |
| $\beta_{\lambda 1}$ | Effect of initial stage | −0.38 | 0.05 |
| $\beta_{\lambda 2}$ | Effect of advanced stage | −0.71 | 0.05 |
| $\beta_{\lambda F}$ | Effect of final evaluation | 0.52 | 0.04 |
| $\beta_{\lambda(F*C)}$ | Effect of final evaluation and control sequence | −0.22 | 0.05 |
| $\beta_{\alpha 0}$ | Normal group, preferred hand | 1.30 | 0.05 |
| $\beta_{\delta 0}$ | Normal group, preferred hand | −1.32 | 0.25 |

Table 6 Estimates of expected values of $\lambda(z_{\lambda gh})$

| Disease stage | Evaluation session | Intervention hand | Sequence | Expected value | Standard error |
|---|---|---|---|---|---|
| Normal | Baseline | Either | Either | 5.4 | 0.2 |
| Normal | Final | Either | Control | 7.2 | 0.3 |
| Normal | Final | Either | Active | 9.0 | 0.4 |
| Initial | Baseline | Either | Either | 3.7 | 0.2 |
| Initial | Final | Either | Control | 5.0 | 0.4 |
| Initial | Final | Either | Active | 6.2 | 0.3 |
| Advanced | Final | Either | Control | 3.6 | 0.2 |
| Advanced | Final | Either | Active | 4.4 | 0.3 |
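The entries of Table 6, as well as the values of $\alpha$ and $\delta$ quoted above, can be recovered from Table 5 by exponentiating the linear predictors in (2.18)–(2.20). The following sketch shows the calculation; the dictionary layout and the function name `lambda_hat` are illustrative assumptions, and only point estimates are reproduced.

```python
import math

# Estimates copied from Table 5 (log scale)
BETA_LAMBDA = {"0": 1.68, "1": -0.38, "2": -0.71, "F": 0.52, "F*C": -0.22}
BETA_ALPHA_0 = 1.30
BETA_DELTA_0 = -1.32


def lambda_hat(stage, final, control):
    """Expected value of lambda via the log link (2.18)."""
    eta = BETA_LAMBDA["0"]
    eta += BETA_LAMBDA["1"] * (stage == 1) + BETA_LAMBDA["2"] * (stage == 2)
    eta += BETA_LAMBDA["F"] * final
    eta += BETA_LAMBDA["F*C"] * (final and control)
    return math.exp(eta)


alpha_hat = math.exp(BETA_ALPHA_0)   # (2.19) with intercept only
delta_hat = math.exp(BETA_DELTA_0)   # (2.20) with intercept only

# Normal group at the final session, active sequence (Table 6 reports 9.0)
lam_final_active = lambda_hat(stage=0, final=True, control=False)
```

Multiplying `lambda_hat(...)` by `alpha_hat` gives the expected number of attempts in (2.7); small differences from published values can arise because the inputs here are rounded to two decimals.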

We may conclude that individuals in the initial stage of the disease have a smaller expected number of attempts than normal ones, and for individuals in the advanced stage this value is even smaller ($\beta_{\lambda 2} < \beta_{\lambda 1} < 0$ and $\beta_{\alpha 1} = \beta_{\alpha 2} = 0$). There is no evidence of difference between the expected numbers of attempts for participants using preferred or nonpreferred hands ($\beta_{\lambda N} = 0$ and $\beta_{\alpha N} = 0$), nor between active and control sequences in the baseline session ($\beta_{\lambda C} = 0$). The results suggest that the training is also effective with respect to agility, since the expected number of attempts at the final evaluation is larger than at the initial one ($\beta_{\lambda F} > 0$). For the control sequence, the expected number of attempts is also larger at the final evaluation compared with the initial one ($\beta_{\lambda F} + \beta_{\lambda(F*C)} > 0$); however, considering only the final evaluation, the expected number of attempts is larger for the active sequences than for the control ones ($\beta_{\lambda(F*C)} < 0$).

Table 7 Estimates and standard errors (within parentheses) for the expected number of successful and total attempts under the final beta-binomial/gamma-Poisson model

| Disease stage | Evaluation session | Intervention hand | Sequence | Successful attempts | Total attempts |
|---|---|---|---|---|---|
| Normal | Baseline | Either | Either | 17.2 (1.0) | 19.8 (1.1) |
| Normal | Final | Either | Control | 21.4 (0.8) | 26.4 (0.1) |
| Normal | Final | Either | Active | 31.7 (1.8) | 33.0 (1.8) |
| Initial | Baseline | Either | Either | 11.8 (0.7) | 13.6 (0.8) |
| Initial | Final | Either | Control | 14.9 (1.1) | 18.4 (1.2) |
| Initial | Final | Either | Active | 21.9 (1.4) | 22.8 (1.4) |
| Advanced | Baseline | Either | Either | 5.2 (0.6) | 8.4 (0.6) |
| Advanced | Final | Either | Control | 6.9 (0.9) | 13.2 (0.9) |
| Advanced | Final | Either | Active | 14.0 (1.2) | 16.1 (1.1) |

Table 7 contains estimates of the expected successful and total attempts along with the respective standard errors. In Table 8 we present estimates (with respective standard errors) of the elements of the covariance matrix for normal subjects using the preferred hand. Covariance patterns for the other subgroups are similar and are not included.

4 Discussion

The proposed beta-binomial/gamma-Poisson model is more general than the multivariate beta-binomial/Poisson model considered in Lora and Singer (2008) because it allows for different covariances between the numbers of attempts in different evaluation sessions and considers a possible overdispersion of the total attempts. Moreover, the gamma-Poisson component of the model is computationally much easier to use for comparisons among the numbers of attempts in different evaluation sessions.

While in the multivariate beta-binomial/Poisson model the multivariate Poisson component requires a different set of parameters for each evaluation session, in the beta-binomial/gamma-Poisson model the gamma-Poisson component includes a single set of parameters for all evaluation sessions. To compare the expected numbers of attempts under different conditions using the former, it is necessary to rewrite the model and to derive ad hoc estimating equations, while under the latter it suffices to eliminate the corresponding regression parameter and to obtain new parameter estimates using the same estimating equations. For the analyzed data, for example, the comparison between the control and active sequences during the baseline evaluation using the beta-binomial/gamma-Poisson model is done by testing whether the parameter $\beta_{\lambda C}$ is null. On the other hand, under the multivariate beta-binomial/Poisson approach, the total number of trials is modelled with a specific vector of parameters for each instant of observation; for the data in the example, they are: baseline evaluation performing the active sequence, baseline evaluation performing the control sequence, final evaluation performing the active sequence and final evaluation performing the control sequence. To compare the control and active sequences during the baseline session we should rewrite the model using only three parameters: baseline evaluation (the same for active and control sequences), final evaluation performing the active sequence and final evaluation performing the control sequence.

Table 8 Estimates and standard errors (within parentheses) for the expected covariance matrix for normal subjects using the preferred hand

| Session, sequence, response | Baseline, active, Suc. | Baseline, active, Att. | Baseline, control, Suc. | Baseline, control, Att. | Final, active, Suc. | Final, active, Att. | Final, control, Suc. | Final, control, Att. |
|---|---|---|---|---|---|---|---|---|
| Baseline, active, Suc. | 51.2 (7.2) | | | | | | | |
| Baseline, active, Att. | 42.2 (11.4) | 48.5 (13.0) | | | | | | |
| Baseline, control, Suc. | 21.5 (5.8) | 0 | 51.2 (7.2) | | | | | |
| Baseline, control, Att. | 0 | 28.7 (7.8) | 42.4 (11.4) | 48.5 (13.0) | | | | |
| Final, active, Suc. | 40.3 (10.8) | 0 | 40.3 (10.8) | 0 | 118.9 (22.2) | | | |
| Final, active, Att. | 0 | 48.4 (13.0) | 0 | 48.4 (13.0) | 110.5 (30.0) | 115.1 (31.2) | | |
| Final, control, Suc. | 27.3 (7.3) | 0 | 27.3 (7.3) | 0 | 52.3 (13.8) | 0 | 87.4 (12.9) | |
| Final, control, Att. | 0 | 39.0 (10.5) | 0 | 39.0 (10.5) | 0 | 65.8 (17.7) | 64.9 (17.8) | 80.1 (21.8) |

Codes: Suc. = Successes, Att. = Attempts and seq. = sequence.

The average of the absolute differences between the sample means of the number of successful and total attempts and the respective expected values under this final model (Table 7) is 1.7. The same average based on the multivariate beta-binomial/Poisson model is 0.9. Furthermore, the average of the absolute differences between the observed and estimated covariances using the multivariate binomial/Poisson model is 21.5 while it is 19.1 if we use the beta-binomial/gamma-Poisson. These differences are attributable to the more flexible covariance structure induced by the latter, that is, allowing for different covariances between the repeated number of trials.

The values of the AIC (=1888.0) and the BIC (=1919.1) for the beta-binomial/gamma-Poisson model compared to the corresponding values (AIC= 1935.6 and BIC= 1974.0) for the multivariate beta-binomial/Poisson also suggest a better fit of the former.
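The reported criteria can be cross-checked with the usual definitions $\mathrm{AIC} = -2\log L + 2k$ and $\mathrm{BIC} = -2\log L + k\log n$. The sketch below rests on two assumptions that are not stated explicitly in the text: that the beta-binomial/gamma-Poisson model has $k = 17$ parameters (the 10 in Table 3 plus the 7 in Table 5) and that the sample size entering the BIC is the number of subjects, $n = 46$ (25 patients and 21 normal subjects). Under these assumptions the two reported values are mutually consistent.

```python
import math


def aic(minus2loglik, k):
    """Akaike information criterion."""
    return minus2loglik + 2 * k


def bic(minus2loglik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return minus2loglik + k * math.log(n)


k = 17          # assumption: 10 beta-binomial + 7 gamma-Poisson parameters
n = 46          # assumption: number of subjects (25 patients + 21 normal)
reported_aic = 1888.0

# Back out -2 log L from the reported AIC, then recompute the BIC
minus2loglik = reported_aic - 2 * k
bic_value = bic(minus2loglik, k, n)
```

The recomputed BIC agrees with the reported 1919.1 to the published precision, which lends some support to the assumed values of $k$ and $n$ without confirming them.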

Although the results are quite similar, with the exception of the values for patients in the advanced stage of the disease, the beta-binomial/gamma-Poisson model is preferable to the multivariate beta-binomial/Poisson model, both because of the modelling flexibility and the computational advantages mentioned before.

As an extension for the beta-binomial/gamma-Poisson model, we could incorporate a parameter to relate the probabilities of success to the total attempts, as in Zhu et al. (2003). Another possible extension would be to consider the case where attempts could be done correctly, satisfactorily or incorrectly; in this case, we could generalize the model by considering Dirichlet-multinomial/gamma-Poisson distribution models. These extensions are currently under investigation.

Appendix: First and second derivatives for the gamma-Poisson model
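The first derivatives given in this appendix can be checked numerically. The following sketch does so for a single unit with intercept-only log links, so that the design matrices reduce to identities. The log-likelihood kernel used here is an assumption: it was reconstructed by integrating the first-derivative expressions of this appendix and is not copied from (2.24). All function names are hypothetical.

```python
import numpy as np


def unpack(theta):
    """Split theta into log(lambda_h), log(alpha), log(delta)."""
    return theta[:-2], theta[-2], theta[-1]


def loglik_kernel(theta, n):
    """Assumed kernel for one unit, up to an additive constant:
    sum_h n_h log(lam_h) + sum_{u=0}^{N-1} log(alpha + u*delta)
    - (alpha/delta + N) * log(1 + delta * sum_h lam_h), with N = sum_h n_h.
    """
    log_lam, log_alpha, log_delta = unpack(theta)
    lam, alpha, delta = np.exp(log_lam), np.exp(log_alpha), np.exp(log_delta)
    N = int(n.sum())
    u = np.arange(N)
    b = 1.0 + delta * lam.sum()
    return n @ log_lam + np.log(alpha + u * delta).sum() - (alpha / delta + N) * np.log(b)


def analytic_score(theta, n):
    """First derivatives following the appendix expressions (one unit)."""
    log_lam, log_alpha, log_delta = unpack(theta)
    lam, alpha, delta = np.exp(log_lam), np.exp(log_alpha), np.exp(log_delta)
    N = int(n.sum())
    u = np.arange(N)
    a = delta * N + alpha            # a_g
    b = delta * lam.sum() + 1.0      # b_g
    c = (1.0 / (alpha + u * delta)).sum()  # c_g
    e = (u / (alpha + u * delta)).sum()    # e_g
    d_lam = n - lam * a / b                                  # L[L^{-1} n - B^{-1} a]
    d_alpha = alpha * (c - np.log(b) / delta)                # M[c - D^{-1} log(b)]
    d_delta = delta * e + alpha * np.log(b) / delta - lam.sum() * a / b
    return np.concatenate([d_lam, [d_alpha, d_delta]])


def numerical_score(theta, n, h=1e-5):
    """Central finite differences of the assumed kernel."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        grad[k] = (loglik_kernel(theta + step, n) - loglik_kernel(theta - step, n)) / (2 * h)
    return grad


# Hypothetical counts and parameter values for one unit with p = 3 occasions.
n = np.array([3, 5, 2])
theta = np.log(np.array([2.0, 3.5, 1.0, 1.5, 0.4]))  # lam_1..lam_3, alpha, delta

analytic = analytic_score(theta, n)
numeric = numerical_score(theta, n)
max_abs_gap = float(np.max(np.abs(analytic - numeric)))
```

If the assumed kernel and the log links are appropriate, `max_abs_gap` should be close to zero; the same finite-difference device can be applied to the score to check the second derivatives.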

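$$
\begin{aligned}
\frac{\partial L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\lambda}
 &= \mathbf{Z}_\lambda^\top\mathbf{L}\bigl[\mathbf{L}^{-1}\mathbf{n}-(\mathbf{I}_p\otimes\mathbf{B}^{-1})(\mathbf{1}_p\otimes\mathbf{a})\bigr],\\
\frac{\partial L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\alpha}
 &= \mathbf{Z}_\alpha^\top\mathbf{M}\bigl[\mathbf{c}-\mathbf{D}^{-1}\log(\mathbf{b})\bigr]
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{\partial L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\delta}
 &= \mathbf{Z}_\delta^\top\bigl[\mathbf{D}\mathbf{e}+\mathbf{D}^{-1}\mathbf{M}\log(\mathbf{b})-\mathbf{B}^{-1}\mathbf{L}_s\mathbf{a}\bigr],\\
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\lambda\,\partial\boldsymbol\beta_\lambda^\top}
 &= \mathbf{Z}_\lambda^\top\mathbf{L}\bigl[\mathbf{I}_p\otimes(\mathbf{A}\mathbf{B}^{-1})\bigr]\bigl\{\mathbf{L}\bigl[\mathbf{I}_p\otimes(\mathbf{D}\mathbf{B}^{-1})\bigr]-\mathbf{I}_{Mp}\bigr\}\mathbf{Z}_\lambda,\\
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\lambda\,\partial\boldsymbol\beta_\alpha^\top}
 &= -\mathbf{Z}_\lambda^\top\mathbf{L}\bigl[\mathbf{I}_p\otimes(\mathbf{M}\mathbf{B}^{-1})\bigr](\mathbf{1}_p\otimes\mathbf{Z}_\alpha),\\
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\lambda\,\partial\boldsymbol\beta_\delta^\top}
 &= -\mathbf{Z}_\lambda^\top\mathbf{L}\bigl[\mathbf{I}_p\otimes(\mathbf{D}\mathbf{B}^{-2})\bigr]\bigl[\mathbf{I}_p\otimes(\mathbf{N}_s-\mathbf{M}\mathbf{L}_s)\bigr](\mathbf{1}_p\otimes\mathbf{Z}_\delta),\\
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\alpha\,\partial\boldsymbol\beta_\alpha^\top}
 &= \mathbf{Z}_\alpha^\top\mathbf{M}\bigl[\mathbf{C}-\mathbf{D}^{-1}\log(\mathbf{B})-\mathbf{M}\mathbf{F}\bigr]\mathbf{Z}_\alpha,\\
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\alpha\,\partial\boldsymbol\beta_\delta^\top}
 &= \mathbf{Z}_\alpha^\top\mathbf{M}\mathbf{D}\bigl[-\mathbf{J}-\mathbf{D}^{-1}\mathbf{B}^{-1}\mathbf{L}_s+\mathbf{D}^{-2}\log(\mathbf{B})\bigr]\mathbf{Z}_\delta
\end{aligned}
$$

and

$$
\frac{\partial^2 L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)}{\partial\boldsymbol\beta_\delta\,\partial\boldsymbol\beta_\delta^\top}
 = \mathbf{Z}_\delta^\top\mathbf{D}\bigl\{\mathbf{E}-\mathbf{D}\mathbf{Q}+\mathbf{M}\mathbf{D}^{-1}\mathbf{B}^{-1}\mathbf{L}_s-\mathbf{D}^{-2}\mathbf{M}\log(\mathbf{B})-\mathbf{B}^{-2}\mathbf{L}_s[\mathbf{N}_s-\mathbf{M}\mathbf{L}_s]\bigr\}\mathbf{Z}_\delta
$$

with $L(\boldsymbol\beta_\lambda,\boldsymbol\beta_\alpha,\boldsymbol\beta_\delta)$ presented in (2.24) and

$$
\begin{aligned}
\mathbf{a} &= (a_1,\ldots,a_g,\ldots,a_M)^\top, &
a_g &= \delta(z_{\delta g})\sum_{h=1}^{p} n_{gh}+\alpha(z_{\alpha g}), &
\mathbf{A} &= \operatorname{diag}\{a_g\},\\
\mathbf{B} &= \operatorname{diag}\{b_g\}, &
b_g &= \delta(z_{\delta g})\sum_{h=1}^{p}\lambda(z_{\lambda gh})+1, &&\\
\log(\mathbf{b}) &= (\log(b_1),\ldots,\log(b_g),\ldots,\log(b_M))^\top, &
\log(\mathbf{B}) &= \operatorname{diag}\{\log(b_g)\}, &&\\
\mathbf{c} &= (c_1,\ldots,c_g,\ldots,c_M)^\top, &
c_g &= \sum_{u=0}^{\sum_{h=1}^{p} n_{gh}-1}\frac{1}{\alpha(z_{\alpha g})+u\,\delta(z_{\delta g})}, &
\mathbf{C} &= \operatorname{diag}\{c_g\},\\
\mathbf{e} &= (e_1,\ldots,e_g,\ldots,e_M)^\top, &
e_g &= \sum_{u=0}^{\sum_{h=1}^{p} n_{gh}-1}\frac{u}{\alpha(z_{\alpha g})+u\,\delta(z_{\delta g})}, &
\mathbf{E} &= \operatorname{diag}\{e_g\},\\
\mathbf{F} &= \operatorname{diag}\{f_g\}, &
f_g &= \sum_{u=0}^{\sum_{h=1}^{p} n_{gh}-1}\frac{1}{[\alpha(z_{\alpha g})+u\,\delta(z_{\delta g})]^2}, &&\\
\mathbf{J} &= \operatorname{diag}\{j_g\}, &
j_g &= \sum_{u=0}^{\sum_{h=1}^{p} n_{gh}-1}\frac{u}{[\alpha(z_{\alpha g})+u\,\delta(z_{\delta g})]^2}, &&\\
\mathbf{Q} &= \operatorname{diag}\{q_g\}, &
q_g &= \sum_{u=0}^{\sum_{h=1}^{p} n_{gh}-1}\left[\frac{u}{\alpha(z_{\alpha g})+u\,\delta(z_{\delta g})}\right]^2, &&\\
\mathbf{n} &= (n_{11},\ldots,n_{gh},\ldots,n_{Mp})^\top, &
\mathbf{N}_s &= \operatorname{diag}\Bigl\{\sum_{h=1}^{p} n_{gh}\Bigr\}, &&\\
\mathbf{L} &= \operatorname{diag}\{\lambda(z_{\lambda gh})\}, &
\mathbf{L}_s &= \operatorname{diag}\Bigl\{\sum_{h=1}^{p}\lambda(z_{\lambda gh})\Bigr\}, &&\\
\mathbf{M} &= \operatorname{diag}\{\alpha(z_{\alpha g})\}, &
\mathbf{D} &= \operatorname{diag}\{\delta(z_{\delta g})\}, &&\\
\mathbf{Z}_\lambda &= (z_{\lambda 11},\ldots,z_{\lambda gh},\ldots,z_{\lambda Mp})^\top, &
\mathbf{Z}_\alpha &= (z_{\alpha 1},\ldots,z_{\alpha g},\ldots,z_{\alpha M})^\top, &
\mathbf{Z}_\delta &= (z_{\delta 1},\ldots,z_{\delta g},\ldots,z_{\delta M})^\top.
\end{aligned}
$$

Acknowledgments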

We are grateful to Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil, for partial financial support. We are also grateful to Maria Elisa Pimentel Piemonte for providing the data.

References

Brooks, S. P., Morgan, B. J. T., Ridout, M. S. and Pack, S. E. (1997). Finite mixture models for proportions. Biometrics 53, 1097–1115.

Comulada, W. S. and Weiss, R. E. (2007). On models for binomial data with random number of trials. Biometrics 63, 610–617. MR2370821

Cox, D. R. (1983). Some remarks on overdispersion. Biometrika 70, 269–274. MR0742997

Gange, S. J., Munoz, A., Saez, M. and Alonso, J. (1996). Use of the beta-binomial distribution to model the effect of policy changes on appropriateness of hospital stays. Applied Statistics 45, 371–382.

Griffiths, D. A. (1973). Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29, 637–648.

Ho, L. L. and Singer, J. M. (1997). Regression models for bivariate counts. Brazilian Journal of Probability and Statistics 11, 175–197. MR1619725

Ho, L. L. and Singer, J. M. (2001). Generalized least squares methods for bivariate Poisson regression. Communications in Statistics 30, 263–278. MR1862701

Johnson, N. L. and Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions 2. Boston: Houghton Mifflin. MR0270476

Karlis, D. and Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. The Statistician 52, 381–393. MR2011183

Lora, M. I. and Singer, J. M. (2008). Beta-binomial/Poisson models for repeated bivariate counts. Statistics in Medicine 27, 3366–3381. MR2523922

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics. Singapore: McGraw-Hill.

Nelder, J. A. and McCullagh, P. (1989). Generalized Linear Models. London: Chapman and Hall. MR0727836

Nelson, J. F. (1985). Multivariate gamma-Poisson models. Journal of the American Statistical Association 392, 828–834.

Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949–952.

Zhu, J., Eickhoff, J. C. and Kaiser, M. S. (2003). Modeling the dependence between number of trials and success probability in beta-binomial–Poisson mixture distributions. Biometrics 59, 955–961. MR2025119

Escola de Economia de São Paulo
Fundação Getulio Vargas
São Paulo
Brazil
E-mail: [email protected]

Departamento de Estatística
Universidade de São Paulo
Brazil