
Modelling longitudinal binary data

M Salomé Cabral 1, M Helena Gonçalves 2

1 CEAUL, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, Portugal

2 CEAUL and Departamento de Matemática, Faculdade de Ciências e Tecnologia, Universidade do Algarve, Portugal

Longitudinal binary data studies are a powerful design and have become increasingly popular in a wide range of applications across all disciplines. Two features of these studies are the presence of missing data, since it is difficult to obtain complete records for all individuals, and the correlation structure among the repeated measures in each response vector. The methodology implemented in the R package bild will be discussed, and two real data sets will be analysed to illustrate how this methodology handles those features and how the analysis is carried out.

Keywords: Markov chain, odds-ratio, missing data, marginal models, random effects models

The analysis of longitudinal binary data poses two main difficulties. First, the repeated measures in each response vector are likely to be correlated, and the autocorrelation structure of the repeated data plays a significant role in the estimation of the regression parameters. Second, although most longitudinal studies are designed to collect data on every subject in the sample at each follow-up time, many studies have missing data, either intermittent or due to dropout, since for a wide variety of reasons it is difficult to obtain complete records for all subjects.

Generalized linear models have been extended to handle longitudinal binary observations in a number of different ways. Two of them will be considered: marginal models and random effects models. The basic premise of marginal models is to make inferences about population means. In contrast, random effects models assume a natural heterogeneity across individuals and are used when the goal is to make inferences about individuals. The interpretation of the regression parameters is not the same in the two classes of models: the regression parameters in generalized linear mixed models have a subject-specific, rather than population-averaged, interpretation. The choice between marginal and random effects models for longitudinal data can only be made on subject-matter grounds.

When longitudinal binary data are incomplete there are important implications for their analysis, and one of the main concerns is to distinguish between the different reasons for missingness. The missing data mechanism has been classified by [5] as missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). Several methods have been proposed for analysing incomplete longitudinal binary responses.

12 April, 12:00 - 13:00, ESTGV Auditorium

XXVI Meeting of the Portuguese Association of Classification and Data Analysis, Viseu, 11-13 April 2019
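The two model classes and the missing-data taxonomy described above can be written compactly. The notation below is an assumption introduced for illustration and does not come from the abstract: $Y_{it}$ is the binary response of subject $i$ at occasion $t$, $x_{it}$ a covariate vector, $b_i$ a subject-level random intercept, and $R$ the vector of missingness indicators. The coefficients are written $\beta$ and $\beta^{*}$ to stress that they do not share the same interpretation.

```latex
\begin{align*}
% Marginal (population-averaged) logistic model
\operatorname{logit} \Pr(Y_{it} = 1) &= x_{it}^{\top} \beta \\[4pt]
% Random intercept (subject-specific) logistic model
\operatorname{logit} \Pr(Y_{it} = 1 \mid b_i) &= x_{it}^{\top} \beta^{*} + b_i,
  \qquad b_i \sim N(0, \sigma^{2}) \\[4pt]
% Missing-data mechanisms
\text{MCAR:}\quad \Pr(R \mid Y^{\mathrm{obs}}, Y^{\mathrm{mis}}) &= \Pr(R) \\
\text{MAR:}\quad \Pr(R \mid Y^{\mathrm{obs}}, Y^{\mathrm{mis}}) &= \Pr(R \mid Y^{\mathrm{obs}}) \\
\text{NMAR:}\quad \Pr(R \mid Y^{\mathrm{obs}}, Y^{\mathrm{mis}}) &\ \text{depends on } Y^{\mathrm{mis}}
\end{align*}
```

Under MAR the missingness mechanism can be ignored in a likelihood-based analysis of the observed responses, which is why the distinction matters for the approach discussed next.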

The R package bild [3, 4] implements the methodology proposed by [2]. In this methodology inference is based on the likelihood: a binary Markov chain model is used to accommodate serial dependence, and the odds ratio is used to measure dependence between successive observations on the same individual. Both marginal and random effects (random intercept) models are considered. When the random intercept model is considered, adaptive Gaussian quadrature is used to approximate the log-likelihood by numerical integration. In both cases missing values are allowed in the response, provided they are MAR.
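To make the Markov chain idea concrete, the following is a minimal Python sketch, not the bild implementation. It assumes a first-order chain in which the marginal probabilities theta_t = P(Y_t = 1) are given (for example from a logistic regression) and a single odds ratio psi links successive observations. The function names `transition_probs` and `log_likelihood` are hypothetical, and handling a missing response by summing over its two possible values is an assumption of this sketch that is appropriate when missingness is MAR.

```python
import math


def transition_probs(theta_prev, theta_curr, psi):
    """Return (p0, p1), where p_j = P(Y_t = 1 | Y_{t-1} = j).

    The pair is chosen so that the marginals are theta_prev and theta_curr
    and the odds ratio between successive observations equals psi.
    """
    if abs(psi - 1.0) < 1e-12:
        # psi = 1 means independence: the past carries no information.
        return theta_curr, theta_curr
    # Joint probability P(Y_{t-1} = 1, Y_t = 1) of a 2x2 table with fixed
    # margins and odds ratio psi: the admissible root of a quadratic.
    s = 1.0 + (psi - 1.0) * (theta_prev + theta_curr)
    disc = s * s - 4.0 * psi * (psi - 1.0) * theta_prev * theta_curr
    pi11 = (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))
    p1 = pi11 / theta_prev
    p0 = (theta_curr - pi11) / (1.0 - theta_prev)
    return p0, p1


def log_likelihood(y, theta, psi):
    """Log-likelihood of one subject's sequence y (values 0, 1 or None).

    None marks a missing response (assumed MAR); it is marginalised by
    keeping both possible states in the forward recursion.
    """
    # state[j] = P(observed responses so far, Y_t = j)
    state = [1.0 - theta[0], theta[0]]
    if y[0] is not None:
        state[1 - y[0]] = 0.0
    for t in range(1, len(y)):
        p0, p1 = transition_probs(theta[t - 1], theta[t], psi)
        prob1 = state[0] * p0 + state[1] * p1
        prob0 = state[0] * (1.0 - p0) + state[1] * (1.0 - p1)
        state = [prob0, prob1]
        if y[t] is not None:
            state[1 - y[t]] = 0.0
    return math.log(state[0] + state[1])


if __name__ == "__main__":
    # Hypothetical marginal probabilities and odds ratio, for illustration.
    theta = [0.3, 0.4, 0.5]
    print(transition_probs(0.3, 0.4, 2.0))
    print(log_likelihood([1, None, 0], theta, 2.0))
```

In a full analysis the marginal probabilities would depend on regression coefficients, and the sum of such log-likelihoods over subjects would be maximised jointly over those coefficients and psi.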

Two real data sets will be analysed to illustrate the use of the R package bild. The first is a subset of data from the Muscatine Coronary Risk Factor Study, a longitudinal study of coronary risk factors in school children from Muscatine (Iowa, USA), available in [4]. The binary response of interest is whether the child is obese (1) or not (0). Since one of the objectives of the study was to determine the effects of sex and age on the risk of obesity, a marginal model is appropriate. Many data records are incomplete, since not all children participated in all the surveys, creating a "genuine" missing data problem. The second data set is from a longitudinal clinical trial of contracepting women, available in [1]. The outcome of interest is a binary response indicating whether a woman experienced amenorrhea (1) or not (0) during each of the four periods of observation. A random effects model will be used, since the goal of the analysis is to determine subject-specific changes in the risk of amenorrhea over the course of the study, and the influence of two dosages of a contraceptive on changes in a woman's risk of amenorrhea. A feature of this clinical trial is that there was substantial dropout.

Acknowledgements

This work has been partially funded by FCT-Fundação Nacional para a Ciência e a Tecnologia, Portugal, through the project UID/MAT/00006/2019.

References

[1] G. Fitzmaurice, N. Laird, and J. Ware. Applied Longitudinal Analysis. John Wiley & Sons, New York, 2004.

[2] M.H. Gonçalves and A. Azzalini. Using Markov chains for marginal modelling of binary longitudinal data in an exact likelihood approach. Metron, LXVI:157–181, 2008.

[3] M.H. Gonçalves, M.S. Cabral, and A. Azzalini. The R package bild for the analysis of binary longitudinal data. Journal of Statistical Software, 46:1–17, 2012.

[4] M.H. Gonçalves, M.S. Cabral, and A. Azzalini. bild: A package for BInary Longitudinal Data. R Foundation for Statistical Computing, version 1.1-5, URL http://CRAN.R-project.org/package=bild, 2013.

[5] R.J.A. Little and D.B. Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, New York, 1987.


