The impact of adopting continuous integration on the delivery time of merged pull requests: an empirical study


FEDERAL UNIVERSITY OF RIO GRANDE DO NORTE
CENTER OF EXACT AND EARTH SCIENCES
DEPARTMENT OF INFORMATICS AND APPLIED MATHEMATICS
GRADUATE PROGRAM IN SYSTEMS AND COMPUTING
ACADEMIC MASTER’S DEGREE IN SYSTEMS AND COMPUTING

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

João Helis Junior de Azevedo Bernardo

Natal, Brazil
July, 2017

João Helis Junior de Azevedo Bernardo

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

A dissertation submitted to the Computer Science Graduate Program of the Center of Exact and Earth Sciences in conformity with the requirements for the Degree of Master in Systems and Computing.

PPgSC - Graduate Program in Systems and Computing
DIMAp - Department of Informatics and Applied Mathematics
UFRN - Federal University of Rio Grande do Norte

Advisor: Uirá Kulesza
Co-Advisor: Daniel Alencar da Costa

Natal, Brazil
July, 2017

Cataloging-in-Publication Data. UFRN / SISBI / Specialized Sectoral Library of the Center of Exact and Earth Sciences (CCET).

Bernardo, João Helis Junior de Azevedo. The impact of adopting continuous integration on the delivery time of merged pull requests: an empirical study / João Helis Junior de Azevedo Bernardo. – Natal, RN, 2017. 96 leaves: ill. Advisor: Prof. Dr. Uirá Kulesza. Co-advisor: Prof. Dr. Daniel Alencar da Costa. Dissertation (master's) – Federal University of Rio Grande do Norte. Center of Exact and Earth Sciences. Department of Informatics and Applied Mathematics. Graduate Program in Systems and Computing.

1. Software engineering – Dissertation. 2. Continuous integration – Dissertation. 3. Pull-based development – Dissertation. 4. Pull request – Dissertation. 5. Delivery time – Dissertation. 6. Delivery delay – Dissertation. 7. Mining software repositories – Dissertation. I. Kulesza, Uirá. II. Costa, Daniel Alencar da. III. Title.

RN/UF/BSE-CCET. CDU 004.41.


Acknowledgements

First and foremost, I would like to thank God, the Almighty, for giving me strength and support throughout this quest for knowledge, especially for showing me the way forward in the most difficult moments of my life. Without His blessings, I certainly would not have got here.

My deep gratitude goes to my parents, João Helis Bernardo and Rosilda de Azevedo Bernardo, and to my sister Juliana Raffaely de Azevedo Bernardo. Without their love, dedication and support in every part of my life, I would not be who I am. Thank you for teaching me that I can never give up on my dreams.

I would like to express my deepest gratitude and special thanks to my girlfriend Milenna Veríssimo, for her love, support and constant patience. Thank you for always encouraging me to be a better man. I love you.

I would like to express my sincere gratitude to my advisor Uirá Kulesza, who gave me the opportunity to work with him and expertly guided me on the path that I walked during my master’s degree. I would also like to thank my co-advisor and friend Daniel Alencar da Costa, for mentoring me and providing all the support that I needed to conduct the studies that we performed in this dissertation. Without his precious guidance, I would not have been able to bring this work to its current state.

I would like to extend my appreciation to my laboratory colleagues, Leo Moreira, Fabio Penha, and Eduardo Nascimento, who helped to lighten the pressures that we faced together in the final stages of our master’s degrees by providing moments of shared knowledge and fun through the so-called "coffee time".

Finally, I am very grateful to CNPq for the financial support.

Society must learn that we Indians can and should use technology and information in our everyday activities. That doesn’t make us any less Indians. Being Indian is in the blood that flows through our veins, not in clothing and utensils that we use or any external characteristic.

Abstract

Continuous Integration (CI) is a software development practice that leads developers to integrate their work more frequently. Software projects have broadly adopted CI to ship new releases more frequently and to improve code integration. The adoption of CI is usually motivated by the allure of delivering new software content more quickly and frequently. However, there is little empirical evidence to support such claims. In recent years, many software projects from social coding environments such as GitHub have adopted the CI practice using CI services that are integrated into these environments (e.g., Travis-CI). In this dissertation, we empirically investigate the impact of adopting CI on the time-to-delivery of pull requests (PRs), through the analysis of 167,037 PRs of 90 GitHub projects that are implemented in 5 different programming languages. On analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to the end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Contrary to what one might speculate, we find that PRs tend to wait longer to be delivered after the adoption of CI in the majority (53%) of the studied projects. The large increase in PR submissions after CI is a key reason why these projects deliver PRs more slowly after adopting CI: 77.8% of the projects increase the rate of PR submissions after adopting CI. To investigate the factors that are related to the time-to-delivery of merged PRs, we train linear and logistic regression models, which obtain sound median R-squared values of 0.72-0.74 and good median AUC values of 0.85-0.90. A deeper analysis of our models suggests that, before and after the adoption of CI, the intensity of code contributions to a release may increase the delivery time due to a higher integration load (in terms of integrated commits) of the development team. Finally, we are able to accurately identify merged pull requests that have a prolonged delivery time: our regression models obtain median AUC values of 0.92 to 0.97.

Keywords: Continuous Integration; Pull-based Development; Pull Request; Delivery Time; Delivery Delay; Mining Software Repositories.

Resumo

Continuous Integration (CI) is a software development practice that leads developers to integrate their source code more frequently. Software projects have widely adopted CI in order to improve code integration and to ship new releases more quickly to their users. The adoption of CI is usually motivated by the appeal of delivering new software functionalities more quickly and frequently. However, there is little empirical evidence to justify such claims. Over recent years, many software projects available in social coding environments, such as GitHub, have adopted the CI practice using services that can be easily integrated into these environments (e.g., Travis-CI). This dissertation empirically investigates the impact of adopting CI on the delivery time of pull requests (PRs), through the analysis of 167,037 PRs of 90 GitHub projects that are implemented in 5 different programming languages. On analyzing the percentage of merged PRs per project that missed at least one release before being delivered to end users, the results showed that before the adoption of CI, a median of 13.8% of merged PRs have their delivery postponed by at least one release, whereas after the adoption of CI, a median of 24% of merged PRs have their delivery postponed to future releases. Contrary to what one might speculate, we observed that PRs tend to wait longer to be delivered after the adoption of CI in the majority (53%) of the investigated projects. The large increase in PR submissions after CI is a key reason why projects take longer to deliver PRs after adopting CI. 77.8% of the projects increase the rate of PR submissions after adopting CI.

To investigate the factors related to the delivery time of merged PRs, we trained linear and logistic regression models, which obtained median R-squared values of 0.72-0.74 and good median AUC values of 0.85-0.90. Deeper analyses of our models suggest that, before and after the adoption of CI, the intensity of code contributions to a release may increase the delivery time of PRs due to a higher integration load (in terms of integrated commits) of the development team. Finally, we present heuristics capable of accurately identifying the PRs that have a prolonged delivery time. Our regression models obtained median AUC values of 0.92 to 0.97.

Keywords: Continuous Integration; Pull-based Development; Pull Request; Delivery Time; Delivery Delay; Mining Software Repositories.

List of Figures

Figure 1 – An overview of the scope of the dissertation.
Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration.
Figure 3 – An illustrative example of how we compute delivery time in terms of days.
Figure 4 – An illustrative example of how we compute delivery time in terms of releases.
Figure 5 – The basic life-cycle of a released pull request.
Figure 6 – Training Linear and Logistic Regression Models.
Figure 7 – Percentage of merged pull requests that have a long delivery time.
Figure 8 – An overview of our project selection process.
Figure 9 – Number of projects grouped by programming language.
Figure 10 – An overview of our data collection process.
Figure 11 – Distribution of pull requests per bucket before and after continuous integration.
Figure 12 – Number of days between the studied releases of the projects, before and after continuous integration.
Figure 13 – Merge timing metric. We present the distribution of the merge timing metric for merged pull requests that are prevented from integration in at least one release.
Figure 14 – The required number of days to merge and deliver pull requests (pull request lifetime).
Figure 15 – Pull request submission, merge, and delivery rates per release.
Figure 16 – Number of pull request submissions (per release) before and after the adoption of continuous integration.
Figure 17 – Distribution of the Brier Score and the Brier optimism of the models before and after CI.
Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 19 – Distributions of models’ R² and R² optimism.
Figure 20 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of releases).
Figure 21 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of releases).
Figure 22 – The relationship between the most influential variables and delivery time in terms of releases.
Figure 23 – The number of models per most influential variables (Delivery Time in terms of releases).
Figure 24 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of days).
Figure 25 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of days).
Figure 26 – The number of models per most influential variables (Delivery Time in terms of days).
Figure 27 – The relationship between the most influential variables and delivery time in terms of days.
Figure 28 – Distribution of the Brier Score and the Brier optimism of the models before and after CI.
Figure 29 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 30 – Explanatory power of variables before adopting continuous integration (Prolonged delivery time analysis).
Figure 31 – Explanatory power of variables after adopting continuous integration (Prolonged delivery time analysis).
Figure 32 – The number of models per most influential variables (Prolonged delivery time analysis).

List of Tables

Table 1 – Long delivery time thresholds (PART I)
Table 2 – Long delivery time thresholds (PART II)
Table 3 – Summary of the number of projects and released pull requests grouped by programming language.
Table 4 – Metrics that are used in our explanatory models (Contributor, Pull Request and Project families).
Table 5 – Metrics that are used in our explanatory models (Process family).
Table 6 – Brier Score and AUC values for the models that we fitted using pull requests data of before continuous integration.
Table 7 – Brier Score and AUC values for the models that we fitted using pull requests data of after continuous integration.
Table 8 – R² and R² optimism values for the linear models that we fitted using pull requests data of before continuous integration.
Table 9 – R² and R² optimism values for the linear models that we fitted using pull requests data of after continuous integration.
Table 10 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of releases).
Table 11 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of days).
Table 12 – Brier Score and AUC values for the models that we fitted using pull requests data of before the adoption of continuous integration.
Table 13 – Brier Score and AUC values for the models that we fitted using pull requests data of after the adoption of continuous integration.
Table 14 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Prolonged delivery time analysis).

List of abbreviations and acronyms

OSS - Open Source Software
OSD - Open Source Definition
CI - Continuous Integration
ITS - Issue Tracker System
PR - Pull Request
XP - Extreme Programming
ARE - Agile Release Engineering
DVCS - Distributed Version Control Systems
DF - Degrees of Freedom

Contents

1 INTRODUCTION
  1.1 Problem Statement
  1.2 Current Research Limitations
  1.3 Dissertation Proposal
    1.3.1 Chronology of Analyses
  1.4 Dissertation Contributions
  1.5 Dissertation Organization

2 BACKGROUND & DEFINITIONS
  2.1 The pull-based development model
  2.2 Continuous Integration
  2.3 Delivery Time
  2.4 Chapter Summary

3 EMPIRICAL STUDY
  3.1 Research Questions
    3.1.1 RQ1: How often are merged pull requests prevented from being released?
    3.1.2 RQ2: Are pull requests released more quickly using continuous integration?
    3.1.3 RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?
    3.1.4 RQ4: How well can we model the delivery time of merged pull requests?
    3.1.5 RQ5: What are the most influential attributes for modeling delivery time?
    3.1.6 RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?
    3.1.7 RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?
  3.2 Studied Projects
  3.3 Data Collection
  3.4 Chapter Summary

4 STUDY RESULTS
  4.1 Analysis I: What is the impact of continuous integration on the delivery time of pull requests?
  4.2 Analysis II: What is the impact of continuous integration on the prolonged delivery time?
  4.3 Threats to the Validity

5 CONCLUSION
  5.1 Dissertation Contributions
  5.2 Related Work
  5.3 Future Work

BIBLIOGRAPHY

APPENDIX
  APPENDIX A – STUDIED PROJECTS
  APPENDIX B – R² AND R² OPTIMISM FOR THE LINEAR MODELS
  APPENDIX C – PERCENTAGE OF DELIVERED PULL REQUESTS PER PROJECT IN THE NEXT AND LATER RELEASE BUCKETS

1 Introduction

The increasing user demand for new functionalities and performance improvements rapidly changes customer requirements and turns software development into a competitive market (WNUK; GORSCHEK; ZAHDA, 2013). In this scenario, software development teams need to deliver new functionalities more quickly to their customers to improve the time-to-market (DEBBICHE; DIENÉR; SVENSSON, 2014; LAUKKANEN; PAASIVAARA; ARVONEN, 2015). This faster delivery may lead customers to become engaged in the project and to give valuable feedback. Failing to provide new functionalities and bug fixes, on the other hand, may reduce the number of users and jeopardize the project’s success.

In recent years, agile methodologies such as Scrum (SCHWABER, 1997) and Extreme Programming (XP) (BECK, 2000) brought a series of practices with the allure of providing more flexible software development and faster delivery of new software releases. The frequency of releases is one of the factors that may lead a software project to success (CHEN; REILLY; LYNN, 2005; WOHLIN; XIE; AHLGREN, 1995). The release frequency may also indicate the vitality level of a software project (CROWSTON; ANNABI; HOWISON, 2003). In order to improve the process of shipping new releases, i.e., in terms of software integration and packaging, Continuous Integration (CI) appears as an important practice that may quicken the delivery of new functionalities (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). In addition, continuous integration may reduce problems of code integration in a collaborative environment (VASILESCU et al., 2014).

The continuous integration practice has been widely adopted by the software community in open source and industrial settings (DUVALL; MATYAS; GLOVER, 2007). It is especially important for open source projects given their lack of requirement documents and geographically distributed teams (VASILESCU et al., 2014).
70% of the most popular GitHub projects use continuous integration, and the percentage of projects that use continuous integration is growing (HILTON et al., 2016).

GitHub is considered the most popular version control hosting service worldwide (GOUSIOS; SPINELLIS, 2012), with more than 14 million registered users and a wide variety of projects of different programming languages, sizes and characteristics. Any user can contribute to any public repository that is hosted on GitHub by sending a pull request (VASILESCU et al., 2015). A pull request is a change proposal that is to be applied to the project codebase. Pull requests may fix bugs or provide enhancements or new functionalities. In some cases, a pull request is linked to a change request (a.k.a. an issue report) that is registered in the Issue Tracking System (ITS). Pull requests are reviewed by core developers or project integrators and are accepted when the changes are useful and meet the project's pre-set quality standards.

The basic life-cycle of a pull request comprises four steps. First, a pull request is submitted to a software project by a contributor. Once it is submitted, the continuous integration service automatically builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. If all tests pass during the continuous integration process, the integrators thoroughly review the pull request and decide whether to merge or reject it. A merged pull request is one that is integrated into the project codebase, i.e., a solution is provided, tested, and ready to be delivered to the end users through an official software release. Finally, the merged pull request is delivered to the end users through a software release.

1.1 Problem Statement

Once a pull request is merged (i.e., ready to be delivered to the end users of a software system through an official software release), it may still be delayed before being released. In this dissertation, we use the term delivery time to refer to the delay that merged pull requests suffer prior to their delivery to end users.
This delay can be frustrating to end users because these users care most about when a new functionality is delivered, so that they can benefit from it (COSTA et al., 2016). Furthermore, a longer delay to deliver pull requests may lead software projects to lose their users, given the increasing competition between software organizations (BASKERVILLE; PRIESHEJE, 2004). This competition has forced organizations to release new functionalities at a faster pace, e.g., projects such as Firefox and Unity3D shifted from a traditional release cycle (12–18 months) to a rapid release cycle (1–3 months) to meet the pressure of the market (SOUZA; CHAVEZ; BITTENCOURT, 2014).

A long delivery time can also frustrate contributors of open source projects, since one of their motivations to contribute is to see their proposed contributions available to the end users in a timely manner (JIANG; ADAMS; GERMAN, 2013). An important reason why developers contribute to an open source project is that such developers (so-called contributors) are themselves users of the produced contributions; hence, they do not want to wait long to benefit from those contributions. Furthermore, research on the social-psychological feedback effect reveals that people get more involved in a task if they receive feedback, and that the feedback loop of attention is important for motivating contributors to persist (LIU; LI; HE, 2016). Contributors who stop receiving attention (e.g., whose pull requests are often merged and released late) tend to stop contributing (WU; WILKINSON; HUBERMAN, 2009). On the other hand, attracting and retaining the interest of talented developers is crucial for open source projects to achieve sustained success (LONG, 2006).

In this matter, the present dissertation aims to reduce the lack of empirical understanding of the impact of adopting continuous integration on the delivery time of merged pull requests. A deep understanding of such delays can help software projects to diminish them. Also, this understanding may help project managers to be aware of which factors most impact the delay to deliver merged pull requests to end users, so that they can handle it properly.

1.2 Current Research Limitations

Prior work has analyzed the usage of continuous integration in open source projects that are hosted on GitHub (HILTON et al., 2016; BELLER; GOUSIOS; ZAIDMAN, 2016; YU et al., 2016; VASILESCU et al., 2014; VASILESCU et al., 2015). For instance, Vasilescu et al. (VASILESCU et al., 2015) investigated the productivity and quality outcomes of projects that use continuous integration on GitHub. They found that projects that use continuous integration merge pull requests more quickly when they are submitted by core developers. Also, core developers discover a significantly larger number of bugs when they use continuous integration. Yu et al. (YU et al., 2016) show that the more succinct a pull request is, the greater the probability that it is reviewed and merged earlier.
Finally, Ståhl and Bosch (STÅHL; BOSCH, 2014b) stated that continuous integration may also improve the release frequency, which hints that software functionalities may be delivered more quickly to users.

Recent research has studied the delivery time of new features, enhancements, and bug fixes (COSTA et al., 2014; COSTA et al., 2016; CHOETKIERTIKUL et al., 2015; CHOETKIERTIKUL et al., 2017). For instance, Costa et al. (COSTA et al., 2014) mined data from the VCSs and ITSs of the Firefox, ArgoUML and Eclipse projects to investigate how frequently the delivery of fixed issues is delayed in such projects. In follow-up research, Costa et al. (COSTA et al., 2016) investigated the impact of switching from traditional releases to rapid releases on the delivery time of fixed issues of the Firefox project. They used predictive models to discover which factors significantly impact the delivery time of fixed issues in each release strategy. However, to the best of our knowledge, no prior work has investigated the impact of adopting continuous integration on the delivery time of merged pull requests. Hence, understanding the impact of the adoption of continuous integration on the different delivery time dimensions (see Definitions 1 and 2) that were already proposed in the literature remains an open challenge.

1.3 Dissertation Proposal

The general research question that is investigated in this dissertation is: what is the impact of the adoption of continuous integration on the delivery time of merged pull requests? This dissertation proposes to empirically analyze the impact of the adoption of continuous integration on the delivery time of merged pull requests from two perspectives. First, we investigate the impact of adopting continuous integration on the delivery time of pull requests in terms of days and releases (Definitions 1 and 2). Second, we analyze the impact of the adoption of continuous integration on the prolonged delivery time (Definition 3). Figure 1 shows an overview of the scope of the analyses that we perform in this dissertation.

Figure 1 – An overview of the scope of the dissertation.

Based on our general research question, seven research questions were proposed to guide this work. RQ1 to RQ5 investigate the impact of adopting continuous integration on the delivery time of pull requests, while RQ6 and RQ7 analyze the impact of continuous integration on the prolonged delivery time. To address the research questions of this study, we analyzed data of 90 GitHub projects that are implemented in 5 different programming languages (see Appendix A). We investigate a total of 167,037 pull requests, with 40,321 pull requests before and 126,716 pull requests after the adoption of continuous integration. For each group of analysis, we present the respective research questions below. Furthermore, for each RQ we provide a detailed description of its motivation and research approach in Section 3.1.

Analysis I: What is the impact of continuous integration on the delivery time of pull requests?

RQ1: How often are merged pull requests prevented from being released?
RQ2: Are pull requests released more quickly using continuous integration?
RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of merged pull requests?
RQ4: How well can we model the delivery time of merged pull requests?
RQ5: What are the most influential attributes for modeling delivery time?

Analysis II: What is the impact of continuous integration on the prolonged delivery time?

RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?
RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?

1.3.1 Chronology of Analyses

The arrow in Figure 1 shows which analysis inspired the other. We based our first analysis on the study performed by Costa et al. (COSTA et al., 2017), which shows that despite issues being addressed well before an upcoming release, 34% to 98% of such addressed issues are delayed by at least one release in the ArgoUML, Eclipse and Firefox projects. Based on their results, our first analysis studies the impact of adopting continuous integration on the delivery time of merged pull requests of open source projects. We find that despite most pull requests being merged well before the release date, 13.8% (median) of them miss at least one release before continuous integration, while 24% miss at least one release after continuous integration (see RQ1). Also, we observe that the time from submission to release of a pull request (i.e., pull request lifetime) is shorter before the adoption of continuous integration in 53% of the studied projects (see RQ2).
After conducting Analysis I, we performed an exploratory analysis of our data and observed that a median of 24% of the pull requests of the investigated projects have a prolonged delivery time. This result motivates Analysis II of this dissertation, which investigates the impact of adopting continuous integration on the prolonged delivery time. Such an investigation helps us to better understand which factors are most influential for predicting pull requests that are going to have a prolonged delivery time; hence, it may help contributors and project managers to avoid such undesired delays.

1.4 Dissertation Contributions

The main contribution of this dissertation is to provide an empirical understanding of the impact of the adoption of continuous integration on the time-to-delivery of merged pull requests. Through an analysis of 90 GitHub projects and 167,037 pull requests, we outline the contributions of this dissertation below. We group the contributions by their respective dimension of analysis.

Analysis I: What is the impact of continuous integration on the delivery time of pull requests?

• On analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to the end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Furthermore, we find that many pull requests that miss at least one release were merged well before the release date of the missed releases (RQ1).
• We find that the time from submission to release of a pull request (i.e., pull request lifetime) is shorter before the adoption of continuous integration in most of the studied projects (53%) (RQ2).
• In the majority of the studied projects (68.9%), the merge time of pull requests increases after adopting continuous integration (RQ2).
• It is not clear whether the adoption of continuous integration increases or decreases the delivery time of merged pull requests (RQ2).
• We find that the large increase in the number of pull request submissions after adopting continuous integration is a key reason why projects deliver pull requests more slowly after adopting continuous integration. In 77.8% of the projects, the rate of pull request submissions increases after adopting continuous integration (RQ3).

• We are able to create heuristics that obtain sound results when estimating the delivery time of merged pull requests in terms of number of days and releases, both before and after continuous integration. Our explanatory models achieve sound median R² values of 0.72 to 0.74 (RQ4).
• The number of commits performed to produce a release is the most influential factor for estimating the delivery time of merged pull requests in terms of days and in terms of releases, both before and after continuous integration (RQ5).
• The time at which a pull request is merged (i.e., queue rank) and the number of pull requests competing to be merged (i.e., merge workload) also have a strong impact on estimating the delivery time in terms of days and releases, both before and after continuous integration (RQ5).

Analysis II: What is the impact of continuous integration on the prolonged delivery time?
• A median of 24% of the merged pull requests of the investigated projects have a prolonged delivery time (RQ6).
• Our models that identify merged pull requests with a prolonged delivery time obtain excellent median AUC values of 0.92 to 0.97 (RQ6).
• Prolonged delivery time is most closely associated with the number of commits required to produce a release, and with project characteristics such as the queue rank and merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a prolonged delivery time, both before and after continuous integration (RQ7).

1.5. Dissertation Organization

The remainder of this dissertation is organized as follows. In Chapter 2, we present the necessary background and definitions. In Chapter 3, we explain the design of our empirical study. In Section 3.1, we present each RQ and its respective motivation and research approach, while we present the project selection and data collection processes in Sections 3.2 and 3.3, respectively. In Chapter 4, we present the results of this study and the threats to their validity. Finally, we draw conclusions in Chapter 5.

2 Background & Definitions

In this chapter, we outline the key concepts and definitions that are necessary to understand the analyses performed in this dissertation.

2.1. The pull-based development model

Distributed Version Control Systems (DVCS), e.g., Git, have revolutionized the way people develop software. The purpose of distributed development is to enable contributors around the world to contribute to a software project that is managed by a core team (GOUSIOS; PINZGER; DEURSEN, 2014). There are two general ways in which potential contributors can submit their contributions to a software project in a distributed code-hosting environment (e.g., GitHub): (i) shared repository, and (ii) pull-based development. We explain each of these approaches below.

(i) Shared repository
The core team shares read and write access to the central repository, enabling external contributors to clone the repository, work locally, and push their code contributions back to the central repository.

(ii) Pull-based development
Pull-based development is a paradigm broadly used by contributors of open source projects to develop software in a distributed and collaborative way (VASILESCU et al., 2015). By definition, open source software is software for which interested users have access to the source code (MADEY; FREEH; TYNAN, 2002). Generally, open source can be seen as computer software that is freely available in source code form and that allows users to freely use, study, and change its source code, improving the software according to their requirements (TIWARI, 2010). Open source projects typically use code hosting providers (e.g., GitHub) to manage their code contributions.

The most popular code hosting providers, e.g., GitHub and Bitbucket, support the pull-based development model. On GitHub, almost half of all collaborative projects use pull requests in their development process (GOUSIOS et al., 2015). GitHub and Bitbucket allow any user to fork and clone any public repository and send pull requests (GOUSIOS; PINZGER; DEURSEN, 2014). A pull request is a mechanism enabled by Git that allows contributors to work locally on the forked repository and ask to have their contributions merged into the main repository. Write access to a repository is not required to submit pull requests (VASILESCU et al., 2015). Figure 2 shows an overview of the process for sending contributions to a repository using pull requests. We explain each step of the process below.

Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration. Step 4 is only performed when continuous integration is used.

• Step 1. Fork a repository: The main repository of a project is not shared with external contributors. Instead, contributors can clone the main repository by forking it, so that they can modify the code without interfering with other repositories and without needing to be a team member.
• Step 2. Work locally on the forked repository: The contributors develop new functionalities, fix bugs, or provide features and enhancements in the forked repository.

• Step 3. Submit the local changes to the main repository: When changes are ready to be submitted, contributors request a pull of such changes to the main repository by sending a pull request (YU et al., 2016). The pull request specifies the local branch that has to be merged into a given branch of the main repository.
• Step 4. Verify whether the pull request breaks the build: The continuous integration service automatically merges the pull request into a test branch. Next, the continuous integration service builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. Typically, if tests fail during the continuous integration process, the pull request is rejected and the external contributor is required to make additional changes to improve the pull request (YU et al., 2016). If all tests pass during the CI process, the integrators thoroughly review the pull request before deciding whether to accept the contributions. This decision is based on the quality, technical design, and priorities of the submitted pull requests (GOUSIOS et al., 2015).
• Step 5. Accept or reject a pull request: After the pull request is submitted, an integrator of the main repository must inspect the changes to decide whether they are satisfactory. If the changes fulfill the requirements of the project, the integrator pulls them into the specified branch of the main repository. Otherwise, the core team may ask the external contributor for additional changes to make the pull request acceptable. In pull-based development, the integrator plays a crucial role by managing contributions (GOUSIOS et al., 2015).

Projects that use a shared repository strategy can also use pull requests in a complementary way, so that the core team members push their contributions directly, while external contributors submit their contributions via pull requests.
Therefore, projects can also use pull requests to conduct code reviews and to discuss new features. In many projects, all contributions are submitted via pull requests, even when the contributions are sent by core developers. By using this approach, the projects ensure that only reviewed code gets merged (GOUSIOS et al., 2015).

2.2. Continuous Integration

Continuous Integration is a set of practices that lead developers to integrate their work more frequently, i.e., at least daily (FOWLER; FOEMMEL, 2006; MEYER, 2014). The main goal of continuous integration is to integrate early, so that developers do not have to keep their code changes localized in their workspace for long. Instead, an automatic system must verify that the changes do not break the codebase of the software project, and then these changes must be shared with the development team quickly (VIRMANI, 2015). In this context, continuous integration aims to avoid the unpredictability of the code and a large integration effort (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) by identifying software errors and defects quickly, so that the developers can correct such errors sooner (LAI; LEU, 2015).

In continuous integration, all code must be maintained in a single repository. When a contributor commits to the repository, an automated system verifies whether the change breaks the codebase (Step 4 of Figure 2) (MEYER, 2014). The entire process must be automated. Ideally, a build should compile the code and include a test suite to verify whether the codebase is broken after adding new changes. In continuous integration, the work of developers is continually compiled, built, and tested (YU et al., 2016). Continuous integration was originally proposed as one of the twelve Extreme Programming (XP) practices, but it is often used outside the context of XP (BELLER; GOUSIOS; ZAIDMAN, 2016).

Continuous integration is widely used on GitHub. According to Gousios et al. (GOUSIOS et al., 2015), 75% of GitHub projects that make heavy use of pull requests also tend to use continuous integration. Several CI services, such as Jenkins, TeamCity, Bamboo, CloudBees, and Travis-CI (MEYER, 2014), are available to development teams. Jenkins and Travis-CI are the most used by GitHub projects (VASILESCU et al., 2015). Travis-CI is a CI platform for open source and private GitHub projects. Currently, over 300k projects are using this tool (<https://travis-ci.org>).

The wide adoption of continuous integration is related to the perceived benefits of this practice. According to Fowler (FOWLER; FOEMMEL, 2006), the greatest benefit of continuous integration is reduced risk. Duvall et al. (DUVALL; MATYAS; GLOVER, 2007) also state that the adoption of continuous integration contributes to a higher confidence of the development team in their software product. Furthermore, continuous integration is often adopted by software projects with the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) and to increase the release frequency and predictability (STÅHL; BOSCH, 2014b).

2.3. Delivery Time

Delivery time refers to the time between the moment at which a pull request is merged and the moment at which it is delivered to the end users of a software system through an official software release. In this dissertation, we investigate two

dimensions of delivery time: (i) delivery time in terms of number of days; and (ii) delivery time in terms of number of releases. Additionally, we investigate characteristics of pull requests that have a (iii) prolonged delivery time.

Definition 1: Delivery time in terms of days
Figure 3 shows the basic life-cycle of a released pull request and provides an example of how we measure delivery time in terms of days. To compute delivery time in terms of days, we count the number of days between the moment at which a pull request was merged and the moment at which it was released (t2).

Figure 3 – An illustrative example of how we compute delivery time in terms of days.

Definition 2: Delivery time in terms of releases
Figure 4 provides an example of how we measure the delivery time of merged pull requests in terms of releases. To compute the delivery time in terms of releases, we count the number of releases in which a given merged pull request is prevented from being delivered. For instance, in Figure 4, PR #05 is submitted at time t1, merged at t2, and shipped at time t3. The delivery time in terms of releases for PR #05 is the number of official releases that are shipped between t2 and t3. In the given example, PR #05 was prevented from delivery in release v1.1 and was delivered in release v2.0; hence, PR #05 has a delivery time of one release.

Definition 3: Prolonged delivery time
We follow an approach similar to the one used by Costa et al. (COSTA et al., 2017) to identify pull requests that suffer from a prolonged delivery time. Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of delivery times for the pull requests $p_1, p_2, \ldots, p_n$ of a given project. We consider that $p_i$ has a long delivery time $t_i$ if $t_i > \mathrm{MAD}(T) + \mathrm{median}(T)$. The MAD refers to the Median Absolute Deviation of the distribution of delivery times of the pull requests of a given project.
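The three definitions above can be illustrated with a minimal Python sketch. It is not the tooling used in this study: the function names are hypothetical, the strict inequalities used to decide which releases fall "between" merge and delivery are an assumption, and the MAD is computed as the raw median of absolute deviations, without any consistency constant (some statistical packages apply one by default).

```python
from datetime import date
from statistics import median


def delivery_time_days(merged_on: date, released_on: date) -> int:
    """Definition 1: days between the merge and the release that ships the PR."""
    return (released_on - merged_on).days


def delivery_time_releases(merged_on: date, released_on: date,
                           release_dates: list[date]) -> int:
    """Definition 2: official releases shipped after the merge and before
    the release that finally delivers the PR (strict bounds are assumed)."""
    return sum(1 for r in release_dates if merged_on < r < released_on)


def mad(values: list[float]) -> float:
    """Raw median absolute deviation around the median (no scaling constant)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def prolonged(delivery_times: list[float]) -> list[bool]:
    """Definition 3: flag t_i when t_i > MAD(T) + median(T)."""
    threshold = mad(delivery_times) + median(delivery_times)
    return [t > threshold for t in delivery_times]


if __name__ == "__main__":
    # Hypothetical project: v1.1 ships on 2016-03-01, v2.0 on 2016-06-01.
    releases = [date(2016, 3, 1), date(2016, 6, 1)]
    merged, shipped = date(2016, 2, 20), date(2016, 6, 1)
    print(delivery_time_days(merged, shipped))
    print(delivery_time_releases(merged, shipped, releases))
    print(prolonged([1, 2, 3, 4, 100]))
```

In the hypothetical example, the pull request misses v1.1 and ships in v2.0, which mirrors the PR #05 scenario of Definition 2.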
The greater the MAD, the higher the variation of a distribution with respect to its median (HOWELL, 2014; EFRON, 1986). The MAD is commonly used as an alternative approach to detect outliers: instead of using the standard deviation around the mean, we use the absolute deviation around the median.

Figure 4 – An illustrative example of how we compute delivery time in terms of releases.

2.4. Chapter Summary

In this chapter, we provide the key concepts and terms that we use in this dissertation. We first describe the pull-based development model and how developers contribute to a software project by sending pull requests (Section 2.1). Next, we outline the key concepts of continuous integration, which is a set of practices that lead developers to integrate their work at least daily. Furthermore, we describe how continuous integration works with pull-based development (Section 2.2). Finally, we define the two different types of delivery time that we study in this dissertation (Section 2.3).

3 Empirical Study

In this chapter, we outline the motivation and research approach for each research question that is addressed in this study. Finally, we explain how we select the studied projects and construct the dataset that we use to perform the analyses that compose this dissertation.

3.1. Research Questions

In this section, we present the motivation and research approach for each RQ of this dissertation. We present each RQ grouped by its respective dimension of analysis, i.e., RQ1 to RQ5 compose Analysis I, which studies the impact of continuous integration on the delivery time of pull requests, while RQ6 and RQ7 compose Analysis II, which studies the impact of continuous integration on the prolonged delivery time.

Analysis I: What is the impact of continuous integration on the delivery time of pull requests?

3.1.1. RQ1: How often are merged pull requests prevented from being released?

RQ1: Motivation
A long delay in releasing pull requests can be frustrating to users and contributors of a software project, since they care more about the time for a pull request to become available than about the time required to merge such a pull request into the project codebase. It is therefore important to investigate whether or not pull requests are delivered immediately (e.g., in the next possible release after they have been merged). In RQ1, we study how often merged pull requests are prevented from delivery, both before and after continuous integration. The investigation of RQ1 is our first step towards understanding how long the delivery time of pull requests is in terms of releases.

RQ1: Approach
We use an approach similar to the one used by Costa et al. (COSTA et al., 2017) to investigate how often pull requests are prevented from being released. First, we compute the delivery time in terms of releases for each merged pull request of the investigated projects (see Definition 2). Next, for each of our investigated projects, we observe the percentage of pull requests that were delivered in the next upcoming release, and the percentage of pull requests that were prevented from being delivered in at least one release. Next, we group the pull requests of each project into two buckets: before and after continuous integration. For each bucket, we also observe the percentage of pull requests per project that were prevented from being delivered in one or more releases. The pull requests that do not miss any release are grouped into the next release bucket, while the pull requests that miss one or more releases are grouped into the later release bucket.

Finally, we analyze whether merged pull requests are prevented from being released because their merge occurs close to an upcoming release date, i.e., one day or one week before the release date. For this purpose, we compute the merge timing metric, which represents the moment at which a pull request is merged in the release cycle. The merge timing ranges from 0 to 1. A merge timing value close to 1 indicates that the pull request was merged early in the release cycle, while merge timing values close to 0 indicate the opposite.
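As a rough illustration, merge timing can be sketched as the number of days remaining until the upcoming release divided by the length of the release cycle. The function name is hypothetical, and measuring the release cycle from the previous release date to the upcoming one is an assumption made for this sketch.

```python
from datetime import date


def merge_timing(merged_on: date, previous_release: date,
                 next_release: date) -> float:
    """Days remaining until the upcoming release over the cycle duration.

    The cycle is assumed to span previous_release to next_release.
    """
    cycle_days = (next_release - previous_release).days
    if cycle_days <= 0:
        raise ValueError("next_release must come after previous_release")
    remaining_days = (next_release - merged_on).days
    return remaining_days / cycle_days


if __name__ == "__main__":
    # Hypothetical 30-day release cycle.
    start, end = date(2016, 1, 1), date(2016, 1, 31)
    print(merge_timing(date(2016, 1, 4), start, end))   # merged early in the cycle
    print(merge_timing(date(2016, 1, 28), start, end))  # merged close to the release
```

A pull request merged three days before the release of a 30-day cycle obtains a value near 0, whereas one merged on the fourth day of the cycle obtains a value near 1.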
We compute the merge timing metric as (i) the number of days remaining until the upcoming release after a pull request is merged, divided by (ii) the duration in days of its release cycle (see Equation 3.1).

\[
\text{merge timing} = \frac{\#\ \text{days remaining until the upcoming release}}{\text{release cycle duration (in days)}} \tag{3.1}
\]

3.1.2. RQ2: Are pull requests released more quickly using continuous integration?

RQ2: Motivation
In recent years, many software companies have adopted the continuous integration practice in their development life cycle. This wide adoption is related to the perceived benefits of continuous integration. These include risk reduction, a higher confidence of the development team in their software product (DUVALL; MATYAS; GLOVER, 2007), higher productivity, higher release frequency and predictability (STÅHL; BOSCH, 2014b), and the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). However, there is a lack of studies that empirically check whether continuous integration really reduces the time-to-delivery of merged pull requests. In RQ2, we study the delivery time of merged pull requests before and after the adoption of continuous integration.

RQ2: Approach
Figure 5 shows the basic life cycle of a released pull request: (t1) the merge phase; and (t2) the delivery phase. We refer to the time t1 + t2 as the lifetime of a pull request. In RQ2, we analyze the merge and delivery phases. The merge phase (t1) is the time required for pull requests to be merged into the codebase, whereas the delivery phase (t2) is the time required for merged pull requests, i.e., those ready to be delivered to end users, to be released.

Figure 5 – The basic life-cycle of a released pull request.

We use beanplots (KAMPSTRA et al., 2008) to visually compare the different distributions of delivery time (see Figure 14). The higher the data frequency for a given value, the wider the bean is plotted at that value on the Y axis. In addition, we use Mann-Whitney-Wilcoxon (MWW) tests (WILKS, 2011) followed by Cliff’s delta effect-size measures (CLIFF, 1993). The MWW test is a non-parametric test whose null hypothesis is that two distributions come from the same population (α = 0.05). Cliff’s delta is a non-parametric effect-size metric that quantifies the magnitude of the difference between the values of two distributions. The higher the Cliff’s delta value, the greater the difference between the distributions.
A positive Cliff’s delta indicates how much larger the values of the first distribution are, while a negative Cliff’s delta indicates the opposite. We use the thresholds provided by Romano et al. (ROMANO et al., 2006), i.e., |delta| < 0.147 (negligible), |delta| < 0.33 (small), |delta| < 0.474 (medium), and |delta| >= 0.474 (large). We use these statistical tools to analyze the entire life-cycle of a pull request before and after continuous integration. First, we analyze the pull request lifetime (t1 + t2). Then, we analyze the (t1) merge and (t2) delivery phases of a pull request separately.
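For illustration, Cliff’s delta and the magnitude thresholds above can be sketched in plain Python as follows. This is a didactic sketch rather than the implementation used in this study; the function names are hypothetical, and the sample values are invented. The MWW test itself is not reproduced here and would normally come from a statistics library.

```python
def cliffs_delta(xs: list[float], ys: list[float]) -> float:
    """(#pairs with x > y  minus  #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    smaller = sum(1 for x in xs for y in ys if x < y)
    return (greater - smaller) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Classify the absolute delta with the thresholds quoted in the text."""
    size = abs(delta)
    if size < 0.147:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Invented delivery times (in days) before and after adopting CI.
    before_ci = [10, 12, 15, 20]
    after_ci = [18, 25, 30, 40]
    delta = cliffs_delta(before_ci, after_ci)
    print(delta, magnitude(delta))
```

With these invented samples, the delta is negative because the values of the first distribution (before CI) are mostly smaller than those of the second.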

3.1.3. RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?

RQ3: Motivation
In RQ2, we find that 53% (48/90) of our studied projects deliver submitted pull requests more quickly before adopting continuous integration. However, since the adoption of continuous integration is motivated by an increase in release frequency and predictability (STÅHL; BOSCH, 2014b), we expected that pull requests would be delivered more quickly after the adoption of continuous integration. The results suggest the opposite trend, which leads us to the following question: why do 53% of our studied projects deliver submitted pull requests more quickly before adopting continuous integration? This investigation is important to better understand the impact of adopting continuous integration on software development.

RQ3: Approach
Similar to RQ2, we use Mann-Whitney-Wilcoxon tests (WILKS, 2011) and Cliff’s deltas (CLIFF, 1993) to analyze the data. We also use box plots (WILLIAMSON; PARKER; KENDRICK, 1989) to visually summarize and compare the data. In this research question, we investigate whether the increase in the delivery time of pull requests after adopting continuous integration is related to a significant increase in pull request submissions after adopting continuous integration. We group our dataset into two buckets: before and after the adoption of continuous integration. For each bucket, we count the number of pull requests that are submitted, merged, and delivered per release. We perform three comparisons in this RQ. First, we compare whether pull request submissions (per release) significantly increase after adopting continuous integration.
Next, we organize our projects into two groups: (i) the projects for which the delivery time of pull requests increased after adopting continuous integration and (ii) the projects for which the delivery time of pull requests decreased after adopting continuous integration. For each group, we compare whether the submissions of pull requests significantly increased after adopting continuous integration.
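The per-release counting step of this approach can be sketched as follows. The sketch is only indicative: the function name is hypothetical, and two details are assumptions, namely that a submission is attributed to the first release dated on or after it, and that the before/after bucket is decided by comparing the submission date with the CI adoption date.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date


def submissions_per_release(submitted: list[date], releases: list[date],
                            ci_adopted_on: date) -> dict[str, Counter]:
    """Count submitted PRs per release, split into before-CI and after-CI."""
    ordered = sorted(releases)
    buckets = {"before": Counter(), "after": Counter()}
    for day in submitted:
        idx = bisect_left(ordered, day)  # first release dated >= submission
        if idx == len(ordered):
            continue  # no release follows this submission yet
        bucket = "before" if day < ci_adopted_on else "after"
        buckets[bucket][ordered[idx]] += 1
    return buckets


if __name__ == "__main__":
    # Invented project history.
    releases = [date(2016, 3, 1), date(2016, 6, 1), date(2016, 9, 1)]
    submitted = [date(2016, 2, 10), date(2016, 2, 20), date(2016, 4, 15),
                 date(2016, 5, 1), date(2016, 5, 20), date(2016, 7, 1)]
    counts = submissions_per_release(submitted, releases, date(2016, 4, 1))
    print(sorted(counts["before"].values()), sorted(counts["after"].values()))
```

The two resulting lists of per-release counts are the kind of input that would then be compared with the MWW test and Cliff’s delta.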

3.1.4. RQ4: How well can we model the delivery time of merged pull requests?

RQ4: Motivation
Several studies have proposed approaches to investigate the time required to merge a pull request (YU et al., 2015; YU et al., 2016) and to prioritize pull requests based on their characteristics (VEEN; GOUSIOS; ZAIDMAN, 2015). These studies could help integrators to prioritize their work in the face of multiple concurrent pull requests. They could also help to estimate when a pull request will be merged by an integrator of a software project. However, even though most pull requests are merged well before the next release date, many of them are not delivered in the next release. Knowing the delivery time of merged pull requests is therefore of great interest to the users and contributors of a software project. In RQ4, we investigate whether we can accurately model the delivery time of merged pull requests in terms of number of days and releases (see Definitions 1 and 2 of delivery time). Our explanatory models are important to understand which variables may impact the delivery time of pull requests. Furthermore, the models could be used in future work and by practitioners to estimate when a merged pull request will likely be delivered (i.e., in the next release or after one or more releases).

RQ4: Approach
To study when a merged pull request is released, we use an approach based on supervised machine learning. The input to the learning algorithm is a set of attributes that describe each pull request in as much detail as possible. During the feature selection process, we collect information from the VCSs of the studied projects to include attributes that belong to one of the following families: contributor, pull request, project, and process. We choose these families of attributes because we intend to investigate a variety of perspectives that may influence the delivery time of a merged pull request. Tables 4 and 5 show the complete description of the attributes that we compute for each family, as well as the rationale for including each attribute as a predictor of delivery time.

We train explanatory models to study whether a merged pull request will be delivered in the next possible release or whether it will be prevented from delivery in one or more releases (see Definition 2). To study delivery time in terms of releases, we use Logistic Regression Models (DAYTON, 1992; HILBE, 2009). We model the response variable Y as Y = 1 for the merged pull requests that were delayed, i.e., the pull requests that missed at least one release before being released, and Y = 0 otherwise.

In this context, our models are intended to explain why a given merged pull request has its delivery delayed (i.e., Y = 1). We use the Area Under the Curve (AUC) and the Brier score to evaluate the performance of our models. The AUC is used to evaluate the degree of discrimination achieved by the models (HANLEY; MCNEIL, 1982). For instance, the AUC can be used to evaluate how well our models can distinguish between merged pull requests that are delivered in the next possible release after they have been merged and pull requests that are prevented from delivery in one or more releases. The AUC is the area below the curve that plots the true positive rate against the false positive rate. AUC values range from 0 (worst) to 1 (best). An area greater than 0.5 indicates that the explanatory model outperforms random guessing (COSTA et al., 2017). Mehdi et al. (MEHDI et al., 2011) provide a rough guide for classifying the accuracy of a diagnostic test using the AUC, i.e., .90-1: excellent; .80-.90: good; .70-.80: fair; .60-.70: poor; .50-.60: fail.

The Brier score (EFRON, 1986), in turn, is used to evaluate the accuracy of probabilistic predictions. The Brier score measures the mean squared difference between the probability of delay assigned by our models to a particular pull request P and the actual outcome of P (i.e., whether P is actually delayed or not). Hence, the lower the Brier score, the more accurate the probabilities produced by our explanatory models (COSTA et al., 2016).

We also study delivery time in terms of number of days (Definition 1). To perform this analysis, we use multiple linear regression modeling (Ordinary Least Squares). Linear regression models are simple and often provide an adequate and interpretable description of how one or more explanatory variables X affect the dependent variable Y (HASTIE; TIBSHIRANI; FRIEDMAN, 2009).
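Before turning to the regression equation, the two classification metrics introduced earlier in this subsection can be made concrete with a short sketch. It is not the evaluation code of this study. The AUC is computed through its pairwise interpretation (the probability that a randomly chosen delayed pull request receives a higher predicted probability than a randomly chosen non-delayed one, with ties counted as one half), which is equivalent to the area under the ROC curve. The function names and the sample data are hypothetical.

```python
def brier_score(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(outcomes)


def auc(probabilities: list[float], outcomes: list[int]) -> float:
    """Pairwise AUC: share of (delayed, not delayed) pairs ranked correctly."""
    delayed = [p for p, o in zip(probabilities, outcomes) if o == 1]
    on_time = [p for p, o in zip(probabilities, outcomes) if o == 0]
    if not delayed or not on_time:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in delayed for n in on_time)
    return wins / (len(delayed) * len(on_time))


if __name__ == "__main__":
    # Invented predictions: Y = 1 means the PR missed at least one release.
    predicted = [0.9, 0.4, 0.6, 0.2]
    actual = [1, 1, 0, 0]
    print(auc(predicted, actual), brier_score(predicted, actual))
```

In the invented example, three of the four (delayed, not delayed) pairs are ranked correctly, so the AUC is 0.75.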
Regression models fit a curve of the form

\[
Y = \beta_0 + \sum_{j=1}^{n} X_j \beta_j \tag{3.2}
\]

The variable Y is the dependent variable (i.e., delivery time in terms of days in our study), while X is the set of explanatory variables that may share a relationship with Y (e.g., churn and description length in our case). The set of β coefficients represents the weights given by the model to adjust the values of X in order to better estimate the dependent variable Y. Tables 4 and 5 show the set of explanatory variables that we use in our study to predict delivery time in terms of days. They also show the definition of each explanatory variable and the rationale for adopting it.

We assess the fit of our linear regression models using the R². The R² corresponds to the proportion of the variability in Y that can be explained by X. In general, it is a challenge to determine what a good R² value is, since it depends on the nature of the problem that is being investigated (JAMES et al., 2014). In this study, we consider only the models that achieve R² values higher than 0.5 in our analyses. In other words, we ensure that at least 50% of the variability of our data is explained by our models. We analyze 91 models in total: 41 using pull request data from before continuous integration, and 50 using data from after continuous integration. Appendix B provides the R² value for each model that we fit.

We follow the guidelines of Harrell Jr. (HARRELL, 2015) to build our explanatory models (Logistic and Linear Regression Models). Figure 6 provides an overview of the process that we use to build our models. First, for each studied project, we group its data into two buckets: before and after continuous integration. Then, we intend to create two Logistic Regression Models for each project, one using the pull request data from before continuous integration, and another using the pull request data from after continuous integration. We also train two Linear Regression Models for each project, one using the pull request data of the before-CI bucket, and another using the pull request data of the after-CI bucket.

Figure 6 – Training Linear and Logistic Regression Models. We follow the guidelines that are provided by Harrell Jr. (HARRELL, 2015) to train the explanatory models, which involve eight activities, from data collection to model validation. We present a description of Steps 5.2 and 5.3 in RQ3.

In steps 1 and 2, we account for collinearity in our explanatory variables. In step 1, we check the redundancy of our explanatory variables. Redundant variables do not increase the explanatory power of the models and can distort the relationship between the explanatory (X) and response (Y) variables. We use the redun function from the rms R package to remove the redundant variables from our set of explanatory variables.
The redun function fits models to explain each explanatory variable using the other explanatory variables (COSTA et al., 2017). We discard explanatory variables that can be estimated from the others with an R² ≥ 0.9. In step 2, we check the correlation of the surviving

explanatory variables. We remove the highly correlated variables by using a variable clustering analysis (SARLE, 1990). When the variables within a cluster have a correlation of |ρ| > 0.7, we choose only one of them to include in our models.

In steps 3 and 4, we compute and allocate the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting the models low. When using a Logistic Regression Model, we compute the D.F. budget that we can spend in our models as n/10, where n is the number of instances of the class with the lowest number of instances, and 10 is a denominator that is recommended by Harrell Jr. (HARRELL, 2015). Furthermore, the value of n/10 must be greater than or equal to the number of explanatory variables of our models. For example, we have two possible classes in our models (Next and Later) and 13 explanatory variables. If a given project has 1000 pull requests from before the adoption of continuous integration, of which 100 belong to the Later class and the remaining 900 belong to the Next class, then the D.F. budget restriction is not satisfied, i.e., 13 (the number of explanatory variables of the model) is greater than 100/10 = 10, where 100 is the number of instances of the class with the lowest number of instances (Later). We filtered out models with such settings.

In step 5, we fitted 54 Logistic Regression models (12 using data from before continuous integration, and 42 using data from after continuous integration). It is important to highlight that the pull request data of a given project may be used to build 0, 1, or 2 models. For instance, if the project data from before continuous integration do not satisfy the D.F. budget restriction, but the project data from after continuous integration do, then we train just one model using the data from after continuous integration, and vice versa.
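The D.F. budget restriction described above can be illustrated with a minimal Python sketch. The function names `df_budget` and `satisfies_df_budget` are hypothetical helpers introduced only for this illustration; they are not part of the rms package or of the scripts used in the study. The class counts reproduce the worked example from the text (900 Next, 100 Later, 13 explanatory variables).

```python
def df_budget(class_counts):
    """Degrees-of-freedom budget n / 10, where n is the size of the smallest class.

    The denominator 10 follows the rule of thumb cited in the text (HARRELL, 2015).
    """
    n = min(class_counts.values())
    return n / 10


def satisfies_df_budget(class_counts, num_explanatory_vars):
    """True when the budget n / 10 is at least the number of explanatory variables."""
    return df_budget(class_counts) >= num_explanatory_vars


# Worked example from the text: 1000 pull requests, 100 of them in the Later class.
before_ci_counts = {"Next": 900, "Later": 100}
print(df_budget(before_ci_counts))                 # 10.0
print(satisfies_df_budget(before_ci_counts, 13))   # False: 13 > 10, so no model is fit
```

Under this rule, the same project would need at least 130 pull requests in its smallest class before a model with 13 explanatory variables could be fit.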
Furthermore, if the project data from both before and after continuous integration do not satisfy the D.F. budget restriction, we train no model with the data of such a project. The number of models that use data from after continuous integration is greater than the number of models that use data from before continuous integration because most studied projects have fewer pull requests from before continuous integration. In addition, as 86.2% (median) of the pull requests from before continuous integration are delivered in the Next release/class, for most projects the number of instances (pull requests) of the Later release/class does not satisfy the D.F. budget restriction.

In step 5.1, we assess the stability of our Logistic Regression models by computing the optimism-reduced AUC and Brier score. The optimism of each metric is computed as follows: (i) we count the D.F. that are spent to fit the original model, then we select a bootstrap sample to fit another model with the same D.F. as the original model; (ii) the model built from the bootstrap sample is applied to both the bootstrap and the original samples (AUC and Brier score are computed for each sample). The optimism is the difference between the AUC and Brier score of the bootstrap sample and those of the original sample. In our analyses, we fit models for 1,000 bootstrap samples and compute the average optimism. The optimism-reduced AUC and Brier score are calculated by subtracting the average optimism from the initial AUC and Brier score estimates.

In step 5.1, we also evaluate the stability of our Linear Regression models by computing the optimism-reduced R². While R² gives an indication of how much variability may be explained by our Linear Regression models, this metric may also be very dependent on the specific data to which our models were fitted, i.e., the models may be overfitted (MCINTOSH et al., 2016). Therefore, the optimism-reduced R² measures how stable our models are. The optimism of the R² is computed by fitting models using bootstrap samples of the original data. For each model fit to a bootstrap sample, we calculate the difference between the R² of such a model and the R² of the model fit to the original data. This difference is a measure of the optimism in the original model (COSTA et al., 2017). In this study, the bootstrap-calculated optimism is the average optimism obtained using a set of 1,000 bootstrap samples. The smaller the bootstrap-calculated optimism, the higher the stability of our explanatory models (EFRON, 1986).

In step 5.2, we evaluate the impact that each variable of our set of explanatory variables has on the models that we fit, while in step 5.3 we study the relationship that the most influential variables share with the response variable (delivery time). We use these steps to answer RQ5 and RQ7 of this study. In these sections, we detail each of the above-mentioned steps (5.2 and 5.3).

3.1.5. RQ5: What are the most influential attributes for modeling delivery time?
RQ5: Motivation

In RQ4, we found that our models can accurately model the delivery time of pull requests, both in terms of days and in terms of releases (Definitions 1 and 2). To fit our models, we use attributes that we collect from the VCSs of the studied projects. As described in Tables 4 and 5, we collected attributes that belong to different families (contributor, pull request, project and process) and that may be related to the delivery time of merged pull requests. In RQ5, we investigate which attributes are the most influential for modeling the delivery time of merged pull requests, both before and after continuous integration.

RQ5: Approach

In RQ5, we separately investigate which variables are the most influential for modeling delivery time according to the models that we fit using pull request data from before continuous integration, and according to the models that we fit using data from after continuous integration. Next, we show the relationship that the most influential variables share with delivery time.

To identify the most influential variables for estimating the delivery time of merged pull requests, both in terms of days (Definition 1) and in terms of releases (Definition 2), we use Wald χ² maximum likelihood tests (Step 5.2 of Figure 6). The larger the χ² value for a variable, the higher the influence that such a variable has on our explanatory models. To calculate the χ² value for each explanatory variable of the models that we fitted, we use the anova function of the rms R package.

We use the following approach to calculate the percentage of the explanatory power of each variable of our models. Let V = (v_1, v_2, ..., v_k) be the set of explanatory variables of our models, and f(v_i) be the function that represents the χ² value for v_i. The explanatory power of v_i to model delivery time, denoted as P(v_i), can be computed using Equation 3.3. The explanatory power of a variable ranges from 0 to 1. The higher the explanatory power of a variable, the larger the influence of such a variable when modeling the delivery time.

\[ P(v_i) = \frac{f(v_i)}{\sum_{j=1}^{k} f(v_j)} \qquad (3.3) \]

To study the relationship that the most influential variables of our models share with the response variable (delivery time), we use the Predict function of the rms package of the R language. The Predict function plots the change in the delivery time against the change in each influential variable while holding the other variables constant at their median values.
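Equation 3.3 can be illustrated with a short Python sketch. The function name `explanatory_power` is a hypothetical helper, and the χ² values below are invented solely for the illustration (they are not results of this study). In the study itself, the χ² values come from the anova function of the rms R package.

```python
def explanatory_power(chi2_by_variable):
    """Normalize per-variable chi-square values into shares that sum to 1 (Equation 3.3)."""
    total = sum(chi2_by_variable.values())
    if total <= 0:
        raise ValueError("the sum of chi-square values must be positive")
    return {name: chi2 / total for name, chi2 in chi2_by_variable.items()}


# Invented chi-square values for three explanatory variables (illustration only).
chi2_values = {"churn": 120.0, "description_length": 60.0, "other_variable": 20.0}

power = explanatory_power(chi2_values)

# Rank variables from most to least influential.
ranking = sorted(power, key=power.get, reverse=True)
print(ranking)  # ['churn', 'description_length', 'other_variable']
```

Because each share is the variable's χ² divided by the total, the shares are directly comparable across models that have different overall χ² magnitudes.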

Analysis II – What is the impact of continuous integration on the prolonged delivery time?

3.1.6. RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?

RQ6: Motivation

A long delivery time of merged pull requests may frustrate end users and contributors of a software project. End users are not much interested in merely having a new functionality integrated into the code base of a project. Instead, they care most about when such a new functionality will be released, so that they can benefit from it. Moreover, if users are not aware of such a long delivery time, their frustration may increase considerably because they are not used to such a delivery time (COSTA et al., 2017). This investigation helps us to understand how well we can model the long delivery time of pull requests. Hence, it may also help us to mitigate the problem of a prolonged delivery time.

RQ6: Approach

We calculate the prolonged delivery time of pull requests (Definition 3) as described in Section 2.3. Table 1 and Table 2 show the medians and MADs that we use for each studied project to identify merged pull requests that have a long delivery time. For instance, in the Yelp/mrjob project, when a pull request takes more than the threshold of 83.4 days (median delivery time + MAD) to be released, we consider that such a pull request has a long delivery time. First, we calculate a long delivery time threshold for all pull requests of each studied project, including pull requests from before and after continuous integration in a single set. Next, we separately calculate a long delivery time threshold for the pull requests delivered before and after continuous integration.
We distinguish the long delivery time threshold for pull requests delivered before continuous integration from the threshold for pull requests delivered after it. If a project changes its policy of shipping releases after the adoption of continuous integration (e.g., by shortening the time to ship new releases), then a given delivery time may be considered long for pull requests submitted after continuous integration, while it may not be considered long for pull requests submitted before continuous integration. In median, delivery times longer than 91 days are considered long for pull requests delivered after continuous integration, while delivery times longer than 76 days (median) are considered long for pull requests delivered before continuous integration.
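The threshold rule (median delivery time + MAD) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the delivery times are invented, and the MAD is computed as the unscaled median absolute deviation. The exact MAD definition used in the study is the one given in Section 2.3; note that R's `mad` function multiplies by a constant of 1.4826 by default, which this sketch does not apply.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def long_delivery_threshold(delivery_days):
    """Threshold above which a delivery time is considered long: median + MAD."""
    return median(delivery_days) + mad(delivery_days)


def flag_long_deliveries(delivery_days):
    """Return 1 for pull requests with a long delivery time and 0 otherwise."""
    threshold = long_delivery_threshold(delivery_days)
    return [1 if days > threshold else 0 for days in delivery_days]


# Invented delivery times (in days) for the pull requests of one bucket of one project.
after_ci_days = [2, 5, 10, 30, 90]

print(long_delivery_threshold(after_ci_days))  # 18 (median 10 + MAD 8)
print(flag_long_deliveries(after_ci_days))     # [0, 0, 0, 1, 1]
```

Applying the functions separately to the before-CI and after-CI buckets of a project yields the two bucket-specific thresholds, while applying them to the union of both buckets yields the general threshold.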

Figure 7 shows the distribution of the percentage of pull requests per project that have a long delivery time. On investigating all pull requests (before and after continuous integration together) of each studied project, we observe that, in median, 24% of them have a long delivery time. Moreover, on investigating the pull requests of each project separated into two buckets, before and after continuous integration, we observe that, in median, 22% of such pull requests have a long delivery time both before and after continuous integration.

[Figure 7: distribution of the % of pull requests with long delivery time per project, for three groups: General (median 24%), CI (median 22%), and NO-CI (median 22%).]

Figure 7 – Percentage of merged pull requests that have a long delivery time. We present the distribution of the percentage of merged pull requests that have a long delivery time on the studied projects.

To investigate whether a given merged pull request is likely to have a long delivery time, we use explanatory models (i.e., Logistic Regression Models). As the long delivery time threshold for a pull request of a project may vary depending on whether the pull request was delivered before or after the project adopted continuous integration, we separately investigate how well we can identify whether a pull request will suffer from a long delivery time, both before and after continuous integration.

To train the Logistic Regression models, we produce a dichotomous response variable Y, where Y = 1 means that a merged pull request has a long delivery time, while Y = 0 means that the delivery time of that pull request is normal. Similar to RQ4, when using a Logistic Regression Model we must account for the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting low. By using the guideline provided by Harrell Jr. (HARRELL, 2015) to