CodeScoping: A source code based tool to software product lines scoping


(1) “CodeScoping: A Source Code Based Tool to Software Product Lines Scoping” by Thiago Fernandes Lins de Medeiros. M.Sc. Dissertation. Federal University of Pernambuco. www.cin.ufpe.br/~posgraduacao. Recife, September 2011.

(2) Federal University of Pernambuco, Center for Informatics, Graduate Program in Computer Science. Thiago Fernandes Lins de Medeiros. “CodeScoping: A Source Code Based Tool to Software Product Lines Scoping”. An M.Sc. dissertation presented to the Center for Informatics of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. Advisor: Silvio Romero de Lemos Meira. Co-Advisor: Eduardo Santana de Almeida. Recife, September 2011.

(3) Cataloging-in-publication. Librarian: Jane Souto Maior, CRB4-571. Medeiros, Thiago Fernandes Lins de. CodeScoping: A source code based tool to software product lines scoping / Thiago Fernandes Lins de Medeiros. Recife: The Author, 2011. xiv, 92 leaves: ill., fig., tab. Advisor: Silvio Romero de Lemos Meira. Dissertation (master's) - Universidade Federal de Pernambuco, CIn, Computer Science, 2011. Includes bibliography and appendix. 1. Software engineering. 2. Software reuse. 3. Software product lines. I. Meira, Silvio Romero de Lemos (advisor). II. Title. 005.1 CDD (22nd ed.). MEI2012 – 013.


(5) To my parents and my brother.

(6) Acknowledgements. First of all, I would like to thank God for giving me the strength to face this challenge in my life. This work would not have been possible without His help. Next, I would like to thank my family, especially my parents and my brother, for supporting me at every moment of this work. I know I can count on them for everything! To my beloved, Danielle, for her love, affection and understanding, and for helping me to finish this dissertation. To my childhood friends, Clodomiro, Diego and Robson, for their patience in always hearing the same answer when they invited me to do something: "I cannot go. I have to work on my dissertation." I would like to thank the RiSE Labs for providing me with the perfect environment to learn and discuss several issues that were useful for this dissertation. Thank you very much, Crescencio Lima, Iuri Santos, Ivan Machado, Ivonei Silva, Jonatas Bastos, Leandro Marques, Leandro Souza, Paulo Silveira, Raphael Oliveira, Vanilson Burégio and Wylliams Barbosa. In particular, my sincere thanks to my co-advisor, Eduardo Almeida, for his patience and help in all steps of this work, and to my advisor, Silvio Meira, for accepting me as his student. I am grateful to my friends on the Planning Poker team, who gave me a great academic and personal experience during the development of the FireScrum project. Thanks, Hernan Munoz, João Roberto, Leandro Souza, Simone Araújo, Thiago Silva, Virgínia Chalegre and Wylliams Barbosa. Finally, I would like to thank DATAPREV, especially Ana Claudia and Rômulo Paiva, for giving me the time required to carry out my Master's activities whenever necessary.

(7) Pain is temporary. Quitting lasts forever. —LANCE ARMSTRONG.

(8) Resumo. Software Product Line Engineering (SPLE) has rapidly emerged as an important software development approach in recent years. SPLE focuses on identifying and managing the commonalities and variabilities of a set of software products, so that core assets can be developed and (re)used to build different products at reduced cost. In addition, improved productivity, increased quality and reduced product delivery time are some of the benefits provided by the approach. In this context, the scoping process in software product lines is responsible for defining the long-term viability of the product line. Its main goal is to identify and delimit the products, features, sub-domains and existing assets (components, documents, etc.) of the product line where investing in reuse will bring economic benefits to the company. Typically, product line engineers define the scope using information extracted from the documentation of existing products and based on the knowledge of domain experts. This is an effort-intensive task, since much time is invested in workshops and interviews with the domain experts. Moreover, the domain experts often do not have time available to share their knowledge, and the documentation of existing products is nonexistent or outdated. Thus, in order to reduce the cost and time of the scoping process, this dissertation proposes an approach to support the scoping process based on the source code of the products that already exist in the company. In addition, the requirements, design and implementation of a tool are presented, with the goal of guiding scope analysts in identifying similarities and variations in the source code of legacy systems. Finally, this dissertation also describes an empirical study that was used to elicit requirements and an experiment that was conducted to evaluate the feasibility of the tool proposed in this work. Keywords: Software Product Lines, SPL, Scope Analysis, Tool, CodeScoping, Source Code.

(9) Abstract. Software Product Lines Engineering (SPLE) has rapidly emerged as an important software development approach during the last few years. SPLE focuses on identifying and managing the commonalities and variabilities of a set of software products so that core assets can be developed and (re)used to derive individual product variants at reduced cost. Improved productivity and quality, and reduced development costs and time to market, are some of the benefits provided by the approach. In this context, software product line scoping is the process responsible for defining the long-term viability of the product line. Its goal is to identify and delimit the products, features, sub-domains and existing assets of the product line where there are economic benefits to investing in reuse. Typically, product line engineers elicit scoping information from the available documentation of existing products and from the knowledge of domain experts. This is an effort-intensive task because much time is invested in workshops and interviews with the domain and systems experts. Moreover, the domain experts often do not have time to share their knowledge, and the documentation of existing products is nonexistent or outdated. Thus, in order to reduce the cost and time of the scoping process, this dissertation proposes an approach to support the scoping process based on the source code of the existing products. Moreover, the requirements, design and implementation of a tool are presented; the tool guides product line engineers in identifying the existing commonality and variability in the source code of legacy products. Finally, this dissertation also presents an empirical study conducted to elicit requirements and a controlled experiment conducted to evaluate the feasibility of the proposed tool. Keywords: Software Product Lines, SPL, Scoping, Tool, CodeScoping, Source Code, Scope Analysis.

(10) Table of Contents.

List of Figures
List of Tables
List of Acronyms

1. Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Overview of the Proposed Solution
    1.3.1 Context
    1.3.2 Outline of the Proposal
  1.4 Out of Scope
  1.5 Statement of the Contributions
  1.6 Organization of the Dissertation

2. Software Visualization: An Overview
  2.1 Introduction
  2.2 Motivations of Visualization
  2.3 Visualization Models
  2.4 Chapter Summary

3. An Overview on Software Product Lines
  3.1 Introduction
  3.2 SPL Essential Activities
    3.2.1 Core Asset Development
    3.2.2 Product Development
    3.2.3 Management
  3.3 Software Product Lines Adoption Models
  3.4 Software Product Lines Scoping
  3.5 Visualization in Software Product Lines
  3.6 Chapter Summary

4. A Focus Group Study to Understand Software Product Lines Scoping in Industrial Setting
  4.1 Introduction
  4.2 Research Design
    4.2.1 Define the Problem
    4.2.2 Questions Development
    4.2.3 Project Context and Participants Selection
    4.2.4 Focus Group Moderation
    4.2.5 Data Analysis Process and Method
  4.3 Results
    4.3.1 Background of the Participants
    4.3.2 Main Problems of Product Line Scoping
    4.3.3 The Need of a Tool to Support the Scoping Process
  4.4 Discussion
    4.4.1 Lessons Learned
    4.4.2 Limitations
  4.5 Chapter Summary

5. CodeScoping: A Source Code Based Tool to Software Product Lines Scoping
  5.1 Introduction
  5.2 Approach Overview
  5.3 Requirements
    5.3.1 Functional Requirements
    5.3.2 Non-Functional Requirements
  5.4 Tool Architecture
    5.4.1 Similarity Comparison Module
      XML Parser
      Clone Detector
      Similarity Calculator
    5.4.2 Visualization Module
      Data Manager
      Views
      Graphical User Interface
  5.5 Implementation
  5.6 CodeScoping in Action
  5.7 Chapter Summary

6. A Preliminary Experimental Study
  6.1 Introduction
  6.2 Experiment Process
  6.3 Definition
    6.3.1 Goal
    6.3.2 Questions
    6.3.3 Metrics
  6.4 Planning
    6.4.1 Context Selection
    6.4.2 Hypothesis Formulation
    6.4.3 Variables Selection
    6.4.4 Selection of Subjects
    6.4.5 Experiment Design
    6.4.6 Instrumentation
    6.4.7 Validity Evaluation
      Conclusion Validity
      Internal Validity
      Construct Validity
      External Validity
  6.5 Operation
  6.6 Analysis and Interpretation
    6.6.1 Quantitative Analysis
    6.6.2 Qualitative Analysis
    6.6.3 Lessons Learned
  6.7 Chapter Summary

7. Conclusion
  7.1 Research Contributions
    7.1.1 Empirical Study
    7.1.2 CodeScoping
    7.1.3 Experimental study
  7.2 Future Work
  7.3 Concluding Remarks

References

Appendices

A Focus Group Instruments
  A.1 Background Questionnaire

B Experimental Study Instruments
  B.1 Timesheet Form
  B.2 Background Questionnaire
  B.3 Feedback Questionnaire

(14) List of Figures.

1.1 RiSE Labs Influences
1.2 RiSE Labs Projects
2.1 Reference Model for Visualization
2.2 Nested Four-Level Model for Visualization
2.3 Threats and Validation in the Nested Model
3.1 The Three Essential Activities for Software Product Lines
3.2 Core Asset Development
3.3 Product Development
5.1 Source Code Based Approach to Support SPL Scoping
5.2 CodeScoping Architecture
5.3 Example of a XML File describing a product
5.4 Similarity Analysis View Design
5.5 Source Code View Design
5.6 Product Map View Design
5.7 CodeScoping in Action: Product Map View
5.8 CodeScoping in Action: Similarity Analysis View
5.9 CodeScoping in Action: Source Code View
6.1 Overview of the Experiment Process

(15) List of Tables.

4.1 Focus group questions
4.2 Focus group participants and their role(s) in the project
4.3 Focus group participants and their background
6.1 Subject’s Profile
6.2 Collected data during the experiment
6.3 Descriptive statistics of the time spent on scoping analysis
6.4 Descriptive statistics of the features configured correctly in the product map

(16) List of Acronyms.

BTT: Bug Report Tracker Tool
CBD: Component-Based Development
CCB: Change Control Board
C.E.S.A.R.: Recife Center For Advanced Studies and Systems
FR: Functional Requirement
GQM: Goal Question Metric
NFR: Non-Functional Requirement
RiSE: Reuse in Software Engineering Labs
SD: Standard Deviation
SPL: Software Product Lines
UFPE: Federal University of Pernambuco

(17) 1. Introduction.

Software Product Lines Engineering (SPLE) has rapidly emerged as an important software development approach during the last few years (Linden et al., 2007). SPLE focuses on identifying and managing the commonalities and variabilities of a set of software products so that core assets can be developed and (re)used to derive individual product variants at reduced cost. Improved productivity and quality, and reduced development costs and time to market, are some of the benefits provided by the approach. To adopt a software product line approach, an organization should first define the product line scope during the scoping process, which is responsible for defining the long-term viability of the product line. Its goal is to identify and delimit the products, features, sub-domains and existing assets of the product line where there are economic benefits to investing in reuse (John, 2010). However, in some cases, the lack of documentation or of sufficient domain expert time can hinder the scope analysis of the product line, increasing the time and costs involved. Thus, the focus of this dissertation is to provide a tool that supports the scoping process in order to address these problems.

The remainder of this chapter describes the focus of this dissertation: Section 1.1 presents its motivation and Section 1.2 a clear definition of the problem; Section 1.3 gives an overview of the proposed solution; Section 1.4 describes some related aspects that are not directly addressed by this work; Section 1.5 presents the main contributions; and, finally, Section 1.6 describes the structure of the remainder of this dissertation.

(18) 1.1. Motivation.

Reduced development costs and time to market, and improvements in quality and customer satisfaction, are some of the benefits provided by software product lines. To take advantage of these benefits, an organization needs to move from single-system development to a product line paradigm. To support organizations in this transition, Krueger (2002) proposed three adoption models (proactive, reactive and extractive), allowing an organization to select the most suitable one depending on its objectives, budget, time and requirements. Compared to the reactive and proactive approaches, the extractive model enables a quicker transition to the product line paradigm (Krueger, 2002), since software artifacts from one or more existing products are extracted and reengineered to serve as the core asset inputs for the product line. Furthermore, this model is commonly used in practice, as described in success cases from large companies such as Bosch Gasoline Systems, Nokia Mobile Phones, Nokia Networks and Philips Medical Systems (Linden et al., 2007).

In this context, when an organization uses the extractive approach to adopt an SPL, the product line scope should be defined according to the existing products and the business and market goals of the organization. Thus, the scoping activities rely heavily on analysis of the available documentation of existing products and on knowledge from domain experts: the scope analysts need to gather information about the products by consulting the domain experts, who are the only people who know in detail the products and the features that will compose the product line. However, this may be considered an effort-intensive task, since much time is invested in workshops and interviews with the domain experts (John, 2010). Moreover, the domain experts often do not have time to share their knowledge, and the documentation of existing products is nonexistent or outdated.

Thus, we believe that the source code of existing products can also be an important asset from which to extract information about variability and commonality. One advantage of source code analysis is that it can be performed independently of the domain experts, who will be needed only to validate the results of the analysis. Moreover, unlike documentation, which can be outdated, the source code represents the current state of a product. However, without proper tool support, analyzing the source code of several products and identifying commonalities and variabilities can be a more complex task than eliciting information from documentation and domain experts, since industrial systems are composed of thousands or millions of lines of source code.

(19) Due to this large amount of data (lines of source code), the use of visualization techniques is recommended, because they aid the comprehension of huge amounts of data and the perception of data properties that were not otherwise clear (Heer et al., 2010). In this sense, this dissertation proposes a tool to support the scoping process based on the source code of legacy products, in order to reduce the costs and time involved in the process. Moreover, due to the complexity inherent in source code analysis, visualization techniques were used to facilitate the work of the scope analysts.

1.2. Problem Statement.

Based on the challenges presented in the previous section, the goal of this dissertation can be stated as follows: This work investigates the problems related to the scoping process when adopting a software product line using the extractive approach, and provides the requirements, design and implementation of a tool to support the scoping process based on the source code of the existing products.

1.3. Overview of the Proposed Solution.

In order to achieve the goal of this dissertation, CodeScoping (a source code based tool to support software product line scoping) is proposed. The remainder of this section describes the context in which it was developed and outlines the proposed solution.

1.3.1 Context.

This dissertation is part of the RiSE Labs (http://www.rise.com.br/research/) (Almeida et al., 2004), formerly called the RiSE Project, whose goal is to develop a robust framework for software reuse in order to enable the adoption of a reuse program. To achieve its goal, RiSE Labs is influenced by a series of areas, such as software measurement, architecture, quality, environments and tools. These areas of influence are depicted in Figure 1.1. Based on these areas, the RiSE Labs is divided into several projects, as shown in Figure 1.2. As can be seen, it embraces several different projects related to software reuse and software engineering, such as:

(20) Figure 1.1 RiSE Labs Influences.

• RiSE Framework: Involves reuse processes (Almeida et al., 2004; Nascimento, 2008), component certification (Alvaro, 2009) and reuse adoption and adaptation processes (Garcia, 2010; Cavalcanti, 2007);
• RiSE Tools: Research focused on software reuse tools, such as the Admire Environment (Mascena, 2006), the Basic Asset Retrieval Tool (B.A.R.T) (Santos et al., 2006), which was enhanced with folksonomy mechanisms (Vanderlei et al., 2007), a semantic layer (Durao, 2008), facets (Mendes, 2008) and data mining (Martins et al., 2008), the Legacy InFormation retrieval Tool (LIFT) (Brito, 2007), the Reuse Repository System (CORE) (Buregio, 2006), the Tool for Domain Analysis (ToolDAy) (Lisboa et al., 2011) and a Software Product Lines System Test Case Tool (Neto, 2011);
• RiPLE: Stands for RiSE Product Line Engineering Process and aims at developing a methodology for Software Product Lines, composed of scoping (Moraes, 2010), requirements engineering (Neiva, 2009), design (Cavalcanti, 2010), implementation, test (Machado, 2010; Neto, 2010), evolution management (Oliveira, 2009) and product derivation (Souza, 2011);

(21) • SOPLE: Development of a methodology for Software Product Lines based on services, drawing on some ideas from RiPLE (Medeiros, 2010; Ribeiro, 2010);
• MATRIX: Investigates the area of measurement in reuse and its impact on quality and productivity;
• BTT: Research focused on tools for the detection of duplicate bug reports based on text mining (Cavalcanti, 2009; Cunha, 2009);
• Exploratory Research: Investigates new research directions in software engineering and their impact on reuse;
• CX-Ray: Focused on understanding, with empirical data, the Recife Center For Advanced Studies and Systems (C.E.S.A.R.), its processes and its practices in software development, including reuse.

This dissertation is part of the RiSE Tools project, and its goal is to provide a source code based tool to support the software product line scoping process, based on legacy products.

Figure 1.2 RiSE Labs Projects.

1.3.2 Outline of the Proposal.

This dissertation presents a tool to support the software product line scoping process based on the source code of existing products. Its goal is to reduce the dependence of the scope analysts on the domain experts, and consequently to reduce the time and costs involved in the scoping analysis.

(22) It is important to highlight that the tool does not aim to replace the common scoping approaches, but to complement them, providing an additional source of information for the scope analysts. Moreover, industrial systems are composed of thousands or millions of lines of source code. Thus, in order to develop a simple and intuitive tool that facilitates the scoping analysis, visualization techniques were used, since they aid the comprehension of huge amounts of data and the perception of data properties that were not otherwise clear.

1.4. Out of Scope.

Some aspects related to this research are left out of its scope due to the time constraints of a master's degree. Thus, the following issues are not directly addressed by this work:

• Definition of a new scoping process. Although the focus of the developed tool differs from that of other approaches, which are heavily based on workshops and interviews with domain analysts, the proposed solution does not aim to replace those approaches, but to complement them, providing an additional source of information for the scope analysts.
• Quality of the clone detection. The proposed tool uses the clone detector CCFinderX, an upgrade of CCFinder (Kamiya et al., 2002), to find duplicate or similar source code; however, analyzing the quality of the clone detection is out of the scope of this work. For this purpose, the work of Bellon et al. (2007) can be considered an important source on the comparison and evaluation of different clone detection tools.
• Automatic feature identification in the source code. Although a module for identifying features in the source code of existing products was planned, the tool developed in this work does not implement this module due to time constraints.

1.5. Statement of the Contributions.

As a result of the work presented in this dissertation, the following contributions can be highlighted:

(23) • An empirical study in an industrial setting. A focus group study was conducted to understand the problems that product line engineers face in SPL scoping in an industrial project and to elicit requirements for the proposed tool, which supports the scoping process in an extractive approach.
• A source code based solution for the scoping process. This dissertation proposes an approach to support the SPL scoping process based on the source code of existing products and using visualization techniques. Moreover, the requirements, design and implementation of the proposed tool are presented. It is important to highlight that the proposed solution includes the development of three modules but, due to time constraints, one of them was left out of scope and will be developed in future work.
• An experimental study to evaluate the proposed solution. This dissertation presents the definition, planning, operation and analysis of an experimental study, which was conducted with scope analysts to evaluate the feasibility of the proposed tool.

1.6. Organization of the Dissertation.

The remainder of this dissertation is organized as follows:

Chapter 2 presents an overview of software visualization, discussing basic concepts, the motivation for visualization, and existing visualization models proposed to guide the development of visualization systems.
Chapter 3 discusses the benefits, activities and basic concepts of software product lines, their adoption models, and the use of visualization techniques in the SPL area.
Chapter 4 describes an empirical study conducted in an industrial setting in order to understand the problems that occur in the scoping process and to elicit requirements for the development of the proposed tool.
Chapter 5 presents the proposed solution in detail, discussing the requirements, design and implementation of CodeScoping.
Chapter 6 describes in detail an experiment conducted with scope analysts to evaluate CodeScoping.
Chapter 7 concludes this dissertation, summarizes the findings of this work and discusses possible future work.

2. Software Visualization: An Overview

2.1. Introduction

The production of digital information is increasing each year. In 2011 alone, the world is expected to produce approximately 1,800 exabytes (an exabyte is a billion gigabytes, while a gigabyte is a billion bytes), according to a forecast (Gantz et al., 2008). In this context, visualization is very important, because it provides ways to explore, relate, and communicate data meaningfully.

Gershon (1994) defines visualization as "the process of transforming information into a visual form, enabling users to observe the information. The resulting visual display enables the scientist or engineer to perceive visually features which are hidden in the data but nevertheless are needed for data exploration and analysis". Thus, the goal of visualization is to allow a huge amount of data to be explored and understood, relying mainly on the ability of the human visual system to identify patterns, outliers and relationships (Heer et al., 2010).

Through the years, several areas have used visualization to support their activities, such as mechanical engineering, chemistry, physics, and medicine. In software engineering, programs have become increasingly larger and more complex, and the need for new ways to support program comprehension has grown accordingly. In this context, software visualization arose with a focus on enhancing the representation, presentation and appearance of programs.

Software visualization is a sub-area of information visualization. The goal of information visualization is to visualize any kind of abstract data, while in software visualization the focus lies on visualizing software. In the beginning, some authors considered software visualization only as the visualization of algorithms and programs. However, Diehl (2007) describes software visualization as "the visualization of artifacts related to software and its development process". This definition is wider because, besides the source code of programs, it also includes all kinds of artifacts used during the software development process. Thus, the visualization of requirements, design or architecture documents, for example, is also considered software visualization. Besides helping to comprehend software systems, the goal of software visualization is therefore to improve the productivity of the software development process (Diehl, 2007).

The remainder of this chapter is organized as follows: Section 2.2 presents the motivations for using visualization techniques; Section 2.3 presents visualization models that guide the development and analysis of visualization systems; and Section 2.4 summarizes the chapter.

2.2. Motivations of Visualization

The main benefits of visualization are the comprehension of huge amounts of data and the perception of properties of the data that were not previously clear. These benefits are achieved thanks to characteristics of the brain and the human visual system. According to the dual-coding theory (Paivio, 1990), by using verbal and nonverbal representations for the same type of information, visualization helps to exploit the capacity of the whole brain, integrating the "right side" and the "left side". Moreover, nearly 75% of all information perceived from the world is visually perceived (Diehl, 2007), which underscores the importance of the human visual system.

In summary, the goal and main benefit of visualization is to amplify cognition, that is, the mental process of knowing, including aspects such as awareness, perception, reasoning, and judgment. Card et al. (1999) identify six ways in which visualizations can amplify cognition, described next:

• Increasing resources: Visualizations can support the work of cognition by letting visual objects directly represent the information that would otherwise need brain processing. They can also store details that are accessed only when required by the user, avoiding information overload.

• Reducing the search for information: Grouping information, compacting it into a small space or visually relating it can reduce the search for data. Moreover, visualizations can offer overviews and details on demand.

• Enhancing the recognition of patterns: Visualizations can facilitate the recognition of patterns in the data. Visual properties such as connectedness, proximity, similarity of color or shape, symmetry, relative size and orientation, when used correctly, allow the human brain to perceive the patterns.

• Enabling perceptual inference: Visualizations can support the easy perceptual inference of relationships that are otherwise more difficult to induce.

• Enabling perceptual monitoring: Visual objects can have preattentive features, i.e. features that are perceived within 200 ms, allowing a large number of events to be monitored. Such features include the orientation, length, and width of lines, the size of an object, curvature, number, intersection, color, etc. (Diehl, 2007).

• Encoding information in a manipulable medium: Visualizations can allow the user to interact directly with the data. Thus, users can adjust the representation of the environment according to their needs.

2.3. Visualization Models

Visualization can be described as "the mapping of data to visual form that supports human interaction in a workspace for visual sense making" (Card et al., 1999). This mapping often follows a process that is common to all visualization systems. In order to simplify the discussion of information visualization systems and their creation, Chi and Riedl (1998) and Card et al. (1999) proposed visualization models that focus on the data and their transformations. Due to the similarity between the two models, only the reference model for visualization proposed by Card et al. (1999) is presented here, as illustrated in Figure 2.1.

Figure 2.1 Reference Model for Visualization (Card et al., 1999)

In Figure 2.1, the arrows flow from the data on the left to the visualization user, indicating a series of data transformations. In the opposite direction, the arrows indicate that the user can interact with the visualization system, adjusting how the data transformations are performed (Card et al., 1999).

The first step, Data Transformations, maps Raw Data (data in some domain-specific format that is often hard to work with) into Data Tables. Data Tables are relational descriptions of data extended to include metadata (i.e. descriptive information about the data). The goal of this transformation is to structure the raw data, enabling an easier mapping to visual forms.

Visual Mappings then transform Data Tables into Visual Structures. Visual Structures encode information through the combination of spatial substrates (position in space), marks (points, lines, areas, volumes), and graphical properties (e.g., colour, texture or intensity) (Card et al., 1999). This transformation is the most important in the reference model, because it is responsible for mapping structured data to visual objects based on graphical properties. Consequently, the Visual Structures depend strongly on how the Data Tables are organized. It is also important to note that a Visual Mapping should preserve the data and be easily perceived by the human viewer.

View Transformations create Views of the Visual Structures by specifying graphical parameters such as position, scaling, and clipping. Unlike static diagrams, View Transformations interactively modify and augment Visual Structures to extract more information from the visualization.

Finally, user interaction controls the parameters of these transformations. For example, the user can filter data or change the color of a visual structure according to their needs. In general, interactions are used to support the user in performing some specific task of the system.
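The chain of transformations in the reference model can be sketched as a small pipeline. The following Python fragment is only an illustration under stated assumptions: the raw data format, the function names and the `min_length` interaction parameter are hypothetical and are not part of the model by Card et al. (1999) or of any tool discussed in this work.

```python
from dataclasses import dataclass

# Hypothetical raw data: "file,lines_of_code" records in a plain-text format.
RAW_DATA = "parser.c,1200\nlexer.c,300\nmain.c,150"


@dataclass
class Mark:
    """One visual mark (a bar) in a Visual Structure."""
    label: str
    position: int  # spatial substrate: row index of the bar
    length: int    # graphical property: bar length in characters


def data_transformation(raw):
    """Raw Data -> Data Table (rows plus metadata naming the columns)."""
    rows = [line.split(",") for line in raw.splitlines()]
    return {"columns": ["file", "loc"],
            "rows": [(name, int(loc)) for name, loc in rows]}


def visual_mapping(table, max_length=40):
    """Data Table -> Visual Structure: scale each value to a bar length."""
    largest = max(loc for _, loc in table["rows"])
    return [Mark(name, i, round(loc / largest * max_length))
            for i, (name, loc) in enumerate(table["rows"])]


def view_transformation(marks, min_length=0):
    """Visual Structure -> View: clip short bars and lay the rest out as text."""
    visible = [m for m in marks if m.length >= min_length]
    return "\n".join(f"{m.label:<10}{'#' * m.length}" for m in visible)


def render(raw, min_length=0):
    """Run the whole pipeline; min_length stands in for a user interaction."""
    return view_transformation(visual_mapping(data_transformation(raw)),
                               min_length)


if __name__ == "__main__":
    print(render(RAW_DATA))
    print(render(RAW_DATA, min_length=10))  # the user filters out small files
```

In this sketch the user never touches the raw data; changing `min_length` only adjusts a parameter of the last transformation, which mirrors the backward arrows of the reference model.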
Shneiderman (1996) proposed seven interaction tasks that an information visualization system should provide to the user. They are described below:

• Overview: Gain an overview of the entire collection of data that is represented.

• Zoom: Zoom in on items of interest. When zooming, it is important that the global context is retained.

• Filter: Filter out uninteresting items. The users can control the content of the display and focus on their interests.

• Details-on-demand: Select an item or group and get details when needed. After zooming or filtering data, the user should be able to access details of the items of interest.

• Relate: View relationships among items. The users can select an item, and other items with similar attributes can be highlighted in the visualization.

• History: Keep a history of actions to support undo, replay, and progressive refinement. This is important because, during the exploration of the data, the users can access the history of actions and retrace their steps.

• Extract: Allow extraction of sub-collections and of the query parameters. This task allows the users to extract a data set and export it to another medium, such as a file, in a readable format. It concerns saving the current state of the data set presented in the visualization.

Software visualization maps directly to this reference model. The Raw Data is source code, execution data, system documentation (requirements, design, architecture), and so on (Maletic et al., 2002). Although source code is readable, only small pieces of code can be understood at a time. Based on the Raw Data, Data Tables are organized as abstract syntax trees, program dependence graphs, or class/object relationships, for example. Finally, Visual Structures are rendered using software-specific visualizations. Typically, they are very specific to a particular software engineering task, such as identifying a system's architecture or performing maintenance activities.

Another visualization model was proposed by Munzner (2009). She presents a nested model that, besides guiding visualization design, also suggests evaluation methods to address the threats to validity that occur at each level of the model. Although the levels form a waterfall, it is important that the process be iterative, since an error at the first level can propagate to all other levels (Munzner, 2009). Figure 2.2 illustrates the nested model for visualization design and evaluation, and the four levels are briefly described next:

Figure 2.2 Nested Four-Level Model for Visualization (Munzner, 2009)
• Domain problem and data characterization. At this first level, the problem to be addressed by the visualization needs to be clearly understood. Thus, the target domain should be described in terms of its tasks, data and problems. For this, the domain's own vocabulary should be used, which eases engagement with the target users.

• Operation and data type abstraction. The goal of this level is to map problems and data from the vocabulary of the specific domain into a more abstract description that is in the vocabulary of information visualization (Munzner, 2009). Furthermore, this level is concerned with the transformation of raw data into data types supported by visualization techniques.

• Visual encoding and interaction design. The design of the visual encoding and interaction is addressed at this level. Unlike the model proposed by Card et al. (1999), where Visual Mappings and User Interactions are separate steps, this model considers visual encoding and interaction together.

• Algorithm design. At this level, the algorithm responsible for the automatic visual encoding and user interaction should be implemented.

As mentioned before, this model presents a set of threats to validity at each level and suggests possible validation approaches. Figure 2.3 shows a summary of these threats and of the validation methodologies that can be applied.

Figure 2.3 Threats and Validation in the Nested Model (Munzner, 2009)

At the domain problem and data characterization level, the threat is that the problem does not exist in reality, and therefore the visualization system will not provide benefits to the users. At the operation and data type abstraction level, the threat is that the mapped operations and data types do not solve the problems of the users (Munzner, 2009). At the visual encoding and interaction design level, the threat is that the chosen visual structures, which encode information through graphical properties, are not effective in representing the data. Finally, at the algorithm design level, the threat is that the algorithm does not have affordable computation time and memory.

2.4. Chapter Summary

This chapter presented an overview of software visualization and the motivation for using visualization techniques during the development of software systems. It also described visualization models that guide the design of visualization systems. The first one was the classical model proposed by Card et al. (1999). The second one was the nested four-level model proposed by Munzner (2009), which provides guidelines to design a visualization and to address the threats to validity at each level.

Although the nested model has been influenced by the models of Chi and Riedl (1998) and Card et al. (1999), it provides a framework aligned with the current trends in user-centered development, where the focus of the visualization designer is to solve specific problems of a target public, rather than to design beautiful visualizations without benefits for the users (Wijk, 2006). In addition, the model proposed by Munzner (2009) explicitly presents the threats to validity of the visualization design and suggests evaluation methods that may be applied during the whole process, which facilitates the work of new visualization designers, as is our case in this study.

The next chapter presents an overview of Software Product Lines (SPL) concepts and discusses the importance of software visualization in the software product lines area.

3. An Overview on Software Product Lines

3.1. Introduction

Software has become essential in the development of new products. Regardless of the size or the complexity of the product, nowadays there is hardly any modern product without software. Thus, competitiveness in software development has increasingly become a concern for companies of all sizes and in all markets (Linden et al., 2007). As a result, product line engineering has rapidly emerged as an important software development approach during the last few years.

Software Product Lines (SPL) were inspired by the principles of Ford's automobile production line, which made mass production cheaper than individual product creation (Pohl et al., 2005). The central idea is to use a common platform that can be customized for specific customers or market segments in order to create new products. In the software engineering context, the combination of mass customization, large-scale production, and the use of a common platform to derive products results in the software product line engineering paradigm (Pohl et al., 2005).

According to Clements and Northrop (2001), a "software product line is a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way." This set of systems is developed from a set of core assets, which are documents, specifications, components, and other software artifacts that naturally become highly reusable during the development of each specific system in the product line.

In this context, the software product line development paradigm uses a systematic and planned reuse strategy. First, the common characteristics (commonalities) and differences (variability) among products are explored; next, reusable parts (core assets) are developed with variability; and then the products are created and customized for specific customers, reusing what has been built to be reused (Pohl et al., 2005).

The main benefits of developing a set of related products using the software product line paradigm are listed next (Pohl et al., 2005):

• Reduction of development costs and time-to-market: initially, the financial investment and time needed to develop the common core assets (platform) are high. However, after this stage, the development costs and time-to-market of individual products are significantly reduced because many artifacts are reused to build each new product;

• Enhancement of quality: the reuse of core assets in different products of the line ensures that they will be reviewed and tested in different contexts. This facilitates the detection and correction of faults, increasing the quality of all products;

• Reduction of maintenance effort: as a consequence of the previous item, when an error is corrected in one of the core assets, the change can be propagated to all products that use the artifact, reducing the maintenance effort;

• Simplification of evolution: the evolution of a software product line is simplified, since the inclusion of a new artifact into the platform allows the derivation of new products or the addition of new features to existing products. Moreover, changing an existing core asset allows all products that use it to evolve as well;

• Customer satisfaction: besides obtaining products customized to their needs, customers receive products with higher quality and lower price, since product line engineering helps to reduce production costs;

• Improved cost estimation: when a customer requests a new product, the organization can easily verify whether the new product can be built using only the core assets that already exist or whether it will be necessary to develop new artifacts. Thus, the cost of a new product can be calculated easily and with minimal risk.
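The cost estimation benefit above can be illustrated with a small sketch. It assumes, purely for illustration, that core assets and product requests are both described as sets of feature names and that reusing an asset and developing a new one have fixed relative costs; the feature names and the cost values are hypothetical and do not come from Pohl et al. (2005).

```python
def missing_assets(required_features, core_assets):
    """Features requested for a product that no existing core asset covers."""
    return sorted(set(required_features) - set(core_assets))


def estimate_cost(required_features, core_assets, reuse_cost=1, new_cost=10):
    """Rough estimate: cheap for reused assets, expensive for new artifacts.

    reuse_cost and new_cost are hypothetical relative units, not measured data.
    """
    missing = missing_assets(required_features, core_assets)
    reused = len(set(required_features)) - len(missing)
    return reused * reuse_cost + len(missing) * new_cost


if __name__ == "__main__":
    core_assets = {"login", "billing", "reports"}
    request = ["login", "reports", "export"]
    print(missing_assets(request, core_assets))  # artifacts still to develop
    print(estimate_cost(request, core_assets))   # estimate in relative units
```

A request whose list of missing assets is empty can be derived entirely from the platform, which is the situation in which the estimate carries the least risk.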
In order to gain these advantages and adopt the software product line paradigm, some challenges must be addressed. Upfront investment, the long time needed to develop the core assets, lack of experts, high cost of training, and organizational and process change are some of these challenges (Catal, 2009). In addition, variability management is an essential activity for the success of a product line. It is important because different products for specific customer needs are developed in terms of commonality and choices of variability (Pohl et al., 2005). Thus, the software product line engineering paradigm requires specific development processes and activities.

The remainder of this chapter is organized as follows: Section 3.2 presents the essential activities of SPL engineering; Section 3.3 depicts the adoption models that can be employed when starting a software product line; Section 3.4 presents the characteristics of software product line scoping; Section 3.5 describes work carried out in the SPL area with the use of visualization techniques; and Section 3.6 concludes this chapter with its summary.

3.2. SPL Essential Activities

The development of software product lines combines three essential activities, as shown in Figure 3.1 (Northrop and Clements, 2007): core asset development, product development, and management. Each rotating circle represents one of the essential activities. This representation describes how the activities behave during the development process of a product line: they are highly iterative, can occur in any order, and depend on each other.

Figure 3.1 The Three Essential Activities for Software Product Lines (Northrop and Clements, 2007)

The core asset development activity does not directly aim at developing a product, but rather aims to develop assets to be reused in the other activities. The product development activity takes advantage of existing, reusable assets to develop products. Finally, the management activity, which includes technical and organizational management, is responsible for orchestrating all activities and processes needed to make the three essential activities work together.

3.2.1 Core Asset Development

This activity is responsible for defining the commonality and the variability of the product line and for developing the software artifacts (requirements, design, components, tests) that will be reused to build products (Pohl et al., 2005). Thus, the goal of the core asset development activity is to create the reusable basis (platform) of the product line. This activity is also referred to as domain engineering (Pohl et al., 2005).

As illustrated in Figure 3.2, the rotating arrows suggest that core asset development is an iterative activity. Its inputs and outputs affect each other during the whole development cycle. For example, expanding the product line scope (output) may call for an evaluation of the inventory of preexisting assets (input), since new products will be included in the product line and preexisting assets can be candidates for reuse or mining. Similarly, a production constraint (input) may lead to restrictions on the product line architecture (output) (Northrop and Clements, 2007).

Figure 3.2 Core Asset Development (Northrop and Clements, 2007)

The main inputs to core asset development are (Northrop and Clements, 2007):

• Product constraints: define the existing commonalities and variabilities among the products of the planned product line. Moreover, they specify the behavioral features of the products, as well as the commercial, military, or company-specific standards that must be applied to the products.

• Production constraints: guide the choice of the variation mechanisms that will be used in the core assets and in the production plan. Thus, component development and architecture definition are directly influenced by these constraints.

• Production strategy: the overall approach for realizing both the core assets and the products. It defines the model that will be used to adopt the product line, and it guides the management and evolution of the architecture and core assets.

• Inventory of preexisting assets: usually, organizations adopting product lines already have a set of legacy systems and existing products. Based on these systems, components may be extracted, reengineered and inserted in the product line. Similarly, the product line architecture may take advantage of proven designs of existing products.

According to the software product line engineering framework proposed by Pohl et al. (2005), the inputs mentioned above are used in five sub-processes of domain engineering: product management, domain requirements engineering, domain design, domain realisation (implementation) and domain testing. As a result, they produce, besides the core asset base, the product line scope and the production plan (Northrop and Clements, 2007). The product line scope describes the products that will compose the product line and those planned for the foreseeable future. The production plan describes how products are produced from the core assets.

3.2.2 Product Development

In the product development activity, the applications are built by reusing core assets and exploiting the product line variability (Pohl et al., 2005). This activity is also referred to as application engineering (Pohl et al., 2005). As in core asset development, this activity is iterative; it is illustrated in Figure 3.3.
In accordance with the production plan, core assets are used to build products that meet their requirements. Beyond the product line scope, other requirements may be defined specifically for each product. Thus, the inputs required by product development are the outputs of core asset development (product line scope, core asset base and production plan) and the requirements for specific products (Northrop and Clements, 2007). In addition, problems encountered with the core assets should be communicated. This feedback enables the core asset base to be corrected and evolved continuously.

Figure 3.3 Product Development (Northrop and Clements, 2007)

3.2.3 Management

Management is extremely important to the success of a software product line. It includes two levels, technical and organizational management. Technical management is responsible for monitoring and controlling the core asset development and product development activities (Northrop and Clements, 2007). It tracks the progress of development and ensures that the defined activities and processes are being followed.

Organizational management is responsible for defining the organizational structure that makes sense for the company. This definition involves both organizational and process change, which can represent a barrier to the success of the product line (Catal, 2009). In addition, organizational management should ensure that the organizational units receive the right resources (for example, SPL experts and training). Thus, it is considered the main party responsible for the success or failure of the product line (Northrop and Clements, 2007).

3.3. Software Product Lines Adoption Models

In software product line engineering, an adoption model defines the approach chosen by an organization to transition from single-system software development to a software product line. In order to support organizations in this transition, Krueger (2002) proposed three adoption models, allowing an organization to select the most appropriate one depending on its objectives, budget, time and requirements, as described next (Krueger, 2002):

1. Proactive: this adoption model is like the waterfall approach to conventional software development. In this case, it is recommended that the requirements for the products be well defined and stable, because the full scope of products needed on the foreseeable horizon is analyzed, designed and implemented. The proactive approach also requires very high up-front cost, time and effort;

2. Reactive: this model is an incremental approach. With the reactive approach, the product line is incremented only when there is demand for new products or new requirements on existing products. This adoption model fits organizations that cannot predict their product line requirements well, and that cannot stop their production or extend their deadlines during the product line adoption;

3. Extractive: the extractive approach is appropriate when there is a set of systems that were developed individually but have a large amount of similarities among them. In that case, they can be reused and serve as the basis for the product line. Existing software artifacts from one or more existing products are extracted and reengineered to serve as the core asset inputs for the product line. The level of software reuse is very high, which enables an organization to adopt software product lines very quickly.

Each adoption model has its associated risks and benefits. In general, the up-front cost, effort and time required by the proactive approach are adoption barriers for many organizations that plan to introduce the software product line paradigm. Thus, its adoption involves more risks for the organization. However, the returns on investment are higher compared with the reactive and extractive adoption models.
On the other hand, the reactive and extractive adoption models can eliminate the adoption barrier, reduce risks and make the adoption faster, which matters because several organizations cannot slow down or stop production during the transition (Krueger, 2002).

It is important to highlight that these adoption models are not mutually exclusive. For example, a common approach is to start a software product line using the extractive approach and then incrementally evolve the production line using the reactive approach (Krueger, 2002). Moreover, the adoption of a product line approach must be well planned, and each organization has to analyze its own timetables, budget and objectives before selecting a specific adoption model. The process responsible for this analysis is known as scoping and is discussed next.

3.4. Software Product Lines Scoping

Software product line scoping is the process responsible for defining the long-term viability of the product line. Its goal is to identify and delimit the products, features, subdomains and existing assets of the product line in which it is economically beneficial to invest in reuse (John, 2010). In summary, it defines which products should be "in" or "out" of the product line according to the business and market goals of the organization.

The scoping process occurs during the core asset development activity and produces the scope definition (product line scope), which is a primary output of core asset development. Since the adoption of a product line must satisfy the business goals of the organization, the scope definition is mainly based on (Northrop and Clements, 2007):

• Market analysis: a systematic analysis of the external factors that determine the potential of a product in the marketplace. It allows analyzing existing products in the market, competitor organizations, and customer satisfaction and expectations. Thus, it is possible to quantify the business opportunities of a new product.

• Business goals: the goals that led the company to adopt a product line approach, for example reducing time-to-market, reducing cost, or improving quality.

• Technology forecasting: a strategic planning activity that focuses on identifying trends and future technologies, both in the product line domain and in the software development process. It provides the basis for planning investments in research and reduces the risks associated with innovations.

It is important that the scope be defined correctly. If the scope is too large, the core assets will be very complex due to the large amount of variation and will therefore have high development costs.
On the other hand, if the scope is too small, the product line may not have enough products and customers to recover the investment in the core assets.

The scoping activities rely strongly on workshops and interviews, since the scope analysts need to gather information about the product line by consulting the domain experts. Domain experts are the people with deep knowledge of a particular area or topic; in this case, they are the ones who know in detail the products and features that will compose the product line. Moreover, a typical artifact produced during the scoping process is the product feature matrix, also known as the product map. Its goal is to give an overview of the product line, listing its features and their distribution across the products (John et al., 2006).

3.5. Visualization in Software Product Lines

What distinguishes product lines is the way in which the differences and similarities between the products are addressed in the development process. Therefore, variability management is one of the most important activities in the SPL process. Furthermore, real (industrial-size) product lines can easily incorporate thousands of variation points and configuration parameters for product customization (Nestor et al., 2007). This huge amount of variability consequently generates a large number of artifacts (core assets) and makes variability management an extremely complex process, which requires sophisticated techniques to support it.

As a possible solution, Nestor et al. (2007) suggest the use of software visualization techniques to support variability management in software product lines, as well as other activities during the development process of product lines. In addition, since 2007 the SPL community has made efforts in this direction, such as the creation of international workshops focused on the application of visualization techniques in software product lines (Thiel et al., 2007, 2008, 2010). In this context, several studies have sought to apply visualization techniques to the different development phases of a product line.

Trinidad et al. (2008) proposed Feature Cone Trees (FCT), a new approach that addresses the problem of visualizing and manipulating large feature diagrams.
For this purpose, the FCT is based on cone trees (Robertson et al., 1991), a visualization technique for representing large hierarchies in three-dimensional space.

Heidenreich et al. (2008) focused on facilitating the understanding of realisation models, which are representations similar to UML models but with characteristics directed to product lines. Their visualization technique, MappingViews, relates the feature model to the realisation models, making it possible to explore how a particular feature is realised or which artifacts may be used in a variant.

Kastner et al. (2008) proposed a solution to the problem of feature traceability from the domain level (feature model) to the implementation (source code). Through a direct mapping between features and their implementation, developers are able to explore virtual views of the source code, depending on the selected features. Thus, development and maintenance are facilitated.

Product derivation is the process responsible for the creation of a product based on the core assets and the variation mechanisms defined in the product line architecture. In summary, a product specification should be transformed into a product using the core assets. Usually, this specification is composed of a large number of features, as well as dependencies between them that need to be addressed. Thus, this process may be very complex and error-prone without adequate tool support.

In this context, Rabiser et al. (2007) and Rabiser (2008) support product derivation using visualizations that adapt to the user. Since different stakeholders are involved in product derivation and have to understand different aspects of the provided variability, three perspectives were developed: expert, advanced and simple. They highlight that text-based views are preferred by sales people, whereas developers prefer graphical views.

In the same line of research and with the same goal, Botterweck et al. (2007, 2008) and Nestor et al. (2008) presented a tool proposal that maps and integrates the decision model, the feature model and the components of a product line to support product derivation. They also highlighted that several decisions during the tool development were based on visualization principles, which made the user experience more intuitive.

Sellier and Mannion (2007) used visualization techniques to support product derivation and the selection of requirements based on a decision model. Their approach considers the existing dependencies between the requirements, since software product lines may generate a large number of variation points and, consequently, a large number of inter-dependencies between them.
Finally, Duszynski (2010) proposed a technique for organizing and visualizing variability information, especially focused on the reverse engineering context. The technique is generic and can be applied to source code, models, and other types of product line artifacts.

Based on the studies presented previously, it is possible to highlight the large number of studies focused on variability management and product derivation. This can be explained by the importance of these processes to the success of a software product line. Regarding the first, variability management is responsible for defining relationships between artifacts and controlling the variability of the product line. Moreover, variability is the differential factor of the product line paradigm, playing a key role during the whole development process of a product. Regarding the second, the derivation of products is the process by which an SPL may take advantage of its characteristics in relation to single-system development. For this reason, several proposals try to optimize this process in pursuit of greater gains for the organizations.

Another important issue is that, among the studies identified, only the work of Duszynski (2010) addresses the extractive adoption model, which is based on identifying and reengineering artifacts extracted from existing products of an organization. Compared to the reactive and proactive approaches, the extractive model enables a quicker transition to the product line paradigm. Furthermore, this model is typically used in practice, as described in success cases of large companies such as Bosch Gasoline Systems, Nokia Mobile Phones, Nokia Networks and Philips Medical Systems (Linden et al., 2007).

3.6. Chapter Summary

This chapter presented an overview of software product lines, discussing the motivations to adopt the software product line engineering paradigm and its peculiarities, which require specific processes and activities to support the software development process. Then, the essential activities that guide the development of a product line and the common adoption models used to start an SPL were presented. Moreover, an overview of scoping in software product lines was given. Finally, studies using visualization applied to software product lines were identified and briefly described, revealing trends and research gaps. Based on the identified gaps, the next chapter presents an empirical study performed in an industrial project to understand the problems that product line engineers face in SPL scoping using the extractive adoption model.

4. A Focus Group Study to Understand Software Product Lines Scoping in Industrial Setting

4.1 Introduction

As described in the previous chapter, in the scoping process, product line engineers elicit scoping information from the available documentation of existing products and from the knowledge of domain experts. This is an effort-intensive task, since much time is invested in workshops and interviews with the domain and systems experts. Moreover, the domain experts often do not have time to share their knowledge, and the documentation of existing products is nonexistent or outdated.

Although the product line scoping approaches are based on existing documentation and workshops with domain experts, we believe that in an SPL adoption using the extractive approach, the source code of existing products can also be an important asset from which to extract information about variability and commonality. One advantage of source code analysis is that it can be performed independently of the domain experts, who will be needed only to validate the results of the analysis. Moreover, unlike documentation, which can be outdated, the source code represents the current state of a product. However, analyzing the source code of several products without proper tool support is a more complex task than eliciting information from documentation and domain experts.

Thus, in order to reduce costs and time in the scoping process, we believe that a tool based on software visualization techniques can be used to support and guide product line engineers in identifying the existing commonality and variability in the source code of legacy products. To guide the tool development, an empirical study was conducted to understand the problems that product line engineers face in SPL scoping in an industrial project and to elicit requirements for a new tool that supports the scoping process based on an extractive approach.

The remainder of this chapter is organized as follows: Section 4.2 discusses the research methodology; Section 4.3 presents the results of the data analysis; Section 4.4 presents the lessons learned and the limitations of the study; and, finally, Section 4.5 summarizes the chapter.

4.2. Research Design

The empirical study was conducted using the focus group research method (Kontio et al., 2004). Focus groups are planned discussions that may guide product development and are used by researchers to gain an understanding of an issue through the eyes and hearts of the target audience (Krueger and Casey, 2009). In this study, we considered that a focus group with a small team could provide more insights and information about the scoping process than interviews. Our assumption is that the focus group presents a more natural environment than an individual interview, because participants influence and are influenced by others, which facilitates the brainstorming of ideas.

According to Krueger and Casey (2009), the main steps involved in the focus group research method are: define the problem, develop questions, select the participants, conduct the focus group session and analyze the results. The focus group conducted in this research follows this process, and the next subsections describe how each step was addressed.

4.2.1 Define the Problem

The first step in focus group research is to determine the purpose of the study. It is important to know exactly what is expected from the focus group.
Thus, the goal of this study was to understand the problems that product line engineers face in SPL scoping in an industrial project, specifically in SPL adoption using the extractive approach. It also aimed to elicit requirements for a new tool, based on the source code of existing products, to support the scoping process. Therefore, the research questions that motivated this study were:

• What are the industrial problems that product line engineers face in SPL scoping, using the extractive approach?
• What are the most important features that a tool should have to support SPL scoping, based on the source code of existing products?

4.2.2 Questions Development

The questions are responsible for guiding the discussion of a focus group. According to Krueger (1998b), two different questioning strategies can be adopted in focus groups. The topic guide is based on a list of topics or issues, which remind the moderator of the topics of interest. By contrast, the questioning route is a sequence of questions in complete, conversational sentences. The questioning route was chosen for the focus group session, since it can produce more efficient analysis and minimizes subtle differences in questions that could alter their intent. Moreover, the topic guide should be avoided by novice moderators, which was our case in this research.

A draft of the questions was initially developed based on other focus group studies identified (Kontio et al., 2004) and on the literature on software product lines scoping (Moraes et al., 2009; John and Eisenbarth, 2009). This initial version was revised twice before reaching the final version. Thus, eight questions were developed for the focus group session; they are presented in Table 4.1.

Table 4.1 Focus group questions

Number  Question
1       What is your name and which role(s) are you acting in the project?
2       What kind of activities did you perform in the project?
3       Regarding the activities that you performed, what were the major difficulties found?
4       Based on the activities that you performed, did you need to visualize the source code? Why?
5       In what activities do you believe it would be more relevant to analyze the source code?
6       What information do you consider important and possible to be extracted from source code analysis?
7       Do you believe that a tool to support the source code analysis of existing products would be interesting? If yes, what features would you consider essential?
8       Is there something that you consider important, but that we did not talk about before?