Evowave: A Multiple Domain Metaphor for Software Evolution Visualization

Universidade Federal da Bahia

Universidade Estadual de Feira de Santana

MASTER'S DISSERTATION

EVOWAVE: A multiple domain metaphor for software evolution visualization

Rodrigo Chaves Magnavita

Master's in Computer Science – MMCC

Salvador


RODRIGO CHAVES MAGNAVITA

EVOWAVE: A MULTIPLE DOMAIN METAPHOR FOR SOFTWARE EVOLUTION VISUALIZATION

Dissertation presented to the Master's Program in Computer Science of the Universidade Federal da Bahia and the Universidade Estadual de Feira de Santana, in partial fulfillment of the requirements for the degree of Master in Computer Science.

Advisor: Manoel Gomes de Mendonça Neto

Co-advisor: Renato Lima Novais

Salvador


Evowave: a multiple domain metaphor for software evolution visualization / Rodrigo Chaves Magnavita. – Salvador, 2016. 89 leaves: color illustrations. Advisor: Prof. Manoel Gomes de Mendonça Neto. Co-advisor: Prof. Renato Lima Novais. Dissertation (Master's) – Universidade Federal da Bahia, Instituto de Matemática, 2016.

1. Software Engineering. 2. Evowave. 3. Software evolution. I. Mendonça Neto, Manoel Gomes de. II. Novais, Renato Lima. III. Universidade Federal da Bahia. IV. Title.


APPROVAL STATEMENT

RODRIGO CHAVES MAGNAVITA

EVOWAVE: A MULTIPLE DOMAIN METAPHOR FOR SOFTWARE EVOLUTION VISUALIZATION

This dissertation was judged adequate for obtaining the degree of Master in Computer Science and was approved in its final form by the UFBA-UEFS Master's Program in Computer Science.

Salvador, January 15, 2016

Prof. Dr. Manoel Gomes de Mendonça Neto
Universidade Federal da Bahia

Prof. Dr. Rodrigo Rocha Gomes e Souza
Universidade Federal da Bahia

Prof. Dr. Rodrigo Oliveira Spínola


SUMMARY

Software evolution produces a large amount of data during the development cycles. Software engineers need to interpret these data to extract information that will help them carry out their daily activities. The use of Software Evolution Visualization (SEV) has been a promising approach to support this interpretation. This approach makes use of visual resources that facilitate the interpretation of these data. Even so, it is not trivial to visually represent all the data generated during software evolution because, besides the software having different entities and attributes, it is also necessary to deal with the temporal dimension of evolution.

SEVs are usually built to support activities related to a specific domain of software engineering. Many of these visualizations focus only on presenting an overview of the software evolution, without focusing on the details. However, most software development activities require both combining different domains and having a detailed view of the information. The visual metaphors (i.e., concepts, associations, and analogies to concrete entities) used in these visualizations are very specific, aiming to support only one particular domain. The use of multiple views of the software to build the mental model of the system has been pointed out as an effective approach for fully understanding it. In most cases, these visualizations have sets of visual metaphors. Because of this, the software engineer needs to understand and become familiar with the visual metaphors of each visualization while using them. One way to mitigate this problem is to use visualizations that have a single visual metaphor to visualize several aspects and perspectives of the software.

This dissertation presents a new visual metaphor, called EVOWAVE, that can be used in multiple domains and that allows the data to be visualized both globally and in detail. EVOWAVE is inspired by concentric waves observed from above. This metaphor can represent large amounts of data, and its concepts cut across domains in the software engineering field. The development of this metaphor went through iterative phases that refined the concepts associated with it. First, a prototype was developed that validated the metaphor's ability to represent large amounts of data. Then, studies were carried out to validate its ability to represent data from different domains. The results indicate that the proposed metaphor can be used effectively in different domains of software engineering to support maintenance and evolution activities.

Keywords: Software Visualization, Software Evolution, Software Comprehension, Software Engineering


ABSTRACT

Software evolution produces a large amount of data during software development. Software engineers need to interpret these data to extract information that will help them carry out their daily activities. The use of Software Evolution Visualization (SEV) has been a promising approach to support this interpretation. This approach makes use of visual attributes that facilitate the interpretation of such data. Still, it is not trivial to visually represent all the data generated during software development because software has different entities and attributes and, in addition, the temporal dimension of evolution must be handled.

SEVs are usually built with the goal of helping activities related to a specific domain of software engineering. Many of these visualizations focus only on presenting an overview of the software development, without focusing on the details. However, most software development activities require both combining different domains and having detailed information. The visual metaphors (i.e., concepts, associations, and analogies to specific entities) used in these visualizations are very specific, aiming to assist only a certain domain. The use of multiple visualizations of the software to build the mental model of the system has been touted as an effective approach for understanding it completely. In most cases, these visualizations have their own sets of visual metaphors. Because of this, the software engineer needs to understand and become familiar with the visual metaphors of each visualization while using them. One way to mitigate this problem is to use visualizations that have a single visual metaphor to view various aspects and perspectives of the software.

This work presents a new visual metaphor, called EVOWAVE, that can be used in multiple domains and can visualize the data both as an overview and in detail. EVOWAVE is inspired by concentric waves as seen from above. This metaphor can represent large amounts of data, and its concepts cut across domains in the software engineering field. The development of this metaphor went through iterative phases that refined the concepts associated with it. First, we developed a prototype that validated the ability of the metaphor to represent large amounts of data. Then, studies were performed to validate its ability to represent information in different domains. The results indicate that the proposed metaphor can be used effectively in different domains of software engineering to assist in the execution of maintenance and development activities.

Keywords: Software Visualization, Software Evolution, Software Comprehension, Software Engineering


CONTENTS

Chapter 1 - Introduction
1.1 Context
1.2 Motivation
1.3 Problem Statement
1.4 Goal
1.5 Approach
1.6 Structure of this Document

Chapter 2 - Literature Review
2.1 Software Evolution
2.2 Software Comprehension
2.3 Representing Data
2.4 Software Visualization
2.5 Software Evolution Visualization
2.6 Chapter Conclusion

Chapter 3 - The EVOWAVE Metaphor
3.1 Concentric Wave Facts
3.1.1 The concentric wave propagation occurs in all directions
3.1.2 The biggest waves have more molecules
3.1.3 The wave closest to the center is the last formed
3.2 EVOWAVE Concepts
3.2.1 Layout
3.2.2 Windows
3.2.3 Molecules
3.2.4 Sectors
3.2.5 Number of Molecules Indicator
3.3 Mapping software properties
3.3.1 Timeline
3.3.2 The Pooler of a Sector
3.3.3 The Splitter of a Sector
3.3.4 The Angle of a Sector
3.3.5 The Color of a Molecule
3.4 Tool Implementation
3.4.1 Metadata
3.4.2 The EVOWAVE Visualization
3.5 Chapter Conclusion

Chapter 4 - Validation
4.1 An Exploratory Study on Software Collaboration
4.1.1 Study Settings
4.1.1.1 Setup
4.1.1.2 Tasks
4.1.2 Study Execution
    Who is working on what?
    How much work have people done?
    What classes have been changed?
    Who has the knowledge to do the code review?
    Who is working on the same classes as I am and for which work item?
4.1.3 Conclusion
4.2 An Exploratory Study on Library Dependency Domain
4.2.1 Study Setting
4.2.1.1 Setup
4.2.1.2 Tasks
4.2.2 Study Execution
    Understand the regularity of system dependency changes
    Understand what important structural dependency events have occurred
    Discover the current “attractiveness” of any library version
    Discover if newer releases are viable candidates for updating
4.2.3 Conclusion
4.3 An Exploratory Study on Logical Coupling Domain
4.3.1 Study Settings
4.3.1.1 Setup
4.3.1.2 Tasks
4.3.2 Study Execution
4.3.3 Conclusion
4.4 Limitations
4.5 Chapter Conclusion

Chapter 5 - Conclusion
5.1 Contributions
5.2 Limitations
5.3 Future Works


LIST OF FIGURES

1.1 Two software data visualizations embedded in Eclipse IDE.

1.2 The work performed in this research.

2.1 The march of Napoleon’s army into Moscow in the Russian Campaign of 1812 (Encyclopaedia-britannica-online, 2013).

2.2 Facebook data about friendship connectivity around the world (Facebook, 2013).

2.3 Visualization of a developing tornado displayed using thousands of circulating particles (Ncsa, 2013).

2.4 FishEye menu showing 100 web sites at the same time. Adapted from (Bederson, 2000).

2.5 The principles of a polymetric view (Lanza, 2004).

2.6 Inter-class view displaying a call-graph with two classes expanded (Staples; Bieman, 1999).

2.7 An overview of the city of ArgoUML v.0.24 (Wettel; Lanza, 2008).

2.8 Full view of the entire Cromod system trace (Cornelissen et al., 2007).

2.9 A schematic view of the 3D visualization (Greevy; Lanza; Wysseier, 2006).

2.10 All four views used to identify code smells. Adapted from (Carneiro et al., 2010).

2.11 SeeSoft showing the code age (Ball; Eick, 1996).

2.12 The Evolution Matrix and some characteristics about it (Lanza, 2001).

2.13 The JUnit framework visualized as a city using SkyscrapAR (Souza rodrigo; Manoel, 2012).

2.14 A taxonomy of visualization techniques for dynamic graphs (Rufiange; Melancon, 2014).

2.15 Staged animations used in the matrix-based visualization can show changes concerning multiple types of nodes and edges (Rufiange; Melancon, 2014).

2.16 Principles of the Evolution Radar (D’ambros; Lanza; Lungu, 2009).

2.17 SDP for FINDBUGS system (Kula et al., 2014).

2.18 Simplified example of a LDP with a single system for the COMMONS-LANG library (Kula et al., 2014).

3.1 A snapshot of real concentric waves.

3.2 The EVOWAVE concepts.

3.3 The pooler is the authors and the events are any changed files.

3.4 The sectors are the Java packages. In (A) no splitter was used and all packages are presented at once; in (B) the dot was used as the splitter and only one level of the Java package hierarchy is displayed at once.

3.5 The EVOWAVE Tool Architecture.

3.6 The structure of the EVOWAVE metadata.

4.1 EVOWAVE showing who is working on such package.

4.2 EVOWAVE showing how much work people have done.

4.3 EVOWAVE showing what classes have been changed.

4.4 EVOWAVE showing who has the knowledge to do the code review.

4.5 EVOWAVE showing who is working on the same classes as Kazutoshi Satoda and in what Java files.

4.6 Examples of EVOWAVE visualizations for the Library Dependency Domain. Visualization (A) shows the use of all dependencies and (B) the use of all dependencies by the FindBugs system.

4.7 The dependency usage of the FindBugs project.

4.8 Two EVOWAVE visualizations for two dependencies with the usages by the FindBugs system.

4.9 Two EVOWAVE visualizations of the “junit.junit” dependency usage.

4.10 EVOWAVE visualization with all module changes during fifteen years of development.

4.11 EVOWAVE visualization with all module changes from 2003 to 2005, focused on the “uml” package.

4.12 EVOWAVE visualization with all changes separated by modules from 2003 to 2005 with the “uml” package in focus.

4.13 EVOWAVE visualization to analyze the logical coupling between the “uml” and “model” packages.

4.14 EVOWAVE visualization from January 2, 2010 to February 8, 2013.

4.15 Table comparing the tasks performed by each work used in the studies and EVOWAVE.

Chapter 1

INTRODUCTION

This chapter presents an overview of the concepts used in this work. Its main goal is to describe the context, motivation, and problem addressed by this dissertation, as well as its goals and contributions and the approach taken to achieve them.

1.1 CONTEXT

Software evolution has been highlighted as one of the most important topics in software engineering (Novais et al., 2013). Its activities are very complex because the software development process generates a huge amount of data and involves many stakeholders. Dealing with these scenarios is challenging because developers may spend over 60% of the maintenance effort understanding the software (Corbi, 1989). Since maintenance consumes up to 90% of the software budget (Erlikh, 2000), the time spent understanding the software in order to perform maintenance activities can consume about 54% of the budget (60% of 90%).
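The 54% figure is simply the product of the two cited proportions. A minimal check of that arithmetic is sketched below, assuming both figures are taken at the bounds quoted in the text; the variable names are illustrative and not part of the original work.

```python
# Figures cited in the text (assumed at their quoted bounds, not measured here).
maintenance_share_of_budget = 0.90         # Erlikh (2000): up to 90% of the budget
comprehension_share_of_maintenance = 0.60  # Corbi (1989): over 60% of maintenance effort

# Share of the whole budget spent on understanding the software.
comprehension_share_of_budget = (
    maintenance_share_of_budget * comprehension_share_of_maintenance
)
print(f"{comprehension_share_of_budget:.0%}")
```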

This has led the software engineering research community to create methods, processes, and techniques to improve software comprehension. The goal is to increase the overall effectiveness of software development by making the software more comprehensible. This is a challenge because, as software evolves, it becomes more complex due to the insertion of more features, the involvement of new developers, and other factors. The more complex the software, the more difficult it is to extract information about it and, therefore, the harder it becomes to comprehend.

Software visualization has been used as one way to support software comprehension activities, and its use is currently increasing. It helps people understand software through visual elements, reducing the complexity of analyzing the amount of data generated during software evolution (Diehl, 2007). Examples of such data include software metrics, stakeholders, bugs, and features.

The amount of data grows even faster when we deal with software evolution. Every day, a series of events occurs during the software development process (e.g., changes in the source code, e-mails exchanged within the project team, changes in the team, the emergence of new technologies). These events generate a huge amount of data, which makes handling all of it in order to understand the software a difficult task. That is why software visualization is a good approach to organize these data and help extract information about the software. Nevertheless, building visual metaphors that effectively represent the time dimension together with all the data related to software evolution is a challenge in the field of software evolution.

1.2 MOTIVATION

There has been much effort by academia and industry to visually organize software data. The Eclipse IDE (Integrated Development Environment), one of the most widely used in academic and industrial settings, has several visualizations called views. They are used for different tasks (e.g., debugging, aspect configuration, bug correction). Each view organizes its data in a specific way.

Figure 1.1¹ illustrates two of those views, which are heavily used during software development tasks: Enterprise Explorer (A and B) and Type Hierarchy (C). Enterprise Explorer is used to visualize the elements of a Java Enterprise Web Project. In our example, in (A), we can see the main structures of the Web Project named “Project Example”. In (B), it is possible to visualize the details of this Web Project, for example, which Servlets or Filters it has registered. Type Hierarchy is used to visualize the hierarchy between Java elements. In this case, it integrates the overview information with the details in the same view. On the left, it is possible to identify which classes implement the interface List (overview information) and, on the right, which methods are specific to the ArrayList class and which methods come from an element higher up in the hierarchy (detailed information). As in the Eclipse IDE, software visualizations follow the strategy of showing only an overview picture of the system (overview strategy), only the details (detail strategy), or both.

The overview strategy gives more transverse information about the task to be performed. The amount of data visualized is reduced because only high-level data is considered. The detail strategy gives many low-level details about some aspect of the software. The amount of data visualized is larger than in the overview strategy because all the low-level data is considered. According to Shneiderman's visualization mantra, "overview first, then details on demand" (Shneiderman, 1996), a mix of the two strategies is ideal. The overview picture is used to identify, at a high level of abstraction, which part of the software needs to be thoroughly analyzed.

In the context of software evolution visualization, authors have taken different approaches. Some present the big picture of the software, providing an overview of the entire software history (Kuhn et al., 2010)(Voinea; Telea, 2006)(Lungu, 2008), while others show snapshots of the software evolution in detail (Abramson; Sosic, 1995)(Novais et al., 2011)(Novais et al., 2012)(Bergel et al., 2011)(D’ambros; Lanza; Lungu, 2009). Both are important because each approach is better suited to specific software evolution tasks. An important issue in the area is to understand how to combine both approaches in a practical and useful way so that users can really take advantage of the proposed visualizations (Novais et al., 2013).
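The combination of the overview and detail strategies can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `ChangeEvent` record, its fields, and the sample data are hypothetical and are not taken from any tool or study discussed in this work. It first aggregates changes per package (overview) and then lists the individual events of the busiest package (details on demand).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical change record; the fields are illustrative only."""
    package: str
    file: str
    author: str


# Made-up sample data, not taken from any study in this dissertation.
EVENTS = [
    ChangeEvent("org.app.ui", "Menu.java", "ana"),
    ChangeEvent("org.app.ui", "Menu.java", "bob"),
    ChangeEvent("org.app.ui", "Dialog.java", "ana"),
    ChangeEvent("org.app.core", "Engine.java", "bob"),
]


def overview(events):
    """Overview: number of changes per package (high-level data only)."""
    return Counter(e.package for e in events)


def details(events, package):
    """Details on demand: every change event inside one chosen package."""
    return [e for e in events if e.package == package]


summary = overview(EVENTS)
hotspot, _ = summary.most_common(1)[0]  # package with the most changes
print(summary)
print(details(EVENTS, hotspot))
```

The overview keeps the amount of data small (one number per package), while the detail step returns every low-level record, but only for the part the overview singled out.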

A mapping study performed in the area (Novais et al., 2013) highlighted other issues for the software evolution visualization community. Many works address software evolution by viewing only one type of data (e.g., source code changes, defects, features). They do not usually display or cross-reference different information that can be recovered from different sources. As a result, they help users perform only a few specific tasks.

Figure 1.1 Two software data visualizations embedded in Eclipse IDE.

¹ Since this is a software visualization work, and to improve understanding, it is important to see this

A visual metaphor is a representation of a person, place, thing, or idea by way of a visual image that suggests a particular association or point of similarity. Using multiple metaphors for different tasks can increase comprehension time: for each metaphor, software engineers need to learn how it works before being able to extract valuable information for the task. It is difficult to create a single metaphor that helps in all software engineering tasks, because it would display too much data at the same time. Using filters to reduce the amount of data displayed is a common technique used by the views.

These issues motivated this work to develop a metaphor that can represent both the overview and the details of software evolution for different domains using different data sources.

1.3 PROBLEM STATEMENT

One of the main problems in the software evolution visualization area is the amount of data that needs to be analyzed. Besides the usual complexity of visualizing several software elements of one version, the visualization must also handle the time component, which increases the amount of data. Time introduces more information because we analyze many versions of the software at once. A visualization must portray both the data about the software in each version and the data between versions.
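To make the distinction between per-version data and between-version data concrete, the sketch below uses a hypothetical metric (lines of code per class) with made-up values. It is an assumption-laden illustration, not the data model used by any tool in this work: n versions yield n snapshots plus n-1 sets of differences, which is why the time dimension inflates the amount of data to display.

```python
# Hypothetical per-version metric: lines of code per class (made-up numbers).
versions = {
    "v1": {"Parser": 120, "Lexer": 80},
    "v2": {"Parser": 150, "Lexer": 80, "Cache": 40},
    "v3": {"Parser": 140, "Cache": 65},
}


def deltas(older, newer):
    """Change in the metric for every class present in either version.

    A class that is missing from a version counts as 0 there.
    """
    names = sorted(set(older) | set(newer))
    return {name: newer.get(name, 0) - older.get(name, 0) for name in names}


ordered = list(versions)  # dicts preserve insertion order
between = {
    f"{a}->{b}": deltas(versions[a], versions[b])
    for a, b in zip(ordered, ordered[1:])
}
for step, change in between.items():
    print(step, change)
```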

These data are stored in different software repositories (e.g., source code repositories, bug repositories, and mailing lists) with different semantics (e.g., metrics, classes, and bugs) to support different tasks (e.g., comprehension, refactoring, and library dependency upgrades). Given the diversity of this data, the majority of works in the area focus on creating metaphors to solve specific problems. They do so by analyzing a few data semantics extracted from a few sources. Unfortunately, this approach has created a huge number of metaphors in the field (Novais et al., 2013).

Software engineers usually need to perform different tasks in their daily work. For each of those tasks, and following the aforementioned approach, they will need to use a different metaphor with different concepts, methodologies, and tools to address the task at hand. The problem is that they may spend considerable time simply learning or adapting their minds to each specific metaphor.

Developing a metaphor that uses the same concepts, methodologies, and tools to help software engineers perform different tasks from different domains (e.g., software collaboration, software architecture, and library dependency) is not easy to achieve. The metaphor needs to be generic enough to represent different data and also be able to show detailed information.

1.4 GOAL

In this work, we propose a new software evolution visualization metaphor called EVOWAVE. It can visualize different types of data generated in the development process using both overview and detail approaches. EVOWAVE can represent, at a glance, a huge number of events that occurred during software development. It can be applied to different software engineering tasks and domains.

We can point out the following specific goals:

• Conceive and specify a metaphor that can represent different types of data from different domains, from the overview to the details.

• Develop a tool that implements the metaphor.

• Perform experimental studies to validate the use of the metaphor in multiple domains.

1.5 APPROACH

Figure 1.2 presents an overview of the work performed in this research, activity by activity in chronological order. The white rectangle with orange borders represents mandatory classes of the master's program. The green rectangles represent theoretical activities. The blue rectangles represent development activities. The blue circles with the letter P inside indicate a published paper. The white circles with the letter P inside indicate a paper that will be submitted. The blue arrows link each publication to the items related to it.

Figure 1.2 The work performed in this research.

We started this work with an informal study of material provided in the Software Evolution class. At that point, we read a set of material on Software Evolution, Software Visualization, and Software Evolution Visualization. Later, the systematic mapping review by Novais et al. (2013) was used to find gaps in the area. During this activity, we observed the huge number of metaphors proposed in the area, each addressing a specific task. We then started to explore this gap by analyzing why the proposed metaphors can be used only in the context for which they were proposed. During this analysis, we identified that the main problem was not the layout or how the information was displayed, but the fact that the concepts introduced were too specific to a domain.

After that, we tried to identify natural phenomena and commonly known analogies (i.e., possible metaphors) that could be applied to our problem. We considered analogies already used in the field as well as others not yet used. The first analogy we considered was cities (Wettel; Lanza, 2008). The problem is that, like real cities, they grow too fast and create a visualization that is complex to handle. The second analogy was concentric waves. We considered this phenomenon because it has a temporal component associated with it (the wave propagation) and uses a radial layout, which is already used in some existing metaphors (e.g., a reference in the field, the Evolution Radar (D’ambros; Lanza; Lungu, 2009)). The problem with the Evolution Radar was that the concepts associated with the metaphor make it too specific to the logical coupling domain. We started to explore the concentric waves phenomenon and tested the idea in several domains.

With good results in hand, we presented our qualification project based on them and started to work on the prototype in order to perform an exploratory study on a real open source project. The main goal of the prototype was to better study this phenomenon in our context. We were then able to create concepts generic enough to represent many domains. The metaphor concepts were improved throughout the development of the prototype. In the last months of this process, we started an experimental study (Study 1) in the context of software contribution.

This study answered some of the common questions asked by developers during the development process (Fritz; Murphy, 2010). The results were interesting, since we were able to answer all the questions for which we had data available. This study resulted in a full paper published at ICEIS presenting the EVOWAVE concepts (Magnavita; Novais; Mendonça, 2015).

While that paper was being published, the prototype was improved into a tool generic enough to display any data that satisfies the metadata specification. A full paper addressing this novelty is planned for submission to VISSOFT 2016.

To validate the use of the metaphor in other domains, we performed Studies 2 and 3. Study 2 is in the library dependency context, with tasks related to comprehending library usage and supporting decisions such as upgrading a library. Study 3 was a retrospective analysis using the logical coupling between modules.

1.6 STRUCTURE OF THIS DOCUMENT

The structure of this document was defined to make this work easy to understand. This dissertation is divided into the following chapters:

• CHAPTER 1 - describes the motivation, goal, and scope of this research, as well as the structure of this document.

• CHAPTER 2 - presents the literature review used in this dissertation.

• CHAPTER 3 - describes the metaphor concepts in detail and how the metaphor was developed.

• CHAPTER 4 - validates the metaphor through exploratory studies in different domains.

Chapter 2

LITERATURE REVIEW

This chapter discusses the main topics and concepts related to this work. Initially, the concepts of software evolution are presented, emphasizing that systems need to evolve to stay alive. Afterwards, we discuss how software evolution affects system complexity and introduce some approaches that help in understanding those systems. Then, we discuss how visualizations can be used in different scenarios for many purposes, highlighting their use to analyze software and its evolution.

2.1 SOFTWARE EVOLUTION

Software needs to evolve in order to stay alive (Lehman, 1980). However, before performing any effective maintenance, it is crucial to understand the system (Mayrhauser; Vans, 1995). The software comprehension field provides mechanisms to acquire knowledge about all software aspects (e.g., features, structure, and behaviour). The importance of properly understanding how software works and evolves cuts across all software development phases. Software engineers spend more time understanding the software than actually performing software engineering tasks (Corbi, 1989)(Pigoski, 1996). This fact reinforces the importance of providing more efficient comprehension techniques.

Software complexity grows throughout its evolution (Lehman; Ramil, 2001)(Jay et al., 2009). Accordingly, software comprehension decreases (Caserta; Zendra, 2010). To mitigate this problem, some approaches were developed in the field. For example, design patterns (Gamma et al., 1995) can be used to apply well-known solutions to certain common problems. This approach leads to a better understanding of the code because developers use common solutions. However, it has its own problem: programmers may not know those solutions, or may simply not use them when under time pressure. Other approaches were proposed in software re-engineering (Briand, 2006). Unfortunately, the effectiveness of those approaches still needs to be improved.

Until the start of the last decade, software maintenance was treated as a uniform activity throughout the software development process. In 2000, a new model of the software life cycle was proposed because the development process had changed drastically since the early days (Rajlich; Bennett, 2000). This model, called the Staged Model, defines five stages that represent the software life cycle:

• Initial Development: During this phase, an early version of the system is built by the team experts and the system architects. The documentation represents the system and should be built with numerous well-known tools and methods.

• Evolution: Engineers extend the capabilities and functionality of the system to meet users' needs, possibly in major ways.

• Servicing: Engineers repair minor defects and make simple functional changes.

• Phaseout: The company decides not to undertake any more servicing, seeking to generate revenue from the system for as long as possible.

• Closedown: The company withdraws the system from the market and directs users to a replacement system, if one exists.
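The five stages listed above can be captured as a small data structure. The sketch below is a simplification made for illustration: it assumes the stages are traversed strictly in the order listed, and the names `Stage` and `next_stage` are hypothetical rather than taken from the cited model.

```python
from enum import Enum


class Stage(Enum):
    """The five stages of the Staged Model, in the order listed above."""
    INITIAL_DEVELOPMENT = 1
    EVOLUTION = 2
    SERVICING = 3
    PHASEOUT = 4
    CLOSEDOWN = 5


def next_stage(stage):
    """Return the stage that follows, or None after closedown.

    Assumes a strictly linear progression, which is a simplification.
    """
    members = list(Stage)  # Enum iteration follows definition order
    position = members.index(stage)
    return members[position + 1] if position + 1 < len(members) else None


print(next_stage(Stage.EVOLUTION))
print(next_stage(Stage.CLOSEDOWN))
```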

Each of those stages is composed of activities. Requirements elicitation, coding, and testing are examples of such activities. They differ depending on the companies and the processes they use. Each activity generates different software artifacts (e.g., source code, requirements and architectural documents, test cases). Most of these artifacts are textual data, which are hard to analyze.

Another issue is related to high-level artifacts. They are not always updated during the software's evolution (Lethbridge; Singer; Forward, 2003). In other words, while the source code evolves, its documentation may not. As a consequence, the current developers are the ones who hold the knowledge of the software. Unfortunately, team members may change frequently. This leads to a major problem: new developers have a hard time understanding the software. This becomes much harder given the large amount of data to be analyzed as the software evolves.

2.2 SOFTWARE COMPREHENSION

Software knowledge can be divided into two types: application-independent and application-specific knowledge (Mayrhauser; Vans, 1995). The first is acquired by participating in different software projects: developers hold generic knowledge about software development, such as design patterns and algorithms. The second is acquired during the understanding process when working on a specific software project. It comprises specific knowledge about the business rules of the software under development.

Various theories have been proposed to explain how developers acquire knowledge about a software system (Shneiderman; Mayer, 1979)(Letovsky, 1986)(Brooks, 1983)(Soloway; Adelson; Ehrlich, 1988)(Pennington, 1987). They can be classified into two categories: bottom-up and top-down. The bottom-up approach suggests that software knowledge is constructed from low levels of abstraction (e.g., source code): as programmers understand the code, they build a more abstract, high-level mental model of the software. The top-down approach holds that the mental model is first created by understanding the domain of the software and its high-level artifacts (e.g., requirements, architectural documentation). Then, the programmer goes deeper into software details such as languages, technologies, and source code.

The most reliable source of knowledge is the source code, since the high-level abstractions of a software system may not be available in most projects due to process flaws and obsolete documentation (Lethbridge; Singer; Forward, 2003)(Deursen et al., 2004). Unfortunately, the code is also the most primitive source of knowledge for most software engineering tasks. It is easier to build techniques and tools for tasks that are intrinsic to syntax or low-level semantics (e.g., searching for code anomalies to guide refactoring), because the information they need is generally available within the project. More complex tasks, such as the identification of concerns (Brito; Moreira, 2003), are more difficult to automate due to the low level of semantics available.

Complex tasks require more information than the source code alone. Since this information generally does not exist in concrete form (e.g., requirements and architectural documentation), there is a need for techniques that extract high-level information about the software. The reengineering field has made a major effort to extract such high-level information from the source code (Fuhr et al., 2013)(Detten; Platenius; Becker, 2013)(Fontana; Zanoni, 2011).

Automatic processes to extract high-level mental models are encouraged by industry, since companies often lack the resources to maintain their documentation and most of them place a higher value on the problems they are currently facing than on the future consequences of their actions. Tools and techniques have been created to extract information that is automatically generated as software evolves. Such data can be found in repositories such as Source Code Management (SCM) systems and Bug Tracking Systems (BTS). They hold reliable information, since they are part of the development process rather than an additional phase that could easily be skipped.

To help software engineers understand the high-level abstractions of the software, two major areas can be mentioned: Software Metrics and Software Visualization. Software Metrics can be used to characterize software systems. An example is to subdivide the classes of an object-oriented system into modules according to their coupling. This could help locate the main modules that the software cannot run without. Software Visualization is another promising area, since it uses visual metaphors to represent different aspects of the software. Various efforts in this field have attempted to represent the software through real-world metaphors in order to extract high-level information about it (Kuhn et al., 2010)(Steinbrückner; Lewerentz, 2010)(Wettel; Lanza, 2008).
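The metric-based example mentioned above, grouping classes into modules according to their coupling, can be illustrated with a minimal sketch. It assumes a hypothetical pairwise coupling table and an arbitrary threshold, and it groups classes by connected components; none of these choices comes from the cited works.

```python
def group_by_coupling(classes, coupling, threshold):
    """Group classes whose pairwise coupling reaches `threshold`.

    `coupling` maps (class_a, class_b) pairs to a coupling value.
    Both the table and the threshold are hypothetical inputs.
    """
    parent = {c: c for c in classes}

    def find(c):
        # Follow parent links to the group representative (with path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), value in coupling.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in classes:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())
```

Calling the function with a lower threshold merges more classes into the same module, which is one simple way to explore how strongly the parts of a system depend on each other.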

2.3 REPRESENTING DATA

Visualization is used in many areas, such as mechanical engineering, physics, and medicine. Its wide range of applications is due to its ability to represent concrete objects and to enable the extraction of valuable information just by looking at them. We deal with an enormous amount of information about the problems we face every day. It is now practically impossible to handle such a volume of information without techniques and mechanisms that help us retain only what is useful.

Visualization is the most appropriate way for humans to extract information from a set of data. As humans perceive visual attributes more easily (Ware, 2004), we can represent different data by mapping their real attributes to visual attributes (Mazza, 2009). For example, by replacing a set of textual numeric values with a set of bars of different widths, we can quickly spot the minimum and maximum values, as well as repeated values or oscillation patterns. Notice that the original function of the numbers is preserved, since we can still tell which number is bigger or smaller. Every mapped visual attribute needs a legend to explain what it maps and the information it represents. The primary goal of visualization is to help extract more complex information from raw data. This is possible because the human brain is constantly processing a huge amount of data, which lets us identify maximum and minimum values, relationships, groupings, trends, gaps, or interesting values (Mazza, 2009).
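The bar example above can be sketched in a few lines. This is only an illustration of mapping a data attribute (a number) to a visual attribute (a bar width); the function names and the text-based rendering are assumptions made for the example.

```python
def bar_widths(values, max_width=40):
    """Map each non-negative value to a bar width proportional to it."""
    largest = max(values)
    if largest == 0:
        return [0 for _ in values]
    return [round(v / largest * max_width) for v in values]


def render_bars(values, max_width=40):
    """Render one text bar per value, keeping the number as a legend."""
    widths = bar_widths(values, max_width)
    return "\n".join(f"{v:>6} | " + "#" * w for v, w in zip(values, widths))


if __name__ == "__main__":
    print(render_bars([12, 40, 7, 40, 23]))
```

In the printed output, the two longest bars immediately reveal the repeated maximum, and the shortest bar reveals the minimum, without reading any number.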

Visual representation (i.e., visualization) can be used in different scenarios and for many purposes. The main cases are: Presentation, Explorative Analysis, Confirmative Analysis, Scientific Visualization, Information Visualization, and Software Visualization.

When ideas are too complex to put into words, we can use pictures that help the audience follow the explanation without getting lost. That is the scenario of a presentation. In this case, visual elements serve as a communication channel to express concepts and ideas. This is difficult because different people can interpret the same message differently. That is why a visual representation needs to be designed with clarity, precision, and efficiency (Tufte, 1986).

Figure 2.1 The march of Napoleon’s army into Moscow during the Russian Campaign of 1812 (Encyclopaedia-britannica-online, 2013).

of information in this picture. The size of Napoleon’s army is expressed by the width of the green and orange lines, which show, respectively, the march to Moscow and the way back. At the bottom of the image, a statistical graphic represents the variation in temperature during the retreat. It is possible to see that the number of soldiers dropped drastically during the retreat, from approximately 100,000 to 4,000. One of the factors that contributed to their deaths was the drop in temperature from 18 degrees to -26 degrees, reaching -30 degrees at some points.

When we have a lot of information about some subject, it is important to try to understand what else is there that we cannot see just by looking at the data. That is the case of explorative analysis, in which visual representations are used together with our analytical ability to identify properties, relationships, patterns, and regularities.

Figure 2.2 is a visualization of friendships around the world according to Facebook data, and it can support an explorative analysis. Each line connects two cities, and its colour depends on the Euclidean distance and the number of friends between them. The initial purpose of this visual representation was to see how geographic and political borders affect where people live relative to their friends. It is noticeable that the most connected places in the world are in Europe and North America, and that each blue line might represent a friendship made while travelling, a family member abroad, or an old colleague or friend pulled away by the various forces of life (Facebook, 2013). It is possible to explore some aspects of this image, such as why Russia is missing from the map or why the connections in Brazil are concentrated in the south.

Figure 2.2 Facebook data about friendship connectivity around the world (Facebook, 2013).

Unlike explorative analysis, there are situations where hypotheses have already been raised and require confirmation. Confirmative analysis uses visual representations to quickly confirm these hypotheses without the need to go through large amounts of data and, sometimes, complex formulas.

There are different types of data to be displayed as visual elements. They can be divided into two groups: abstract and concrete data. The first is data with no correspondence to physical space. The effect of the temperature on Napoleon’s army during the retreat is an example of abstract data: there is no physical entity related to this information. The second is data with well-structured shapes, such as mathematical formulas and three-dimensional phenomena with a real physical shape (e.g., rain).

Figure 2.3 Visualization of a developing tornado displayed using thousands of circulating particles (Ncsa, 2013).

Scientific visualization allows scientists to visualize concrete data, offering a realistic representation of the elements being studied. The visualization of any kind of flow has been an important and active research subject for many years (Johnson; Hansen, 2004). A lot of data is generated by simulators that calculate flow dynamics and is then analyzed using scientific visualization to provide an explanation for the flow. Figure 2.3 is an example of scientific visualization. In the film Stormchasers (Nova/wgbh, 1995), OMNIMAX theaters simulated a tornado for approximately ten minutes. It had relative wind speeds over 60 m/s (134 mph) near the ground and a 40 millibar pressure drop (Wilhelmson, 1996). Approximately 40 GB of data was produced in this simulation, and through the visualization in Figure 2.3 we can see it as an actual physical phenomenon.

Figure 2.4 FishEye menu showing 100 web sites at the same time. Adapted from (Bederson, 2000).

Information visualization deals with abstract data that can be generated, calculated, or found in many ways, such as data from common searches, data that affects the result of a soccer match, and data describing global climate change. This kind of visualization is very difficult to design. Since abstract data has neither a physical shape nor a conventional human representation, visualizations are built as metaphors of well-known representations. Fish Eye (Furnas, 1986) is an example of information visualization and is used to explore detailed data without losing the global context. The real-world inspiration for this visualization was how a fish would see an ultra-wide hemispherical view from beneath the water (PHILOSOPHICAL. . ., 1906). It uses a real phenomenon to represent abstract data, which is how information visualization works. Figure 2.4 shows an example of the FishEye visualization applied to menus that have many sub-items.

2.4 SOFTWARE VISUALIZATION

Software visualization is a subarea of information visualization that visualizes abstract data generated in the software development process. In this area, scientists are concerned with visualizing the structure, behaviour, and evolution of the software (Diehl, 2007). Structure comprises all artifacts that are generated statically during the software development process. Source code, requirements, and test cases are examples of structure. Behaviour refers to what the software does during its execution. Examples are the allocation of memory and resources, or higher-level information such as function calls. Finally, software evolution refers to static and dynamic information generated during the software evolution process. In this work, we use software visualization to provide a new visual metaphor for investigating questions such as: How does the quality of the software change during the software development process? Which bugs appear more often, and in which parts of the system? The use of visualization to analyze software evolution is described in detail in the next section.

Figure 2.5 The principles of a polymetric view (Lanza, 2004).

In 1999, Lanza introduced the concept of polymetric views (Lanza, 1999) to visualize the software structure. With his tool, CodeCrawler (Lanza, 2004), he was able to support the reverse engineering of software systems by visualizing the system structure and its relationships, extracting this information from the source code. Polymetric views are simple interactive graphs enriched by various software metrics (Figure 2.5). A view is composed of rectangles, each one representing a software entity (e.g., a class or a package). It is possible to associate different entity metrics with the position of the rectangle, its width, its height, and its color. The rectangles are connected by lines that represent relationships between entities. It is possible to associate relationship metrics with the width and color of each line. If the entities are classes and the relationships are the inheritance between classes, it is possible to see one perspective of the software structure. Lanza validated the effectiveness of CodeCrawler by reverse engineering an industrial system in a few days. The successful result is further evidence of the capability of software visualization to support software comprehension.
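The node mapping of a polymetric view, as described above, can be sketched as follows. This is a minimal illustration and not CodeCrawler's implementation: the `Rect` type, the metric names, and the gray scale (darker meaning a higher metric value) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    name: str
    x: float
    y: float
    width: float
    height: float
    gray: int  # 0 (black) to 255 (white)


def polymetric_node(name, metrics, mapping, color_max):
    """Build one rectangle from entity metrics.

    `mapping` names the metric used for each of the five visual
    attributes: "x", "y", "width", "height", and "color".
    """
    shade = metrics[mapping["color"]] / color_max if color_max else 0.0
    shade = min(max(shade, 0.0), 1.0)
    return Rect(
        name=name,
        x=metrics[mapping["x"]],
        y=metrics[mapping["y"]],
        width=metrics[mapping["width"]],
        height=metrics[mapping["height"]],
        gray=round(255 * (1 - shade)),  # assumed scale: darker = higher value
    )
```

Changing only the `mapping` dictionary produces a different view of the same entities, which reflects the idea that one layout can carry up to five metrics per node.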

Still in the 90s, Staples proposed a three-dimensional exploration to visualize the software structure (Staples; Bieman, 1999). The approach, named Change Impact Viewer (CIV), is a three-dimensional matrix where the base (x and z axes) represents the system classes and the height (y axis) represents the functions implemented by each class (Figure 2.6). The arrows between the classes indicate function calls.

Figure 2.6 Inter-class view displaying a call-graph with two classes expanded (Staples; Bieman, 1999).

Most of the previous works did not take full advantage of the third dimension, but other works later emerged with new approaches (Teyseyre; Campo, 2009). CodeCity can be highlighted as one of the new 3D visualizations (Wettel; Lanza, 2008).

It represents the system as a 3D interactive urban environment. The city provides an overview of the system’s structural organization by drawing the classes as buildings and the packages as districts (Figure 2.7). The width of the buildings represents the number of attributes the classes have, and the height represents the number of methods. With the visualization it is possible to identify some patterns such as massive buildings (potential god classes (Riel, 1996)) and some antenna-shaped constructions (potential bean classes). The visualization can offer consistent location and solid orientation points for the user.
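The city mapping described above can be sketched as follows: the width of a building comes from the number of attributes and its height from the number of methods. The shape classification is only a rough illustration; the thresholds `massive_side` and `antenna_ratio` are hypothetical values chosen for the example, not values taken from CodeCity.

```python
def building(num_attributes, num_methods):
    """Map a class to a building: width from attributes, height from methods."""
    return {"width": num_attributes, "height": num_methods}


def classify(b, massive_side=20, antenna_ratio=10):
    """Label a building shape using hypothetical thresholds."""
    if b["width"] >= massive_side and b["height"] >= massive_side:
        return "massive"   # wide and tall: many attributes and many methods
    if b["height"] >= antenna_ratio * max(b["width"], 1):
        return "antenna"   # thin and tall: few attributes, many methods
    return "regular"
```

For example, `classify(building(30, 80))` labels a class with 30 attributes and 80 methods as massive, which would make it a candidate for closer inspection.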

Figure 2.7 An overview of the city of ArgoUML v.0.24 (Wettel; Lanza, 2008).

execution of the system escalates really fast, the amount of data that needs to be manipulated is huge. However, due to the increasing computational power available today, it is possible to see many works addressing this area.

In 2007, a tool was proposed to visualize execution traces in order to support program comprehension during software maintenance tasks (Cornelissen et al., 2007). The approach, named Extravis, presents two synchronized views: a circular view and a massive sequence view (Figure 2.8). The first shows the system’s structural decomposition and the nature of its interactions during the trace. The sectors represent the system entities (e.g., modules and classes), which can be organized in a hierarchy down to the system’s functions. Every time a function calls another function, a line is drawn between the corresponding sectors. Colors can be used to represent the direction of the call or how recent the call is. The second view provides a concise and navigable overview of the consecutive calls between the system’s elements (e.g., classes and methods) in chronological order. It works like a trace history that can be used for a more detailed analysis.

Figure 2.8 Full view of the entire Cromod system trace (Cornelissen et al., 2007).

The synergies and dualities of the structural and behavioural approaches have been recognized (Ernst, 2003). Therefore, works that combine these two approaches to extract information about the software are relevant. In 2006, a 3D visualization metaphor was proposed to support the animation of the behavior of features. The authors integrate zooming, panning, rotating, and on-demand techniques to support the usability of the visualization (Greevy; Lanza; Wysseier, 2006). They visually represent the dynamic behavior of features in the context of a static structural view of the system (Figure 2.9). They first apply static analysis to the source code of a system to extract a static model of the source code entities. Then, from the static model, they create a 3D visualization based on the same principles as the polymetric views (Lanza, 1999). The classes are represented by gray boxes and the inheritance relationships by black lines between them. The width, length, and color of the boxes can be associated with metrics. The next step is to apply dynamic analysis to obtain the stack traces of the executions. They then use this analysis to create more boxes on top of the classes. The number of boxes on top of a class corresponds to the number of instances the class has created at the moment of analysis. Finally, the instance boxes are connected by red edges, each one meaning that a message (or function call) was sent between instances.

Figure 2.9 A schematic view of the 3D visualization (Greevy; Lanza; Wysseier, 2006).

In 2008, an approach was proposed that uses multiple views of the software, which can be configured and combined according to the particular needs of the user to support specific software comprehension activities (Carneiro; Magnavita; Mendonça, 2008). Later, in 2010, this approach was used to enrich four categories of code views with concern properties: a) concern package-class-method structure; b) concern inheritance-wise structure; c) concern dependency; and d) concern dependency weight (Carneiro et al., 2010). Figure 2.10 illustrates all four views.

The first view (a) is related to the structural representation of packages, classes, and methods. It was developed based on the Treemap view (Shneiderman, 1992). The area of the rectangles can be configured to express the lines of code or the complexity of the methods they represent. The rectangle color is used to represent methods that are affected by a specific concern. With this view, it is possible to identify the parts of the system structure that are related to a specific concern. The second view (b) is based on the polymetric view (Lanza, 1999); like the first one, it is used to visualize the software structure. However, this view can also represent the module hierarchy formed by class and interface inheritance. The width of each rectangle is associated with the number of methods in the class and the height with its lines of code. The color is used to mark which classes and interfaces satisfy a concern.

The third view (c) represents the dependencies among software packages and classes using a graph. It uses graph nodes (small circles) to represent packages and classes, and arrows to represent their dependency relationships. The color is again used to mark which packages and classes are affected by a specific concern. The last view (d) shows the weight of each dependency between packages and classes. This view uses two metaphors: a graph and a chess board. The graph view is similar to (c); however, it highlights a package or class in the middle and shows all dependencies between it and the rest of the nodes. The chess board view plots the classes of the entire system as rectangles in multiple rows, arranged in decreasing order of total dependency weight. Both use the color of the nodes and rectangles to identify which concern they satisfy.

In all four views, it is possible to filter the information shown. The entities that do not satisfy the search query are colored white. The authors used these four views to identify code smells in software systems. During their studies, they made seven observations that can be used to derive hypotheses about how the multiple views approach supports program characterization. Later on, those hypotheses were confirmed by a thesis defending the usefulness of multiple views during software comprehension activities (Carneiro, 2011).

Figure 2.10 All four views used to identify code smells. Adapted from (Carneiro et al., 2010).

2.5 SOFTWARE EVOLUTION VISUALIZATION

The use of visualizations to represent software evolution is well justified by the amount of data generated during the software development process. The data generated from sources such as version control systems, emails, and technical meetings is too much to be analyzed using only textual techniques. Visualizations can reduce the complexity of the analysis and help to understand aspects of software evolution such as structural decay (Beyer; Hassan, 2006), architectural changes (Godfrey; Tu, 2001), and software dependency evolution (Kula et al., 2014).

The use of visualizations to help analyze software evolution is not new. There are works dating back 20 years, as reported in a systematic mapping study (Novais et al., 2013). However, the number of works has been increasing in recent years.

Figure 2.11 SeeSoft showing the code age (Ball; Eick, 1996).

One of the first works in this area was successfully applied in several contexts, each one with a different perspective on software, e.g., static properties, performance profiles, and version histories (i.e., evolution) (Ball; Eick, 1996). That work created the SeeSoft tool, which became one of the best-known software visualization tools. Several features of the SeeSoft metaphor ensured its success and usefulness. One of the most important is the natural and direct mapping from the visual metaphor to the source code and back, which leads to natural navigation between representations. It uses a colorful pixel-oriented paradigm, rather than graph-based representations, to represent relationships between software elements. Figure 2.11 shows an example of the use of text, line, and pixel representations to analyze code age. The newest lines are shown in red and the oldest in blue, with a color gradient in between. The browser (lower right in Figure 2.11) incorporates all three representations at once.

Figure 2.12 The Evolution Matrix and some characteristics about it (Lanza, 2001).

In 2001, Lanza proposed the use of software visualization techniques to recover the software evolution (Lanza, 2001). This approach was named Evolution Matrix because it uses a matrix layout in which columns represent versions and rows represent classes (Figure 2.12). Each cell holds a two-dimensional box that represents the class (row) in a given version (column). The width and height of the box are associated with class-related metrics such as lines of code (LOC) and number of methods (NOM). The metaphor was used to understand how class metrics change across versions. The study found several patterns in software evolution and used an astronomy analogy for most of them. An example is the pulsar pattern, which describes classes that grow and shrink repeatedly during the software development. Another example is the supernova pattern, which describes a class that suddenly explodes in size. Within 10 years, this work became the most referenced paper in the software evolution visualization community (Novais et al., 2013).
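The two patterns named above can be approximated on one row of the matrix, that is, on the sequence of sizes of a class across versions. The sketch below is only an illustration: the growth factor and the minimum number of direction changes are hypothetical thresholds, not values defined in the Evolution Matrix work.

```python
def is_supernova(sizes, factor=3.0):
    """True if the size grows at least `factor` times between two
    consecutive versions (hypothetical threshold)."""
    return any(
        prev > 0 and cur >= factor * prev
        for prev, cur in zip(sizes, sizes[1:])
    )


def is_pulsar(sizes, min_swings=2):
    """True if the size switches between growing and shrinking at
    least `min_swings` times (hypothetical threshold)."""
    deltas = [cur - prev for prev, cur in zip(sizes, sizes[1:]) if cur != prev]
    swings = sum(1 for a, b in zip(deltas, deltas[1:]) if (a > 0) != (b > 0))
    return swings >= min_swings
```

Here `sizes` could hold the LOC or NOM value of one class in each version, which corresponds to reading one row of the matrix from left to right.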

Several works have started to use the third dimension in their visualizations to express more information about the software evolution. One of the reasons is the emergence of user interaction techniques beyond the common mouse-keyboard pair. SkyscrapAR uses a three-dimensional visualization with augmented reality to visualize the software evolution (Souza rodrigo; Manoel, 2012). It reuses the CodeCity algorithm (Wettel; Lanza, 2008), which represents one version of the software, and extends it to visualize the software evolution (Figure 2.13). Similarly to CodeCity, SkyscrapAR represents software packages as rectangular city lots, with sub-packages stacked on top of them. Classes are represented by buildings (boxes with different areas and heights) located on top of their respective packages. The area covered by a building is proportional to the number of lines of code of the class. The tool is able to visualize only one version at a time. Therefore, the lot of each class is made big enough to hold the largest size the class ever had and is painted green. Thus, the user can compare the size of the class in the version under analysis against its largest size. The tool paints in red the buildings that changed between the previous version and the version under analysis. A piece of paper or any other object with a predefined black and white square pattern printed on it serves as a marker. Combined with a camera, the tool places the visualization on top of this marker using augmented reality. The tool could be used in code review meetings and module inspections, where a team works together to identify design flaws.

Figure 2.13 The JUnit framework visualized as a city using SkyscrapAR (Souza rodrigo; Manoel, 2012).

A more recent work was presented by (Rufiange; Melancon, 2014). In this work, the authors proposed a matrix-based software evolution visualization named AniMatrix. They used a taxonomy of dynamic graph visualization to organize the evolving graph network, which has multiple types of nodes and edges, focusing on both node-link and matrix views (Figure 2.14). They associated the nodes with different types of classes (e.g., normal class, abstract class, and interface) and the edges with different relationships (e.g., declarations, extensions, and constructor calls). If there are many relationships between two classes, the corresponding cell is subdivided into as many squares as there are relationships (Figure 2.15). The visualization shows only one version of the system at a time but uses colors inside the squares to represent state transitions.

Figure 2.14 A taxonomy of visualization techniques for dynamic graphs (Rufiange; Melancon, 2014).

If a relationship is new, the square is filled with a color gradient from green to gray. If a relationship ended, the square is filled with a gradient from gray to red. If a relationship was modified, the square is filled in blue. Finally, if the relationship already existed in previous versions and nothing changed, the square is filled in gray. The intensity of the gray reflects the weight of the relationship. Changes in the weight of a relationship are tracked by filling the square with a gradient from the previous tone to the current one.
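The colour rules described above can be written as a small decision function. This is a sketch of the rules as stated in the text, not the AniMatrix implementation; the function names, the colour names, and the normalized gray scale are assumptions made for the example.

```python
def cell_fill(existed_before, exists_now, modified=False):
    """Return the (start, end) colours of one matrix square for the
    transition from the previous version to the current one."""
    if not existed_before and exists_now:
        return ("green", "gray")  # new relationship
    if existed_before and not exists_now:
        return ("gray", "red")    # ended relationship
    if existed_before and exists_now and modified:
        return ("blue", "blue")   # modified relationship
    if existed_before and exists_now:
        return ("gray", "gray")   # unchanged relationship
    return None                   # no relationship in either version


def gray_level(weight, max_weight):
    """Assumed scale: 0.0 is the lightest gray, 1.0 the darkest,
    used for the heaviest relationship."""
    if max_weight <= 0:
        return 0.0
    return min(weight / max_weight, 1.0)
```

A weight change between two versions would then be drawn as a gradient from `gray_level(old_weight, max_weight)` to `gray_level(new_weight, max_weight)`.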

Figure 2.15 Staged animations used in the matrix-based visualization can show changes concerning multiple types of nodes and edges (Rufiange; Melancon, 2014).

Over the years, many works on software evolution visualization have been proposed. According to the systematic mapping study (Novais et al., 2013), 146 studies were identified up to 2011, and six other studies were identified during this work. Among these, two stand out for being similar to this work. They use a similar layout, but their proposals focus on different tasks for a single domain.

Figure 2.16 Principles of the Evolution Radar (D’ambros; Lanza; Lungu, 2009).

The older of these two works was published in 2009. It proposed a visualization-based approach that integrates logical coupling information at different levels of abstraction (D’ambros; Lanza; Lungu, 2009). The approach, named Evolution Radar, shows the dependencies between a module in focus and all the other modules of a system. The module in focus is represented as a circle placed in the center of a circular surface (Figure 2.16). All the other modules are visualized as sectors whose size is proportional to the number of files contained in the corresponding module. The sectors are sorted according to their size metric and placed in clockwise order. Within each sector, the files belonging to that module are represented as colored circles and positioned using polar coordinates, where the angle and the distance to the center are computed from their name and their logical coupling, respectively. Only one version of the software is visible at a time, but users can move through time using a slider. With this visualization, it is possible to track dependency changes by detecting files with a strong logical coupling in the most recent period and then analyzing their coupling in the past, which allows us to distinguish between persistent and recent logical couplings.
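The radial layout described above can be sketched with two small functions. Several details are assumptions made for the example and are not stated in the text: sectors are ordered from largest to smallest, a file's angle is derived from its rank when the files of a module are sorted by name, and a stronger logical coupling places the file closer to the center.

```python
import math


def sector_angles(module_sizes):
    """Split the circle into sectors proportional to the number of
    files of each module (assumed order: largest first)."""
    total = sum(module_sizes.values())
    ordered = sorted(module_sizes.items(), key=lambda kv: kv[1], reverse=True)
    sectors, start = {}, 0.0
    for name, size in ordered:
        span = 2 * math.pi * size / total
        sectors[name] = (start, start + span)
        start += span
    return sectors


def file_position(index, count, sector, coupling, max_coupling, radius=1.0):
    """Polar position (r, theta) of one file inside its module sector.

    `index` is the rank of the file among the `count` files of the
    module when sorted by name.
    """
    start, end = sector
    theta = start + (index + 0.5) / count * (end - start)
    closeness = coupling / max_coupling if max_coupling else 0.0
    r = radius * (1.0 - closeness)  # assumption: stronger coupling, closer to center
    return r, theta
```

Whether the angle grows clockwise or counterclockwise is left to the rendering step, which converts `(r, theta)` into screen coordinates.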

The other work was published in 2014. It proposed to visualize, from two perspectives, how the dependency relationship between a system and its dependencies evolves (Kula et al., 2014). The first perspective uses the same radial layout but with different concepts, and includes a heat map to provide visual clues about changes in library dependencies along the system’s release history. The authors called it the system-centric dependency plot (SDP); it is shown in Figure 2.17. The second perspective uses statistical graphics to create a time-series visualization that shows the diffusion of users across the different versions of a library. The authors called it the library-centric dependants diffusion plot (LDP); it is shown in Figure 2.18. In the SDP visualization, each axis represents a library that the system depends upon. Starting from the center, each circumference represents a released system version. The time between releases is represented by the distance between circumferences. On each circumference, each of the dependencies used by that version is plotted. The shape and color of a dependency represent the type of dependency relationship and the version. In the LDP visualization, the x-axis represents time and the y-axis represents the cumulative sum of system versions per library version. Each point in the graph is plotted using a specific shape to indicate the dependency relationship between that particular system and the library (e.g., adopter, idle, or updater). Color is used to distinguish the different library versions, which are represented by the groups of lines that connect the points.

A recent study characterized software evolution visualizations by temporal strategy (Novais, 2013). The main difference among the strategies is the approach used to visualize the evolution. A visualization can show a single picture representing the changes in the software modules, metrics, and properties as the software evolves. Alternatively, it can show only one version (or period of time) of the software, also representing the changes in it, but in more detail. These strategies are specialized into three: Overview, Snapshot, and Overview Accumulative. The Overview strategy shows information regarding many versions at the same time. The Snapshot strategy shows information about a specific version of the system. The Overview Accumulative strategy takes into account the absolute value of the changes in the software properties between versions to analyze the software evolution. Temporal Overview and Temporal Snapshot are the strategies most frequently used to visualize software evolution (Novais et al., 2013). However, fewer than half of the works combine these two strategies.


Figure 2.17 SDP for FINDBUGS system. (Kula et al., 2014).


Figure 2.18 Simplified example of a LDP with a single system for the COMMONS-LANG library (Kula et al., 2014).

could decrease the user’s learning curve. The user will need to study and have some practice time with each one of the visualizations to be able to better identify aspects of the software evolution. The lack of integration between the visualizations is another problem. The user could easily get lost in years of the data due to the lack of synchronism between approaches.

2.6 CHAPTER CONCLUSION

This chapter presented a literature review of the concepts related to the area covered by this work. It clarified that software evolution is inevitable and leads to more complex systems. As a result, developing techniques and methods to better understand complex software is not trivial, but it is very important. The chapter also showed that visualization techniques have been successfully applied to represent software. For software evolution analysis in particular, visualization has been indicated as a good approach due to the amount of data generated during software development.


Chapter 3

THE EVOWAVE METAPHOR

This chapter discusses the origin of the EVOWAVE metaphor and how it works conceptually and technically to represent evolutionary data for different domains. Initially, we introduce some facts about concentric waves that helped the idealization of the metaphor concepts. Afterwards, we present and describe each one of those concepts, highlighting how they can be mapped to software properties. Then, we discuss the algorithms to create a visualization of the metaphor and the metadata that makes it generic enough to represent data from different domains.

Software evolution generates a huge amount of valuable data from different sources. However, tons of data without analysis tell us little or nothing about the software. This indicates the need for techniques that help data analysis through visualizations that organize the data efficiently. For example, it would be interesting to know which software module has been using most of the project resources. Therefore, we need to extract this data in an organized way and visualize it in such a way that users can analyze it correctly and get useful information.

EVOWAVE is a new visualization metaphor that enriches the analysis capabilities of software evolution. It is inspired by concentric waves that share the same origin point in a container, seen from the top, as shown in Figure 3.1. This chapter presents the facts and concepts related to concentric waves which make EVOWAVE a promising software evolution visualization metaphor. Later, we explain how those concepts can be mapped to software properties and how the tool was implemented.

3.1 CONCENTRIC WAVE FACTS

During this research, we identified some facts about the formation of concentric waves which can be used to represent evolutionary data. Figure 3.1 helps illustrate the facts related to the proposed metaphor. Each of them is discussed below.

3.1.1 The concentric wave propagation occurs in all directions

An external force must be applied to a container filled with liquid to generate the propagation of waves. In normal situations, the waves are distributed equally in all directions from the center. This happens when the force direction is 90 degrees to the flat surface of the container. This force pushes the same amount of molecules in all directions, which creates redundant information. This is by no means what the metaphor wants to achieve. If delimiters were installed in the container following the propagation path from the center (as radius lines), they would create regions that have no influence on adjacent ones. Thus, an external force could be applied to each region to push a different amount of molecules between each pair of delimiters, avoiding the generation of redundant information.

Figure 3.1 A snapshot of real concentric waves.

3.1.2 The biggest waves have more molecules

The magnitude of the applied force responsible for the wave formation defines how many molecules are pushed away. During the wave formation, the strongest force applied generates the biggest wave. Thus, it is the wave with the most molecules.

3.1.3 The wave closest to the center is the last formed

Concentric waves are formed by the application of a force at their center that spreads over time. When looking at a snapshot of the wave formation process, the least propagated wave is the one closest to the center. This means that it is the last to be formed. This leads to a conclusion about the existence of a timeline in the wave propagation path, where the center corresponds to the moment the snapshot was taken and the distance of the most propagated wave from the center corresponds to the beginning. Therefore, each molecule carries information about when some force was applied to it, according to its location in the propagation path.

Figure 3.2 The EVOWAVE concepts

3.2 EVOWAVE CONCEPTS

Based on the wave facts, we derived a set of concepts used in the EVOWAVE metaphor. Figure 3.2 represents the concepts, which are explained below.

3.2.1 Layout

We observed from the described facts that the wave propagation path has the behavior needed to represent a period of time in software history. EVOWAVE has a circular layout with two circular guidelines (inner and outer), as shown in Figure 3.2-A. They represent a software life cycle period (e.g. [01 January, 2000 10:01:20 AM] to [01 January, 2014 05:20:01 PM]). This period, named timeline (Figure 3.2-A), is composed of a series of short periods with the same periodicity (e.g., ten days, two hours, one month). The periodicity may differ between visualizations according to the size of the available display. The newest date can be associated with the inner guideline and the oldest date with the outer guideline, or the other way round, to give some orientation to the path between them. The display region between the two circular guidelines contains the timeline, which gives an overview of the software history for analysis.
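
The layout described above can be sketched in Python as a mapping from a timestamp to a radial position between the two guidelines. This is only an illustrative sketch: the function names `period_count` and `radius_for`, the radii, and the choice of drawing each short period as a ring of equal thickness are assumptions, not details taken from the EVOWAVE tool.

```python
from datetime import datetime, timedelta


def period_count(start, end, period):
    """Number of short periods needed to cover [start, end] (ceiling division)."""
    if end <= start:
        raise ValueError("the timeline must have a positive length")
    return -(-(end - start) // period)


def radius_for(event, start, end, period, inner_r, outer_r, newest_inside=True):
    """Radius at the middle of the ring that holds the period containing `event`.

    Assumption: every short period occupies a ring of equal thickness
    between the inner and the outer guideline.
    """
    if not start <= event <= end:
        raise ValueError("event lies outside the timeline")
    n = period_count(start, end, period)
    # An event exactly at `end` is placed in the last period.
    idx = min((event - start) // period, n - 1)
    ring = (outer_r - inner_r) / n
    offset = (idx + 0.5) * ring
    # Newest date on the inner guideline: the oldest period is the outermost ring.
    return outer_r - offset if newest_inside else inner_r + offset
```

With `newest_inside=False` the orientation is reversed, which corresponds to associating the oldest date with the inner guideline.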

3.2.2 Windows

EVOWAVE has a mechanism, named window, which compares a subset of short periods, making it possible to carry out a detailed analysis with regard to the overall context. A window (Figure 3.2-B) is a group of consecutive short periods. It is circular in shape and its length depends on the number of grouped periods. The timeline is composed of these windows, and each one of them has the same number of consecutive short periods.
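
The grouping of short periods into windows can be sketched as follows. The names `build_windows` and `window_of` are hypothetical, and the sketch assumes that the number of short periods is an exact multiple of the window size, so that every window holds the same number of periods as stated above.

```python
def build_windows(n_periods, periods_per_window):
    """Split period indices 0..n_periods-1 into consecutive, equally sized windows."""
    if periods_per_window <= 0:
        raise ValueError("a window needs at least one short period")
    if n_periods % periods_per_window != 0:
        # Assumption of this sketch: the timeline divides evenly into windows.
        raise ValueError("timeline does not divide evenly into windows")
    return [
        list(range(first, first + periods_per_window))
        for first in range(0, n_periods, periods_per_window)
    ]


def window_of(period_idx, periods_per_window):
    """Index of the window that contains the given short period."""
    return period_idx // periods_per_window
```

For example, a timeline of twelve one-month periods grouped three at a time yields four windows, and the eighth month (index 7) belongs to the third window (index 2).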
