Mining energy-aware commits: exploring changes performed by open-source developers to impact the energy consumption of software systems

MINING ENERGY-AWARE COMMITS: EXPLORING CHANGES PERFORMED BY OPEN-SOURCE DEVELOPERS TO IMPACT THE ENERGY CONSUMPTION OF SOFTWARE SYSTEMS

By Irineu Martins de Lima Moura

M.Sc. Dissertation

Universidade Federal de Pernambuco [email protected] www.cin.ufpe.br/~posgraduacao

RECIFE 2015


Universidade Federal de Pernambuco

Centro de Informática

Pós-graduação em Ciência da Computação

MINING ENERGY-AWARE COMMITS: EXPLORING CHANGES PERFORMED BY OPEN-SOURCE DEVELOPERS TO IMPACT THE ENERGY CONSUMPTION OF SOFTWARE SYSTEMS

By Irineu Martins de Lima Moura

An M.Sc. Dissertation presented to the Centro de Informática of Universidade Federal de Pernambuco in partial fulfillment of the requirements for the degree of Master of Science in Ciência da Computação.

Advisor: Fernando José Castor de Lima Filho

RECIFE 2015


Cataloging at source

Librarian Jane Souto Maior, CRB4-571

M929m Moura, Irineu Martins de Lima

Mining energy-aware commits: exploring changes performed by open-source developers to impact the energy consumption of software systems / Irineu Martins de Lima Moura – Recife: The Author, 2015.

83 leaves: ill., fig., tab.

Advisor: Fernando José Castor de Lima Filho.

Dissertation (Master's) – Universidade Federal de Pernambuco. CIn, Computer Science, 2015.

Includes references and appendix.

1. Software engineering. 2. Data mining. 3. Energy consumption. I. Lima Filho, Fernando José Castor de (advisor). II. Title.

005.1 CDD (23. ed.) UFPE- MEI 2016-012


M.Sc. dissertation presented by Irineu Martins de Lima Moura to the Graduate Program in Computer Science of the Centro de Informática, Universidade Federal de Pernambuco, under the title “Mining Energy-Aware Commits: Exploring changes performed by open-source developers to impact the energy consumption of software systems”, advised by Prof. Fernando José Castor de Lima Filho and approved by the examining committee composed of the professors:

________________________________________________ Prof. André Luís de Medeiros Santos

Centro de Informática / UFPE

_________________________________________________ Prof. Fernando Marques Figueira Filho

Departamento de Informática e Matemática Aplicada/ UFRN

_________________________________________________ Prof. Fernando José Castor de Lima Filho

Centro de Informática / UFPE

Seen and approved for printing. Recife, August 24, 2015.

___________________________________________________

Profa. Edna Natividade da Silva Barros

Vice-Coordinator of the Graduate Program in Computer Science of the Centro de Informática, Universidade Federal de Pernambuco.


I dedicate this dissertation to all those who always motivated me during this short journey.


Acknowledgements

Two and a half years later, what once seemed to me a distant glimpse is in fact materializing. I recognize, however, that without the support of several people I might not be writing these acknowledgements. To my family, and especially to my mother, I am grateful for the constant support even in the moments when I was absent (and there were many!). To my girlfriend Evelyn, I am grateful for the infinite patience, the unconditional confidence in my potential and all the good moments we have spent together over these last years. I am also grateful to my friends for the laughter and the constant motivation; in particular I thank Gustavo and Felipe, without whom all the work built for my dissertation would not have been possible. I would like to thank immensely my advisor Fernando Castor, who made the mistake of choosing me as his undergraduate research student, persisted by also agreeing to advise my final undergraduate project and, not satisfied, decided it would be a good idea to advise me in the master's program; that is a lot of naivety for a single professor! I also thank my co-workers, both here and in the USA, for their patience and understanding, especially in the final stages. Finally, I thank CIn and UFPE for providing me with all the learning of these last years.


Resumo

Controlling energy consumption has been gaining more and more attention as another kind of concern that software developers must be aware of. This type of concern used to be mainly the focus of hardware designers and low-level developers, for example, device driver developers. However, due to the ubiquity of battery-dependent devices, any developer must be prepared to face this issue. Therefore, understanding how they are dealing with energy consumption is crucial for us to be able to assist them and to provide adequate direction for future research.

To help in this regard, this thesis explores a set of software changes, i.e., commits, to better understand the types of solutions that open-source developers actually implement when they must deal with energy consumption. We use GITHUB, a source code hosting platform for the collaborative development of software projects, as our main data source, and we extract a sample of the commits available across several different projects. From this sample, we manually select a set of "energy-aware" commits, that is, any commit that refers to a code modification where the developer intentionally modifies, or intends to modify, the energy consumption (or power dissipation) of a system, or makes it easier for other developers or end users to do so. We then apply a qualitative analysis method to these commits to extract recurring patterns of information and to group the commits that intend to reduce energy consumption into categories. A small survey was also conducted with the commit authors to assess the quality of our analysis and to expand our understanding of the modifications.

We also consider different aspects of the commits during the analysis. We observe that most of the modifications (~47%) still apply to the lowest software layers, i.e., kernels and drivers, while application-level changes comprise ~34% of our dataset. We notice that developers are not always sure of the impact of their modifications on energy consumption before performing them: in our dataset we identify several instances of modifications (~12%) in which developers show signs of uncertainty regarding the effectiveness of their changes. We also point out some of the possible software quality attributes that are favored to the detriment of energy consumption. Among these, we highlight some commits where developers performed a modification that would negatively impact energy consumption in order to fix an existing problem in the software.

We also find it worth highlighting a specific group of modifications that we call “energy-aware interfaces”. They add controls to the software in question that enable other developers or end users to adjust the energy consumption of some underlying component.


Abstract

Energy consumption has been gaining traction as yet another major concern that mainstream software developers must be aware of. It used to be mainly the focus of hardware designers and low-level software developers, e.g., device driver developers. Nowadays, however, mostly due to the ubiquity of battery-powered devices, any developer in the software stack must be prepared to deal with this concern. Thus, to be able to properly assist them and to provide guidance for future research, it is crucial to understand how they have been handling this matter. This thesis aims to aid in this regard by exploring a set of software changes, i.e., commits, to obtain insights into actual solutions implemented by open-source developers when dealing with energy consumption. We use GITHUB, a source code hosting platform for collaborative development, as our main data source and extract a sample of the available commits across several different projects. From this sample, we manually curate a set of energy-aware commits, that is, any commit that refers to a source code change where developers intentionally modify, or aim to modify, the energy consumption (or power dissipation) of a system or make it easier for other developers or end users to do so. We then apply a qualitative research method to extract recurring patterns of information and to group the commits that intend to save energy into categories. A small survey was also conducted to assess the quality of our analysis and to further expand our understanding of the changes.

During our analysis we also cover different aspects of the commits. We observe that the largest share of the changes (~47%) still targets lower levels of the software stack, i.e., kernels, drivers and OS-related services, while application-level changes encompass ~34% of them. We notice that developers may not always be certain of the energy consumption impact of their changes before actually performing them: in our dataset we identify several instances (~12%) of commits where developers show signs of uncertainty about their change's effectiveness. We also highlight the possible software quality attributes that may be favored over energy efficiency. Notably, we spot a few instances of commits where developers performed a change that would negatively impact the energy consumption of the system in order to fix a bug.

We also draw attention to a specific group of changes which we call "energy-aware interfaces". They add tuning knobs that can be used by developers or end users to control the energy consumption of an underlying component.


List of Figures

2.1 Methodology flow
2.2 Commits default analysis view
2.3 Commits discussion view
2.4 Coding view
2.5 Survey analysis view


List of Tables

2.1 Commits classification
2.2 Email survey numbers
3.1 Energy-aware set changes statistics
3.2 Approaches that developers use to save energy
3.3 Number of energy-aware commits per stack level
3.4 Hedging commits changes statistics
3.5 Quality attributes favored over energy-consumption


List of Acronyms

LOC Lines of Code
DVCS Distributed Version Control System
DVFS Dynamic Voltage and Frequency Scaling


3.5 RQ4 What software quality attributes may be given precedence over energy consumption?
3.5.1 Performance
3.5.2 Responsiveness
3.5.3 Correctness
3.5.4 No actual savings
3.5.5 Miscellaneous
3.6 Developers Survey
3.6.1 Assessment
3.7 Further Analysis
3.7.1 Energy-Aware Interfaces
3.7.2 Context-aware energy saving
3.7.3 Mobile Environment and Android
3.7.4 High number of soft-duplicates
3.7.6 Energy-Aware Source Code Review
3.8 Threats to Validity
3.8.1 Internal
3.8.2 External
4 Conclusion
4.1 Related Work
4.2 Future Work
4.3 Other contributions
References
Appendix
A Queries
A.1 First phase sampling query
A.2 Second phase sampling query
B Scripts
B.1 Availability checking script
B.2 Duplicate detection script
C Surveys
C.1 First phase survey email template
C.2 Second phase survey email template

1 Introduction

1.1 Motivation

Thanks to the diversification of modern computing platforms, battery-driven devices such as smartphones, tablets and unwired devices in general are now commonplace in our lives. However, such devices are energy-constrained, as they rely on a limited battery power supply. Energy consumption directly affects users' perception of their quality. For example, in a survey conducted with more than 3,500 respondents from 4 different countries (Cat Phones, 2013), long-lasting battery life was cited as the most desired feature in a new phone by 71% of the respondents. Likewise, recent research pointed out that battery usage is a key factor for evaluating and adopting mobile applications (WILKE et al., 2013). As a result, not only researchers (HAO et al., 2013; ZHANG; HINDLE; GERMAN, 2014; ZHUANG; KIM; SINGH, 2010), but also giants of the software industry (GOOGLE, 2015; THEGUARDIAN, 2015) have recently started to promote energy-efficient systems.

Traditionally, energy optimization research has focused on the hardware level (e.g., (DAVID et al., 2010; HOROWITZ; INDERMAUR; GONZALEZ, 1994)) and on the system level (e.g., (FARKAS et al., 2000; RIBIC; LIU, 2014)). Arguably, the strategy of leaving the energy optimization issue to lower-level systems and architecture layers has been successful. The main advantage of this approach is that user applications can be seen as black boxes, i.e., no prior knowledge of the application source code or its behavior needs to be used to achieve better energy savings.

However, applications also impact energy consumption, although software itself does not consume energy. When one energy-inefficient piece of code is introduced in a system comprising hardware and software components, compilers and runtime systems cannot help much, since they are not aware of the semantics of the program. Unlike performance, where more efficient is always better, sometimes an application will purposefully consume more energy to provide its intended functionality, e.g., by activating an energy-intensive device.

In contrast, energy consumption bottlenecks often stem from inappropriate design choices, as is often the case with performance bugs (JIN et al., 2012). Finding and fixing these problems requires human insight. It is not always clear to programmers, however, what they can do from a software engineering perspective to save energy. There seem to be misconceptions and a lack of appropriate tools (PINTO; CASTOR; LIU, 2014a).

Recent work (LI; HALFOND, 2014; KWON; TILEVICH, 2013; PINTO; CASTOR, 2014; PINTO; CASTOR; LIU, 2014b; WAN et al., 2015) has shown that there is ample opportunity to improve energy consumption at the application level. As an example, upgrading the ConcurrentHashMap class available in Java 7 to its improved successor, available in Java 8, can yield energy savings of 2.19x (PINTO; CASTOR, 2014). Nevertheless, developing an energy-efficient system is a non-trivial task. Even though researchers have made great strides in empirical studies aiming to understand the impact of different program characteristics on energy consumption (e.g., (LI; HALFOND, 2014; PINTO; CASTOR; LIU, 2014b; SAHIN; POLLOCK; CLAUSE, 2014; ZHANG et al., 2012)), these studies do not cover the complete range of language constructs and implementation choices. Therefore, developers still do not fully understand how their source code modifications will impact energy consumption (PINTO; CASTOR; LIU, 2014a).

In spite of this grim scenario, many energy-efficient systems are being developed in practice. Starting from this premise, in this thesis we focus on a timely but overlooked question:

How are software developers changing their code in ways that possibly affect the energy consumption of systems?

1.2 Objective

In this work we aim to identify different aspects of the actual changes performed by open-source developers that affect the energy consumption of their software. With this intent, we collect a sample of commits from GITHUB and manually analyze and classify them in order to answer questions like:

What are the different types of commits that are likely to modify a software system's energy consumption?

What types of solutions do developers employ with the intent of saving energy?

How are the changes distributed across the software stack?

To what extent are software developers certain of the energy consumption impact of their changes?

What software quality attributes may be given precedence over energy consumption?

Besides these questions, we also seek to discover new items that could lead to future research in this field. Additionally, we are interested in knowing whether the changes had the desired effect and how the developers reached that conclusion. In this regard, we also survey the commit authors to receive their direct feedback regarding the changes.


1.3 Contributions

We believe that the main contributions of this work are:

The definition of an energy-aware commit and the discovery of a specific type of functionality that we call energy-aware interfaces.

A list of the seven most common themes that relate to how developers change software to improve energy consumption. This list covers themes like “Energy bug fix”, “Frequency and voltage scaling”, “Disabling components”, “Using efficient component”, “Managing periodic work”, “Low power idling” and “Timing out”. We also briefly present some less common approaches.

In particular, we show that energy bugs are common and that developers are quite concerned with fixing them: 72 (16.78%) of the commits that show an energy-saving intention attempt to fix such issues. This number motivates further research on the identification and correction of these bugs.

Solutions based on traditional approaches such as frequency scaling and exploring multiple levels of idleness are also common, covering 16.32% of the analyzed commits and 8.47% of our dataset.

We show that the largest share of the energy-aware changes (47.09%) still targets lower levels of the stack, although application-level changes seem to have been increasing over the years.

We found that ill-chosen energy-saving techniques can affect the correctness of the application. Eight energy-aware commits warn about this.

Developers are not always certain about the impact of source code modifications on energy consumption. In total, 96 commit messages included hedging cues suggesting that the GITHUB contributors were not entirely sure about their effects. We also identified 48 reverted commits and contradictions between the developers in GITHUB commit discussions. Additionally, 23.53% of the survey answers regarding energy-saving commits stated that the authors never actually measured the impact, while 22.06% were hesitant when stating that the change had the desired effect.

Even for systems where energy efficiency is critical, developers sometimes knowingly apply modifications that will have a negative impact on energy efficiency in order to balance trade-offs. We identified 89 commits documenting these situations. In contrast with the hedging commits, this result suggests that some developers have a strong grasp of energy consumption-related solutions.

Lastly, all data used and generated by this study is made available online to motivate and enable further research on energy-aware commits.

1.4 Organization of the Dissertation

The remaining chapters of this thesis are organized in the following manner:

Chapter 2 describes in detail the methodology used to conduct the research and gives the background necessary to understand it;

Chapter 3 presents the analysis results and discusses some of our findings;

Chapter 4 summarizes our contributions, discusses related work and presents possible avenues for future research.

Lastly, the appendices of this thesis are structured as follows:

Appendix A contains the search queries used to sample the commits;

Appendix B contains the main scripts used to manipulate our dataset;

Appendix C contains the templates used for the survey conducted with the commit authors.

1.4.1 Note on commits citation

Since we are analyzing commits, on many occasions we will need to cite them. To differentiate a normal citation from a commit citation, when referring to commits we use the following notation: [{PARTIAL_COMMIT_ID}], e.g., [C0].

2 Methodology

This chapter describes the methodology used in this work to answer our research questions. They are defined as:

RQ1. What types of solutions do developers employ with the intent of saving energy?

RQ2. How are the energy-aware changes distributed across the software stack?

RQ3. To what extent are software developers certain of the energy consumption impact of their changes?

RQ4. What software quality attributes may be given precedence over energy consumption?

Since this is exploratory research, in Chapter 3 we also discuss other interesting findings that we observed during the analyses.

2.1 Workflow

The work described in this thesis mainly followed a workflow composed of five steps: (1) sampling the commits, (2) filtering out duplicates and unavailable ones, (3) classifying and making observations about them, (4) surveying the authors and (5) executing a qualitative research method over the commits that intended to save energy. These steps are described in detail in Section 2.3. However, steps 1 through 4 were performed twice, in two distinct phases, with each phase targeting a different sample. The first phase was an unguided, i.e., exploratory, phase in which we consolidated our research questions and formulated the “energy-aware commit” definition. This phase gave us the experience to tackle the second phase more objectively and much faster. A high-level visualization of the whole research workflow can be seen in Figure 2.1. Our final dataset, which contained the commits used in the analysis of step 5, was composed of the results of the two phases. When describing these steps in Section 2.3 we make the appropriate distinctions between the two phases where needed.


2.2 Git and GitHub

Before describing our methodology, we must first give some background information on topics related to GIT and GITHUB to facilitate the understanding of our methods.

GIT is a Distributed Version Control System (DVCS), i.e., a source code versioning system that enables code sharing without the requirement of a centralized repository. In GIT, every developer has their own copy of a repository, and common source code versioning commands like submitting a change or checking out a new branch are performed within the developer's repository. When a developer submits a new change to their local repository, i.e., commits, GIT records several different types of metadata associated with the commit, such as: (1) a commit message provided by the developer to help in understanding the change being introduced; (2) the name and e-mail address of the author of the changes and the date of authoring; (3) the name and e-mail address of the committer of the changes¹ together with the commit date; (4) pointer(s) to the parent(s) of the commit; (5) a non-sequential id string that uniquely identifies the commit, also known as the commit hash or sha. GIT's branching model is lightweight and promotes easy and fast creation and manipulation of development branches. Once work on a branch is done, it can be merged into another branch to bring the changes from the former into the latter. Another, more advanced form of sharing changes between branches is called rebasing. In a rebase, GIT tries to reapply the changes from one branch onto another in such a way that the final result looks as if the branches had never diverged, unlike normal merges, where the GIT history clearly shows the point of convergence between the branches. When rebases are performed, GIT tries to keep the author information intact, but other metadata like committer info, commit date and hash id are changed. In fact, any time a GIT command requires changing the metadata of a commit, a new commit is created with the altered metadata. To share source code, developers must first clone other developers' repositories; afterwards they may start sharing by pulling changes from and pushing changes to each other's repositories.
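To make concrete why altering a commit's metadata yields a new commit, the following minimal sketch computes a GIT-style commit id as the SHA-1 of a header plus the commit's textual metadata. It assumes GIT's default SHA-1 object format and a simplified commit layout (no signatures or extra headers); the function name `commit_id`, the parent id and all author and committer values are hypothetical and used for illustration only.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Compute a Git-style commit id (SHA-1) from the commit's metadata."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    data = body.encode("utf-8")
    # Git hashes the object type and payload size, a NUL byte, then the payload.
    header = f"commit {len(data)}\0".encode("utf-8")
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical values, for illustration only.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
AUTHOR = "Alice <alice@example.com> 1400000000 +0000"

original = commit_id(EMPTY_TREE, [], AUTHOR, AUTHOR, "power saving tweaks")

# A rebase keeps the author line but rewrites the parent and the committer
# information, so the resulting commit necessarily gets a different id.
rebased = commit_id(
    EMPTY_TREE,
    ["1" * 40],
    AUTHOR,
    "Bob <bob@example.com> 1400003600 +0000",
    "power saving tweaks",
)

print(original != rebased)
```

Because the id is derived from the metadata itself, two commits that carry the same change but differ in committer, date or parent are distinct objects, which is relevant when the same change is pushed more than once.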

GITHUB is a collaborative development service that has built many of its features around GIT. It has several tools and APIs that help in the distributed development of software in general. Expanding on the easy branching model of GIT, GITHUB allows developers to share code through a “fork & pull” model (KALLIAMVAKOU et al., 2014). In this model, developers use a GITHUB feature called fork to create a clone of the project they desire to contribute to (the parent project), and when they are ready to share their modifications they may do so through pull requests. Pull requests, as the name implies, invite developers in another repository to pull the changes from the requesting developer's repository. To promote collaboration, GITHUB also enables quick visualization of commit data through its web interface. In this interface, it is possible to view the general commit metadata stored by GIT as well as the differences introduced by the changes performed in the source code. This interface also allows developers to create discussion threads within a commit to sort out any issues or questions regarding it.

¹ GIT makes the distinction between author and committer (CHACON, 2009): the former is the person who originally contributed the changes to the source code, while the latter is the one who is creating the commit, that is, introducing the changes in the repository/branch. For example, when committer C is creating a commit based on the commit of author A, in the metadata of C's commit, A would be the author and C would be the committer. However, for GITHUB repositories, most of the time they are the same (KALLIAMVAKOU et al., 2014).

2.3 Data Collection

2.3.1 Sampling

In order to start our investigation, we decided to select our dataset based on the contents of the commit messages. Usually this is the place where programmers express the intention of a source code modification. However, neither the GITHUB web interface nor its web API² provides means for globally querying commit messages, i.e., querying over all commits in all GITHUB repositories. Instead, we had to resort to a proxy, and we chose to use GITHUBARCHIVE (GRIGORIK, 2015). According to its website, it is “a project to record the public GITHUB timeline, archive it, and make it easily accessible for further analysis”. The archive is updated every hour and its dataset is available since February 2011. As its description implies, it records events that happen in GITHUB. Examples of such events are the creation of a new fork, the publication of a comment in a bug report or, of most interest to us, a push event. Push events are created when contributors of a project push their changes to GITHUB; such events contain some metadata about the commit, e.g., the commit message and hash id, and pointers to access the whole commit information. GITHUBARCHIVE exposes this data through a web tool³ that enables querying and exporting query results. By using this tool, we were able to query the commit messages of any open-source project available in GITHUB that has received a commit since 2011.

We performed a query to select commits that are most likely to be related to energy consumption. In the first phase of this research, our query searched for commit messages that contained some specific terms used with the keywords energy and power. These terms were: *energy consum*, *energy efficien*, *energy sav*, *save energy*, *power consum*, *power efficien*, *power sav* and *save power*. The character “*” in each term works as a wildcard: the query selects commits whose messages match at least one of these terms, regardless of what precedes or follows the term in the message. These terms were derived from a previous work (PINTO; CASTOR; LIU, 2014a). For the second phase, we used the same terms, but exchanged the keyword for the word battery, i.e., *battery consum*, *battery efficien*, *battery sav*, *save battery*. This was motivated by this thesis author's previous experience with the mobile environment, where the word “battery” seems to be used as a synonym for energy. We also tried simple keyword searches like "energy", "power" or "battery", but a search like this resulted in over 200,000 commits being selected, which would prevent us from conducting a mostly manual study. Considering the combined samples, we selected a total of 3993 items, 2189 resulting from the energy/power keywords query and 1804 resulting from the battery keyword query. They span the time range from 12/03/2012 through 15/05/2014. Both queries can be found in Appendix A.

² In theory we could exhaustively search for each commit on each repository hosted in GITHUB using the API; however, such an option is not feasible due to the sheer amount of data that would have to be processed and also because GITHUB imposes a limit on the number of requests that may be performed within a single hour.
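The wildcard matching described above can be approximated locally with a short sketch. This is not the query from Appendix A: it only mirrors the terms listed in the text, and it assumes case-insensitive matching, which the text does not state. The function names and the sample messages are hypothetical.

```python
import re

# Terms listed in the text; the surrounding "*" wildcards mean "anywhere in
# the message", which corresponds to an unanchored regex search.
ENERGY_POWER_TERMS = [
    "energy consum", "energy efficien", "energy sav", "save energy",
    "power consum", "power efficien", "power sav", "save power",
]
BATTERY_TERMS = [
    "battery consum", "battery efficien", "battery sav", "save battery",
]


def build_matcher(terms):
    """Compile one regex that matches if any term occurs in a message."""
    pattern = "|".join(re.escape(term) for term in terms)
    # Assumption: case is ignored; the original query may differ.
    return re.compile(pattern, re.IGNORECASE)


def select_commits(messages, terms):
    """Return the messages that contain at least one of the terms."""
    matcher = build_matcher(terms)
    return [message for message in messages if matcher.search(message)]


# Hypothetical commit messages, for illustration only.
sample_messages = [
    "Reduce power consumption of the radio",
    "Fix typo in README",
    "Tweaks to save battery while idle",
    "Power saving tweaks",
]

first_phase = select_commits(sample_messages, ENERGY_POWER_TERMS)
second_phase = select_commits(sample_messages, BATTERY_TERMS)
print(first_phase)
print(second_phase)
```

Matching on two-word stems such as "power sav" rather than on the bare keyword is what keeps the sample small enough for manual analysis, at the cost of missing messages that phrase the intent differently.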

The sampled data was loaded into a Google Spreadsheet⁴. This spreadsheet was used throughout the entire research to keep track of the commits during the analyses described in Sections 2.3.3 and 2.4. Any data resulting from these analyses was properly codified and kept in specific columns to enable quick filtering, grouping and further analyses.

It is important to stress that each element (row) in our sample actually refers to a push event to GITHUB. Nonetheless, since each push event is always related to an actual commit, we shall indiscriminately call them commits, except where the distinction is necessary (Sections 2.3.2.1 and 2.3.2.2). We also note that, due to limitations of the services we used, not all public commits pushed to GITHUB were sampled. This is better explained in Section 3.8.

2.3.2 Filtering

After sampling the GITHUBARCHIVE database, we felt the need to filter out some of the data since we observed several instances of unavailable and duplicate commits.

2.3.2.1 Unavailable commits

As mentioned in Section 2.3.1, the GITHUBARCHIVE database stores events of the GITHUB public timeline, and these events contain partial information about the commits and pointers to the commits in GITHUB. If a commit becomes unavailable in GITHUB, this information is not reflected in the GITHUBARCHIVE database and therefore, when we sample, we may end up selecting some of these unavailable commits.

To detect and quickly discard such commits, we built a script that scans our entire dataset and tries to fetch the commit page on GITHUB. If the script fails to fetch the page after a couple of attempts, it marks the commit as unavailable and we do not consider it any further. Some of the reasons why commits become unavailable might be:

the commit's project could have been removed by its owner⁵

the project owner could have left GITHUB⁶

the branch(es) to which the commit belonged was (were) removed

the commit's project could have been disabled by the GITHUB staff⁷

⁴ https://docs.google.com/spreadsheets/u/0/

⁵ https://help.github.com/articles/deleting-a-repository/

⁶ https://help.github.com/articles/deleting-your-user-account/

⁷ https://help.github.com/articles/dmca-takedown-policy/


We also directly asked the GITHUB staff if there is another reason for “unreachable commits”, and they answered that “We don’t periodically gc repositories, however we do run gc on repositories on request from an owner of the repository, which would be the reason for unreachable commits”⁸. Initially, we ran this script to filter out most of the unavailable commits so that we did not have to waste time trying to visualize them on GITHUB. Afterwards, during the course of the research, we noted that some of the commits continued to become unavailable, and therefore we decided to run the script periodically to always have the latest availability status for each commit and to avoid referencing unavailable commits in this document. Considering that this is also a threat to the validity of such studies, in Section 3.8.2 we explain how we have dealt with this issue. At the time of writing, a total of 1247 commits in our overall sample were no longer available. The availability checking script can be found in Appendix B.
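As an illustration of the availability check described above, the sketch below tries to fetch a commit page a few times and reports whether it could be reached. It is not the script from Appendix B: the function names, the number of attempts, the delay between attempts and the commit URL layout (`https://github.com/{owner}/{repo}/commit/{sha}`) are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except OSError:
        # Covers URLError and timeouts: no usable answer was received.
        return None


def is_available(owner, repo, sha, fetch=fetch_status, attempts=3, delay=1.0):
    """Return True if the commit page can be fetched within `attempts` tries."""
    # Assumed URL layout of a commit page on GitHub.
    url = f"https://github.com/{owner}/{repo}/commit/{sha}"
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before retrying
    return False
```

Passing the fetch function as a parameter keeps the retry logic testable without network access; in a periodic run, `is_available` would be called once per row of the dataset and the result stored in an availability column.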

2.3.2.2 Duplicates

The data sampled from the GITHUBARCHIVE database contained several duplicate commits. These commits had the same hash id and/or the same commit message. We believe this happens because (1) the exact same commit (same hash id) might be pushed to GITHUB more than once, possibly from different repositories, or (2) due to GIT branch manipulation commands (see Section 2.2) actual copies of the commits were created and pushed. In either instance, each push would create a separate event in the GITHUBARCHIVE database and thus would be selected by our query.

To filter out such commits and avoid unnecessary work, we created another script that detects and marks such duplicates. This script considered any two given commits as duplicates if they had either the exact same hash id or the exact same commit message. In the latter case, we also tried to match other commit metadata such as the total number of additions and deletions, the author name and the changed file names. We believe this is required because short commit messages could cause false matches, that is, two different commits could have the same short message (e.g., “power saving tweaks”). If the commits match by hash id, or by message together with some other metadata, we unconditionally mark one of them as duplicate and disregard it for future analysis. For message-only matches, we decided to manually verify them. We verified 53 commits and found that 8 were misclassified. Overall our dataset has 865 commits marked as duplicate. The duplicate selection script can be found in Appendix B.
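As an illustration only, the matching rules above can be sketched as follows. The record fields (`sha`, `message`, `additions`, `deletions`, `author`, `files`) and the requirement that all four metadata fields agree are assumptions; the actual script in Appendix B may combine the metadata differently.

```python
def classify_pair(a, b):
    """Compare two commit records (dicts) and return a duplicate verdict."""
    if a["sha"] == b["sha"]:
        return "duplicate"  # same hash id: always a duplicate
    if a["message"] != b["message"]:
        return "distinct"
    # Same message: confirm with further metadata (assumed: all must match).
    metadata_keys = ("additions", "deletions", "author", "files")
    if all(a[key] == b[key] for key in metadata_keys):
        return "duplicate"
    return "manual-review"  # message-only match, to be checked by hand


def mark_duplicates(commits):
    """Label each commit as 'unique', 'duplicate' or 'manual-review'."""
    kept = []
    for commit in commits:
        verdicts = [classify_pair(commit, other) for other in kept]
        if "duplicate" in verdicts:
            commit["status"] = "duplicate"
        elif "manual-review" in verdicts:
            commit["status"] = "manual-review"
            kept.append(commit)
        else:
            commit["status"] = "unique"
            kept.append(commit)
    return commits
```

The pairwise comparison is quadratic in the number of commits, which is acceptable for a sample of a few thousand commits but would need an index on hash id and message for larger datasets.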

2.3.3 Observing and classifying

After filtering, a team of three researchers manually analyzed the remaining commits. This team was composed of this thesis author, one former PhD student (Gustavo Pinto) and one current PhD student (Felipe Ebert). The aim of this analysis was (1) to identify which commits were actually related to energy consumption and eliminate false-positives, (2) to gather information9 required to answer our research questions (RQ2, RQ3 and RQ4) and (3) to make observations about interesting items that may surface from the data. Initially the team mostly used the GITHUB web interface to read the commit message and to visualize the changes in the source code. In the second phase a set of UIs (Section 2.5) was primarily used for this analysis.

8_{The GIT gc command attempts to delete unreachable commits, i.e., commits without a branch or any other}

When first performing the classification during the first phase, the original goal was to identify which commits were actually “related to energy consumption”. After some classifications we established the notion of an energy-aware commit, described in Definition 1.

Definition 1. Any commit that refers to a source code change where the developers intentionally modify, or aim to modify, the energy consumption (or power dissipation) of the target system or make it easier for other developers or end users to do so.

This definition was then applied to identify the energy-aware commits in our dataset. Once we identified an energy-aware commit, we would label it according to three possible categories:

ENERGY-SAVING: The commit message or source code comments show a direct intention to positively impact the energy consumption of the system;

ENERGY-TRADEOFF: The commit message or source code comments acknowledge a negative impact in the energy consumption of the system or indicate the removal of an energy saving feature/measure;

ENERGY-AWARE INTERFACE: The commit message or source code comments indicate that an energy-aware interface is being added or modified (see Definition 2).

During this same phase we also identified what we call energy-aware interfaces, which are described in Definition 2.

Definition 2. Any functionality that provides tuning knobs for clients (e.g., developers or end-users) to control the energy consumption of an underlying component.

For example, they can be a new user-visible option to toggle energy saving modes in a smartphone. Another example is when developers add a boolean flag to allow the clients of a class or method to decide when the component should use a more energy efficient option to perform its task. We discuss them more in Section 3.7.
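The boolean-flag case mentioned above can be pictured with a small, entirely hypothetical component. The class name, the `low_power` flag and the interval values are invented for illustration and do not come from any commit in the dataset.

```python
class LocationTracker:
    """Hypothetical component exposing an energy-aware interface."""

    def __init__(self, low_power=False):
        # The flag is the "tuning knob": the client decides whether the
        # component should trade freshness of data for lower energy use.
        self.low_power = low_power

    def polling_interval_seconds(self):
        # Polling less often keeps the underlying sensor idle for longer.
        return 60 if self.low_power else 5
```

A client that can tolerate stale positions would construct `LocationTracker(low_power=True)`, while the default behaviour is left unchanged for everyone else.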

In total we identified 826 energy-aware commits: 639 ENERGY-SAVING, 89 ENERGY-TRADEOFF and 98 ENERGY-AWARE INTERFACE.

Commits that did not fit the energy-aware definition were considered false-positives. We note that, even though we consider primarily the commit message when classifying the commit, source code changes must also be available as we consider them a source of confirmation. Therefore, a commit message may show a clear intention to save energy, but still be considered a false-positive because there are no visible changes, e.g., the commit just updates a release notes file or only updates binary files. A total of 826 commits were marked as false-positive.

9_{During the first phase, the only task performed in this step was the energy-aware identification and classification, so to cover all aspects required to answer our research questions the researchers had to revisit the first phase commits to collect the missing data, e.g., defining the stack level of a commit.}

We also observed what we call “soft-duplicate” commits, that is, commits that share neither the same hash id nor the exact same message, but have rather similar message contents and source code modifications. We believe that the reason for this is similar to the one explained in Section 2.3.2.2 and is also related to the use of GIT’s branch manipulation commands (see Section 2.2). For example, when rebasing or cherry-picking commits, the developer might choose to alter the commit message at will. Another example is a special case of rebase called squashing: when using this command, GIT will combine two or more commits and automatically copy the messages from each one into a single commit message. The final result is generally a bigger commit that shares parts of the changes and messages from the two (or more) previous commits. We disregarded 200 occurrences of commits with these characteristics.

In some special circumstances, the team was also unable to properly classify the commits because the commit message was poorly structured or contained language so ambiguous that the researchers could not reach an agreement. A total of 29 such unsure commits were marked and not considered any further.

This manual commit review process was divided among the researchers. Each commit was classified separately by at least two researchers and then the classifications were cross-checked. Any mismatch was thoroughly discussed and resolved during online meetings between all members of the team. We conducted several such meetings after the classification round for each phase.

Table 2.1 summarizes the number of occurrences of each type of commit in our sample after filtering, classification and the survey analysis (see Section 2.3.4). ENERGY-SAVING, ENERGY-TRADEOFF and ENERGY-AWARE INTERFACE encompass our set of energy-aware commits and from this point forth we shall use the term “energy-aware set” to identify them. FALSE-POSITIVE refers to commits not considered energy-aware. DUPLICATE and NOT-AVAILABLE refer to commits that were automatically filtered by the duplicate and availability detection scripts, respectively. SOFT-DUPLICATE refers to commits that were manually reviewed and considered duplicates. UNSURE refers to commits that could not be classified by the authors.

As stated previously in this section, the researchers also collected other information about the commits during this analysis. For each commit we would determine the target software stack level of the change (see Section 2.3.3.1) and whether the commit showed any signs of uncertainty when describing the energy consumption impact of the change (see Section 2.3.3.2). We would also add several different labels to the commits in the spreadsheet for any type of information that captured our attention. For example, when we found a truly interesting change because of its novelty we would add an interesting label to the commit. In other cases, when we spotted a piece of software related to a mobile environment we would add another specific label. This helped us to easily refer to these commits later on.

Table 2.1: Commits classification

Classification            Commits   %
ENERGY-SAVING             639       16.00
ENERGY-TRADEOFF           89        2.23
ENERGY-AWARE INTERFACE    98        2.45
FALSE-POSITIVE            826       20.69
DUPLICATE                 865       21.66
SOFT-DUPLICATE            200       5.01
UNSURE                    29        0.73
NOT-AVAILABLE             1247      31.23
TOTAL                     3993      100
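The percentages in Table 2.1 can be re-derived from the raw counts. The short check below is only a sanity computation over the numbers reported in the table; the variable names are ours.

```python
counts = {
    "ENERGY-SAVING": 639,
    "ENERGY-TRADEOFF": 89,
    "ENERGY-AWARE INTERFACE": 98,
    "FALSE-POSITIVE": 826,
    "DUPLICATE": 865,
    "SOFT-DUPLICATE": 200,
    "UNSURE": 29,
    "NOT-AVAILABLE": 1247,
}

total = sum(counts.values())

# Share of each classification in the overall sample, in percent.
percentages = {label: round(100 * n / total, 2) for label, n in counts.items()}

# The energy-aware set is the union of the first three classifications.
energy_aware = sum(counts[label] for label in
                   ("ENERGY-SAVING", "ENERGY-TRADEOFF", "ENERGY-AWARE INTERFACE"))
```

Note that the energy-aware set and the FALSE-POSITIVE class happen to have the same size (826 commits each), so both correspond to 20.69% of the sample.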

2.3.3.1 Software stack level

When answering RQ2 we considered the same software layer definition provided by (STALLINGS,2011), which encompasses Operating System, Libraries/Utilities and Applications. Operating System includes Kernels, Embedded Kernels, Drivers and Firmwares. Libraries/Utilities include scripts (general purpose scripts, building scripts and compile scripts) and embedded/non-embedded libraries. Applications include embedded applications, desktop applications and mobile applications. When assigning the corresponding software layer for each commit we based our judgements on the following characteristics: (1) the project description on GITHUB, (2) the file names, extensions and directory structure, (3) the changed source code itself and (4) any related external documentation. The project description is the main deciding factor since it usually describes the purpose of the software; however, it is not always available, and in these circumstances we need to rely on the other characteristics. File names, extensions and folder structure may also give information about the stack level, e.g., a driver related commit [C0] may change files inside a folder named “driver” or a commit to an Arduino application may change files ending with an “.ino” extension [C2]. In the same manner, the code itself provides contextual information about the stack level, e.g., changes to code conforming to the same level may follow a given pattern or have certain similarities [C3,C4,C5].

2.3.3.2 Hedging

To answer RQ3 we considered the concept of hedging. It can be defined as the lack of commitment to the truth value of a proposition or a desire not to express that commitment categorically (HYLAND,1998). In the Natural Language Processing field it is a synonym for language that shows signs of uncertainty or speculation (SZARVAS,2008;AGARWAL; YU,2010). Since a great deal of this work was manual labor, we used the guidelines provided by (VINCZE et al.,2008) to systematically look for hedges and take into account whether the commit message contained any hedges regarding the energy consumption impact of the change. These guidelines were used by linguists to find instances of hedging within a corpus of English biomedical texts and to annotate such instances with markers that define the scope of the hedging. They contain a list of hedging cues, i.e., common words or expressions that imply hedging (e.g., “might”, “should”), as well as instructions for discovering more complex forms of hedges. While the guidelines were meant to detect any instance of hedging within their target corpus documents, in this work we only consider hedges concerning the energy consumption impact; any other hedging within the commit messages is disregarded.
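The hedge annotation in this work was done by hand. Purely to illustrate the idea of a cue list restricted to energy-related statements, a rough automated approximation could look like the sketch below. The cue tuple contains only the two cues quoted above, and the list of energy terms and the sentence-level scoping rule are assumptions that do not come from the guidelines.

```python
import re

HEDGE_CUES = ("might", "should")  # the two cues quoted in the text
ENERGY_TERMS = ("energy", "power", "battery")  # assumed topic filter


def energy_hedges(message):
    """Return hedge cues found in sentences that also mention energy."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", message):
        words = re.findall(r"[a-z']+", sentence.lower())
        if any(term in words for term in ENERGY_TERMS):
            found.extend(cue for cue in HEDGE_CUES if cue in words)
    return found
```

For the message "This might reduce battery drain. Should also fix the crash." the sketch reports only "might", since the second sentence hedges something unrelated to energy. A sentence-level rule like this one misses hedges whose scope crosses a sentence boundary, which is one reason a manual, guideline-driven annotation is more reliable.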

2.3.4 Surveying

As a means of confirming our assumptions about the analyzed commits and to gather some more information about the changes, we devised an email survey that was sent out to the commit authors. Since it was small enough, we decided to embed the questions within the email message contents and we used templates to enable the customization of each message for each author/commit pair. The author e-mail addresses were collected from the author field of the GIT commit (see Section 2.2). Just as was the case with other steps in the methodology flow (Figure 2.1), the survey was sent out twice, once for each sample analysis phase. However, for the second batch of e-mails, we decided not to include authors whose commits had already been included in the first batch. We took this precaution to avoid jeopardizing the authors’ view of this research and of possible future research, since repetitive unsolicited e-mails can be viewed as spam. Another difference between the first and second batches is that we altered the templates to include the energy-aware commit definition and to be more specific in our questions. This was motivated by replies from the first batch that sometimes failed to address our questions. The templates for the first and second phase batches can be seen in Appendix C.

The survey contained questions to confirm the authors’ intentions (e.g., “Would you be able to describe your intention when you performed this commit?”), to inquire whether the change had the desired effect in the system (e.g., “Did you notice any difference on energy consumption footprint on the target system?”) and to know how the authors came to that conclusion (e.g., “How did you observe that?”).

We used another Google Spreadsheet to keep track of the delivery status for each e-mail and to store codified information extracted from the survey answers. Overall 816 email attempts were performed, with 47 failing to deliver because the email address was invalid or unreachable. The remaining 769 e-mails were delivered to 583 authors10 and we received a total of 95 replies. 87 of these replies fully or partially addressed our questions while 8 did not answer them. For these unanswered e-mails, the authors either simply refused to answer or they were unable to answer because, as they acknowledged, the actual author of the commit was someone else. This equates to a 10.66% answer rate. Table 2.2 summarizes all these numbers. In total we sent e-mails for commits found in 648 different GITHUB repositories.

Table 2.2: Email survey numbers

Delivery status   # e-mails (%)   Reply status   # e-mails (%)   Answer status   # e-mails (%)
Undeliverable     47 (5.76%)
Delivered         769 (94.24%)    Unreplied      674 (82.60%)
                                  Replied        95 (11.64%)     Unanswered      8 (0.98%)
                                                                 Answered        87 (10.66%)

10_{Within each e-mail batch multiple e-mails were sent to the same author if the author had performed more than}
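The survey figures follow from four reported numbers, and every percentage is taken relative to the 816 attempts rather than to the delivered e-mails. The following check, with variable names of our own choosing, makes that denominator explicit.

```python
attempts = 816
undeliverable = 47
replied = 95
answered = 87

delivered = attempts - undeliverable   # e-mails that reached a mailbox
unreplied = delivered - replied        # delivered but never replied to
unanswered = replied - answered        # replies that did not address the questions


def share(n):
    """Percentage of all e-mail attempts, rounded to two decimals."""
    return round(100 * n / attempts, 2)
```

With these definitions, `share(answered)` gives the 10.66% answer rate reported above.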

We refer to the survey information when presenting the results of our research questions, but we also discuss it separately in Section 3.6.

2.4 Qualitative Analysis

To answer RQ1, once we have classified the commits, we select the ENERGY-SAVING ones and try to extract reliable information from the commit messages and the source code using a qualitative approach named Thematic Analysis (BRAUN; CLARKE,2006). Thematic analysis is a common qualitative analysis method that emphasizes examining and recording themes within data. The application of this approach has six stages: familiarization with data, generating initial codes, searching for themes among codes, reviewing themes, defining and naming themes, and producing the final report. We explain how we conducted each one in the remainder of this section. Although they are described separately here, the first two stages shown below were actually performed concurrently with the commit classification step (Section 2.3.3).

1. Familiarization with data: Here the researchers analyzed the commit message and the source code modification for each commit.

2. Generating initial codes: Each author gave a code for each analyzed commit. The code is an attempt to express the core of the modification. For instance, a commit that adds a new algorithm for DVFS (HOROWITZ; INDERMAUR; GONZALEZ,1994), a technique for dynamically scaling the CPU frequency, can be coded as “DVFS”. In this step, we also refined codes by combining and splitting potential codes.

3. Searching for themes: In this step, this thesis author tried to combine coded data to form an initial list of themes. When in doubt, the other researchers provided support to find broader patterns within data.

4. Reviewing themes: At this stage, we have a potential set of themes. We then searched for data that supports or refutes our themes. For instance, we updated the theme of a commit that was initially themed as “DVFS” to “Frequency and voltage scaling”. In this commit, the programmer decreased the voltage of the display. This solution is related to voltage scaling but not to DVFS.

5. Defining and naming themes: Here we refined existing themes. At this time, most of the themes already had a name. However, we renamed some of them to cover codes with small numbers of commits, which we would otherwise have had to discard. We established fifteen as the minimum number of elements that a group of code-related commits must have to be considered a separate theme. In a previously published study (MOURA et al.,2015), we used five as the commit threshold. That study already covered a large number of commits (290) and only half of the themes were actually discussed. Here the number of commits is even greater (429), so we decided to increase the threshold to reduce the number of themes generated and increase our confidence in the ones on which we report. Since we are attempting to assign a single theme to each commit, commits with more than one energy-saving change are considered in a separate Outlier theme.

6. Producing the final report: This process led to the elicitation of 7 main themes. These themes are discussed in Section 3.2. Some of the commits without a proper theme are also discussed.
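The thresholding rule of step 5 can be sketched as follows. This is a deliberate simplification: in the actual analysis codes were merged into broader themes by hand before counting, whereas the sketch treats each code as a candidate theme. The function name and input format are assumptions, and the Miscellaneous label is the catch-all theme used in Section 3.2.

```python
from collections import Counter

MIN_THEME_SIZE = 15  # threshold stated in step 5


def assign_themes(commit_codes):
    """Map commit id -> theme, given commit id -> non-empty list of codes."""
    # Only commits with a single energy-saving change count towards a theme.
    single = {cid: codes[0] for cid, codes in commit_codes.items() if len(codes) == 1}
    sizes = Counter(single.values())
    themes = {}
    for cid, codes in commit_codes.items():
        if len(codes) > 1:
            themes[cid] = "Outliers"  # more than one energy-saving change
        elif sizes[codes[0]] >= MIN_THEME_SIZE:
            themes[cid] = codes[0]
        else:
            themes[cid] = "Miscellaneous"  # group too small to stand alone
    return themes
```

Raising `MIN_THEME_SIZE` from five to fifteen, as described in step 5, moves small groups into Miscellaneous and so reduces the number of themes that need to be reported.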

2.4.1 Analysis exclusion criteria

To avoid introducing unnecessary bias due to our lack of understanding of some of the project domains, we take a conservative approach and do not analyze commits that do not pass certain criteria:

1. Generally we do not cover commits with too many changes (> ~1000 Lines of Code (LOC)). The amount of modifications in these commits makes it quite hard to visualize in the source code the ones that actually aim to save energy. However, in some instances, we did consider the commit if the message enabled us to quickly perform a string search to spot the source code modification.

2. We also do not consider a commit suitable for analysis if (1) the message contains no description at all besides showing the energy saving intention, e.g., “Tune for battery life” [C6], or if (2) the message or source code comment only contains a literal description of the modification, e.g., “reduce battery usage: RCU_FAST_NO_HZ = ON” [C7], with no rationale or background information about the change. Such commit messages contain barely any content that could be used for the qualitative analysis and understanding them would require knowledge of the software or the software domain in question.


3. Lastly, we also do not consider commits when none of the researchers could actually map the commit description to the source code change. This is usually the case for changes that are too domain or software specific and require knowing particularities of the software in question.

We tried to reduce the number of such commits through the survey presented in Section 2.3.4, and a total of 9 commits previously not analyzed could be considered after the survey. Overall 210 commits were excluded from this analysis. However, we do note that they were still taken into account for the other research questions. This left us with 429 commits that were then taken into consideration to answer RQ1.

2.5 Analysis Graphical User Interface

The sheer amount of information in the Google Spreadsheet that was used to manage our dataset made the already tiring task of manually analyzing the commits even more laborious. While performing certain tasks in the spreadsheet UI was certainly easy and straightforward, e.g., obtaining counts or filtering and comparing commits in a group, when analyzing individual commits there was too much information noise from the neighboring rows (commits). Also, since the commit source code was not available in the spreadsheet and had to be viewed on the GITHUB website, there was a constant context switch between the spreadsheet and the commit page on GITHUB.

To alleviate these issues, we decided to create graphical user interfaces customized to our analysis tasks. We used these interfaces for individual commit analysis, for commit discussions among the researchers and also for the survey analysis. The interfaces were built using several web technologies and were hosted on Google Apps Script11 to allow direct access to the spreadsheet contents. To give a perspective on the helpfulness of these interfaces, the manual analysis of the initial dataset took over 3 months and covered nearly one thousand commits, while the second analysis covered nearly the same number of commits in less than a month. We do not claim this as a hard truth about the usefulness of such UIs, but it is surely a point to be considered for future research. Samples of the UIs used can be seen in Figures 2.2, 2.3, 2.4 and 2.5.

11_{http://www.google.com/script/start/}

12_{This view was mainly required because this thesis author had sent the survey through his email account and the}

Figure 2.2: Commits default analysis view - This was the default view used during the classification of the commits. On the top right corner there is a mask selection mechanism that allows different visualization of the properties of a commit so that a user can focus on specific properties when needed

Figure 2.3: Commits discussion view - This was the view used after the initial classification of a dataset to discuss any disagreement or pending issue regarding the


Figure 2.4: Coding view - This view was used during the qualitative analysis to perform the coding steps

Figure 2.5: Survey analysis view - This view was used during the survey analysis to facilitate visualization of the email contents among all researchers12


3 Study Results

In this chapter we present the results of the analyses described in Chapter 2. We start by highlighting some numbers of our energy-aware set (Section 3.1). Then we present the results for our research questions (Sections 3.2, 3.3, 3.4 and 3.5) and in Section 3.6 we briefly discuss the survey results. Lastly we provide some further discussions on items in our dataset that we considered noteworthy (Section 3.7).

3.1 Energy-Aware Set Description

Our energy-aware set is composed of 826 commits. This number represents 20.69% of our total sample and 0.0009% of the push events in the GITHUBARCHIVE database over the sampled time range. They were found in 659 different GITHUB repositories and were performed by 591 different authors1. Two authors stand out: together they performed a total of 34 commits (19 and 15 each), which is equivalent to 4.12% of our entire energy-aware set. Analyzing the commits of one of these top authors, we observed that they differ greatly from one another in terms of the approach used to save energy. For instance, they vary from (1) changes to tweak governor parameters [C14], to (2) disabling a feature when the display is turned on [C15] and to (3) directly reducing an LCD display voltage [C16]. This same author also performed commits to revert a previous energy-saving tweak due to issues that it introduced [C17] and to add an energy-aware interface [C18].

Of the 659 repositories, 208 are forks of some other repository and the other 451 are base repositories2. On average they have 5.83 branches (SD: 7.12; Median: 3). The most common top languages in these repositories are, in this order, C, Java and C++. The GITHUB repository that contained the most commits is RAZR-K-Devs/android_kernel_motorola_omap4-common, with 14 energy-aware commits. This is a Kernel project based on the Motorola 3.0.8 Android Kernel. The energy-awareness here also varied greatly. For instance, some commits were performed with the intention to (1) select different energy-efficient governors [C19], (2) improve an existing governor implementation [C20], and (3) add an energy-saving option (energy-aware interface) based on the screen state [C21]. This shows that the same software application can benefit from different energy-aware optimizations. Other than this project, 3 other projects have between 6 and 8 energy-aware commits each, 13 projects have 4 or 5 energy-aware commits each, and the remaining ones have fewer than 4 energy-aware commits each.

Table 3.1: Energy-aware set changes statistics

Statistic   Changes     Files touched
Mean        377.79      4.86
Median      35          2
Mode        2           1
SD          2,256.37    11.63
Max         52,124      171
Min         1           1

1_{We count as an author the e-mail address contained in the GIT commit author field}

2_{By fork here we mean repositories that were created using the GITHUB forking feature, in reality many of the}

Regarding the commit sizes, Table 3.1 summarizes the changes3 statistics. We can see that they range from very small (1 change only) to unusually large (52,124 changes) and the number of files changed goes from just 1 to 171 files (surprisingly, not the same commit as the one with the largest number of changes). We can see that at least 50% of them performed 35 changes or fewer in total and touched two files or fewer. A caveat must be made when interpreting commit change numbers: a commit may include (and actually in many circumstances does include) several unrelated changes.
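Descriptive statistics of the kind reported in Table 3.1 can be computed with the Python standard library, as in the sketch below. The `changes` list is hypothetical data chosen to be skewed, and we assume a sample standard deviation; the text does not state which SD variant was used.

```python
import statistics


def summarize(values):
    """Return the descriptive statistics used in Table 3.1 for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),  # sample standard deviation (assumed)
        "max": max(values),
        "min": min(values),
    }


# Hypothetical change counts: mostly small commits plus one very large one.
changes = [1, 2, 2, 35, 40, 1000]
summary = summarize(changes)
```

In this toy data the single large commit pulls the mean (180) far above the median (18.5), the same pattern seen in Table 3.1, where the mean of 377.79 changes sits well above the median of 35. This is why the median is the more informative figure for typical commit size.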

3.2 RQ1 What types of solutions do developers employ with the intent of saving energy?

This research question aims to elucidate how practitioners are actually changing their software to improve energy consumption of their systems. After the exclusion criteria described in Section 2.4.1, we considered a total of 429 commits. From these commits we derived seven themes which we introduce in the following sections. For some themes we also present a list of sub-themes. Changes that did not fit in a group of fifteen or more commits were grouped under the Miscellaneous theme. In the Outliers theme we grouped changes that contained two or more energy saving approaches. We discuss the most outstanding ones of both of them briefly.

We note that in some theme names we use the term component. In these circumstances it can mean either a hardware component, e.g., a sensor, a display or a Wi-Fi interface, or a software component, e.g., a service, a UI element, a library or a whole feature. When discussing these themes we try to show examples for both cases if applicable.

Table 3.2 summarizes the approaches that developers use. The theme with the largest number of commits was Energy bug fix, followed by Frequency and voltage scaling. Together they add up to 142 commits, which is equivalent to 33.10% of the analyzed commits.

Table 3.2: Approaches that developers use to save energy

Theme                           # commits   %
Energy bug fix                  72          16.78
Frequency and voltage scaling   70          16.32
Disabling components            43          10.02
Using efficient component       36          8.39
Managing periodic work          31          7.23
Low power idling                16          3.73
Timing out                      15          3.50
Miscellaneous                   126         29.37
Outliers                        20          4.66

3_{GITHUB counts line updates as one deletion plus one addition, therefore the total number of changes does not}

3.2.1 Energy bug fix - 72 occurrences

This theme contains commits that fix “Energy Bugs”. An energy bug is: “an error in the system, either application, OS, hardware, firmware or external, that causes an unexpected amount of high energy consumption by the system as a whole” (PATHAK; HU; ZHANG,2011). We consider a commit as an energy bug fix if the programmer clearly states in the commit message that it will fix an energy bug. We found several different types of energy bug fixes and here we group them by sub-themes, some of these sub-themes have a corresponding name in the taxonomy described by (PATHAK; HU; ZHANG,2011) and we show this equivalent name in brackets next to the sub-theme name.

3.2.1.1 Preventing low power [No-sleep Bug]

Commits in this sub-theme try to remove some undesired condition that is preventing a device from going into a lower power mode. In commit [C49], for example, the developer inhibited a CPU always on two-core policy because it “prevented the device from entering deep sleep”. We found four such commits.

3.2.1.2 Reducing wake ups

Similar to preventing low power, commits in this sub-theme try to reduce the number of wake ups caused by an issue that continuously requires a device to come out of its low power mode more often than needed. We found nine of them. For example, commit [C50] reduces a sampling interval because “Original implement(sic) will keep UE wake up every 12 minutes”.


3.2.1.3 Preventing resource leak

Resource leaks are a well-known type of bug that can slowly deplete a system’s memory, but they can also be the source of energy bugs. Eight energy bug fixes mention a solution to stop some form of resource leak that was causing an abnormal increase in energy consumption. For instance, commit [C51] states as its cause “(long commit id) intruduces (sic) a bug where the ts uart could not be closed correctly”.

3.2.1.4 Reducing excessive work

In this sub-theme the commits attempt to reduce some unwanted excessive computation or usage of a component. We found a total of five such commits. Of note is [C52], which hides a UI element because under certain conditions it would cause “continuous buffer updates” that resulted in a “spike in power consumtion (sic)”.

3.2.1.5 Stopping endless computation [Loop Bug]

We grouped four commits in this sub-theme and they all attempted to fix an issue that caused a computation to never stop. For example, commit [C53] “remove infinite animation that triggers continious(sic) style recalculation”.

3.2.1.6 Other

The remaining bug fixes (43) did not fit in a specific sub-theme and were grouped here. They either (1) did not specify the root cause of the issue, (2) were too domain specific or (3) did not have enough representatives to be grouped together within a sub-theme. Some good examples are: fixing an integer overflow that causes a higher processor state than necessary to be selected [C55]; fixing an incorrect port configuration that was causing the system to use more power than needed [C56]; or reducing the voltage of a component that was wrongly increased in a previous commit [C57].

3.2.2 Frequency and voltage scaling - 70 occurrences

The theme with the second largest number of commits in our sample contains solutions related to frequency and voltage scaling. The key insight is that a lower frequency yields lower power consumption. Saving energy, however, is not the same as saving power, because a reduction in frequency may increase the execution time. The challenge here is to figure out when the reduction in frequency is significant enough to cause performance degradation, thus negatively impacting energy savings. In our analysis, we observed that such manipulations can be static or dynamic. In the static approach the programmer hard-codes a new frequency/voltage value directly in the source code. In the dynamic approach, programmers commonly use dynamic voltage and frequency scaling (DVFS) techniques (PERING; BURD; BRODERSEN,1998).
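The distinction between saving power and saving energy follows from energy being power multiplied by time. The numbers below are hypothetical and chosen only to show that a lower-power configuration can still consume more energy if the task takes sufficiently longer.

```python
def energy_joules(power_watts, seconds):
    """Energy consumed by a task that draws constant power for a given time."""
    return power_watts * seconds


# Hypothetical task: halving the frequency lowers power draw from 2.0 W to
# 1.5 W (not to 1.0 W, since part of the power does not scale with
# frequency), but doubles the execution time.
fast = energy_joules(2.0, 10.0)   # full frequency
slow = energy_joules(1.5, 20.0)   # reduced frequency
```

Here the reduced-frequency run draws 25% less power but uses 30 J instead of 20 J. Scaling down only pays off when the relative drop in power exceeds the relative increase in execution time.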


Although frequency and voltage scaling became a popular technique to make CPUs more energy-efficient, we have identified several commit authors who focused on peripherals. For instance, one author wrote: “Reduce Wifi voltage for power savings. Should be beneficial for a wifi only device” [C87]. In this commit, the GITHUB contributor changed a single line of code, updating a variable from .microvolts = 2000000 to .microvolts = 1800000. This commit used a static approach, that is, the author hard-coded a new voltage value.

Solutions using the dynamic approach are highly diverse. For instance, DVFS makes it possible to change the CPU frequency on the fly. DVFS algorithms, or “cpufreq governors”, dynamically decide which frequency should be used at a given time. We found several commits that use such DVFS features. They range from (1) tuning existing governors [C88] to (2) setting a different governor as the default [C89]. However, this approach hides important perils. As discussed in recent literature (LIU; PINTO; LIU, 2015; KAMBADUR; KIM, 2014), well-established Linux governors do not provide effective energy savings. In the worst case, governors can even increase energy consumption instead of reducing it.
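The idea of a governor deciding the frequency from recent load can be sketched as follows. This is a deliberately simplified, load-based policy written for illustration; it is not the code of any Linux governor, and pick_frequency, up_threshold, and the frequency table are hypothetical names and values.

```python
def pick_frequency(load, current_khz, available_khz, up_threshold=0.8):
    """Choose the next CPU frequency from the load observed at current_khz.

    load is the busy fraction (0.0 to 1.0) over the last sampling window.
    """
    freqs = sorted(available_khz)
    if load > up_threshold:
        return freqs[-1]  # busy: jump straight to the highest frequency
    # Work that was actually demanded, expressed in kHz of busy cycles.
    demanded_khz = load * current_khz
    for f in freqs:
        # Lowest frequency that would keep the load under the threshold.
        if demanded_khz <= up_threshold * f:
            return f
    return freqs[-1]


table = [600_000, 1_200_000, 1_800_000]
print(pick_frequency(0.9, 1_200_000, table))  # prints 1800000
print(pick_frequency(0.5, 1_800_000, table))  # prints 1200000
print(pick_frequency(0.2, 1_800_000, table))  # prints 600000
```

Tuning a governor, as in [C88], would correspond here to changing a parameter such as up_threshold, while selecting a different default governor, as in [C89], would correspond to swapping the whole policy function.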

3.2.3 Disabling components - 43 occurrences

In this theme the basic premise for the commits is to disable or stop components in an attempt to reduce energy consumption. We found four different situations in which such disabling happens and here we group the commits in sub-themes according to these situations.

3.2.3.1 Context-aware disabling

Commits in this sub-theme follow a similar pattern of “disable X when Y”: they generally attempt to disable components under system conditions that render the component's usage unnecessary. Such conditions vary greatly and can be, for example, (1) when an application is not in the foreground [C74]; (2) when there is no network connectivity available [C75]; (3) when the screen is off [C76]; or (4) when a UI element is not being shown [C77]. The components that are disabled when such conditions are met can be (1) threads [C74]; (2) background services [C75,C76]; (3) GPS [C77]; or (4) timers [C78].
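The “disable X when Y” pattern can be sketched as a component paired with a predicate over the system context. The sketch is a minimal illustration and does not reproduce any of the cited commits; SystemContext, Component, and the example component names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemContext:
    """Hypothetical snapshot of the conditions listed above."""
    in_foreground: bool
    has_network: bool
    screen_on: bool


class Component:
    """A stoppable component plus the condition under which it is useful."""

    def __init__(self, name, needed_when):
        self.name = name
        self.needed_when = needed_when  # callable: SystemContext -> bool
        self.running = False

    def apply(self, ctx):
        """Start or stop the component so that it runs only when needed."""
        self.running = bool(self.needed_when(ctx))
        return self.running


components = [
    Component("worker-thread", lambda c: c.in_foreground),
    Component("sync-service", lambda c: c.has_network and c.screen_on),
]

ctx = SystemContext(in_foreground=False, has_network=True, screen_on=True)
print({c.name: c.apply(ctx) for c in components})
# prints {'worker-thread': False, 'sync-service': True}
```

In a real application, apply would be called from the platform's lifecycle or connectivity callbacks rather than by hand.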

3.2.3.2 Inefficient disabling

Here the commit authors attempted to disable components that they deemed energy-inefficient, even if that meant losing a feature. An example that we obtained from one of the survey responses was the disabling of a syntax highlighting plugin for a text editor that was “disk and CPU heavy” [C64].


3.2.3.3 Unnecessary disabling

Unlike in the context-aware disabling sub-theme, here the commits perform changes to unconditionally disable components that are either unused or thought to be unnecessary. Commit [C65] turns off several hardware components not needed by the Arduino application in question.

3.2.3.4 Unconditional disabling

The remaining commits were grouped under this sub-theme. They mostly do not present an explicit motivation or condition for disabling the component, but still show the intention to disable it as a means of achieving energy savings. A small example is commit [C66], which simply states “Disable accelerometer and compass on Android to save battery.”

3.2.4 Using efficient component - 36 occurrences

The commits in this theme perform changes to use more efficient versions of certain libraries or services, as well as more energy-efficient devices. We also include in this theme commits that make use of a power saving mode, usually a black-box technique that does not require knowledge of how the energy saving is achieved. Examples in this category are: (1) using a power-efficient work queue [C80], (2) using a connectivity engine to provide enhanced network selection [C79], and (3) enabling a thermal framework to achieve energy savings [C81]. Most of these energy-saving components are offered by newer kernels (e.g., the power-efficient workqueue and the Thermal Framework). This greatly lowers the barrier to employing an energy saving technique in low-level applications, since the programmer does not need to worry about implementation details, which are abstracted away by those libraries.

3.2.5 Managing periodic work - 31 occurrences

All the commits in this theme deal, in some form, with computations that happen periodically or that only need to happen periodically. They either (1) decrease the frequency with which a given computation happens, (2) split a continuously running computation by introducing intervals, or (3) remove a periodic computation that only needs to be executed on demand. Examples of the first case are commits that reduce the frequency of background scans [C67,C68,C69], the sampling rate of a sensor [C70], or the synchronization frequency of background services [C71]. For the second case we cite [C72], which removes a continuously running service in favor of performing the computation at intervals triggered via OS alarms. For the last case we cite commit [C73], which removed a periodic feed update so that it is performed only when actually requested by the system.
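The third strategy, replacing a periodic computation with an on-demand one, can be sketched as a small cache that fetches only when a caller asks and the stored value is too old. This is an illustrative sketch and not the code of [C73]; OnDemandFeed, max_age_s, and the injected clock are hypothetical.

```python
import time


class OnDemandFeed:
    """Fetch a feed only on request, reusing a recent result when possible."""

    def __init__(self, fetch, max_age_s, clock=time.monotonic):
        self._fetch = fetch          # callable performing the costly update
        self._max_age_s = max_age_s  # how long a cached result stays fresh
        self._clock = clock          # injectable so the example is testable
        self._cached = None
        self._fetched_at = None
        self.fetch_count = 0

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._max_age_s)
        if stale:
            self._cached = self._fetch()
            self._fetched_at = now
            self.fetch_count += 1
        return self._cached


feed = OnDemandFeed(fetch=lambda: ["item"], max_age_s=600)
feed.get()
feed.get()  # served from the cache, no second fetch
print(feed.fetch_count)  # prints 1
```

With no callers, this design performs no work at all, whereas a timer-driven update would keep waking the device. Raising max_age_s corresponds to the first strategy, decreasing how often the computation runs.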
