Data Analysis - Research Design - Increasing Release Frequency by Accelerating Software Develop

3.2 Research Design

3.2.3 Data Analysis

were conducted on-site and one of the interviews was conducted remotely.

All interviews were recorded and transcribed later.

Data collection for the longitudinal survey study reported in Publication III was done using an online questionnaire, which can be classified as an indirect, second-degree method of collecting data. The survey measured the project-level maturity of software development practices in areas such as test automation, quality, build and deployment, and running and monitoring, as well as the lead time to release new versions, over a two-year period in a Finnish company. Projects were self-selected, as project representatives were free to choose whether to answer the survey. Information about the survey was sent via e-mail to all active projects in the company. The survey was first conducted in 2015 and was replicated in 2017 with minor changes to the questionnaire.

There were responses from 43 projects in 2017, which is comparable to the 35 project responses in 2015. Since the company was involved in developing customer projects, the projects changed from time to time. Out of the 43 projects in 2017, 10 were the same as in 2015. Thus, the bounding of the case is twofold: the context is the company itself, and the units of analysis are the organization as well as the individual projects that were active in both 2015 and 2017.

grouped, forming a solid basis for the conclusions and completing the chain of evidence from the collected data.

Analysis of qualitative data begins with the coding phase (Runeson and Höst, 2009; Wohlin and Aurum, 2015). In the coding phase, researchers read through transcribed text and assign labels to noteworthy sections. Patterns or themes start to emerge when similar labels can be used to characterize sections multiple times within and across cases. Themes can be further organized into higher-level categories. Identifying codes and themes is the process of open coding, and finding the relationships between themes and categories is axial coding (Wohlin and Aurum, 2015). Thematic analysis is a method that structures the analysis of qualitative data by further formalizing the analysis phases.

Thematic analysis condenses the general phases of qualitative data analysis into six phases (Wohlin and Aurum, 2015). The first phase is getting to know the data: researchers should immerse themselves in reading and understanding the collected data (Cruzes and Dybå, 2011).

An initial set of codes can be defined in the second phase, once there is enough understanding of the data. Coding in thematic analysis is typically open coding (Wohlin and Aurum, 2015), which requires multiple passes to get the codes right (Cruzes and Dybå, 2011). Coding approaches include inductive coding, where codes are drawn purely from the data; deductive coding, which favors predetermined codes to label data; and integrative coding, which mixes the two approaches (Cruzes and Dybå, 2011). Phases three to five in thematic analysis concern finding explicit (semantic) or hidden (latent) themes in the data and codes, and defining and reviewing those themes (Wohlin and Aurum, 2015). Codes can be plentiful, so themes help to categorize the codes by narrowing down the number of analytical units originating from the text (Cruzes and Dybå, 2011). Themes can also be related to higher-order themes if they concern the same topic. A thematic map connecting themes to other themes and higher-order themes can be used to develop a taxonomy of sorts (Cruzes and Dybå, 2011).

The sixth and final phase of thematic analysis is creating the report based on the analysis (Wohlin and Aurum, 2015).
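The grouping of codes into themes and higher-order themes described above can be illustrated with a minimal sketch. The code labels, theme names, and interview identifiers below are hypothetical and are not taken from the studies in this thesis; the sketch only shows one possible way to represent a thematic map as a nested structure.

```python
from collections import defaultdict

# Hypothetical coded segments: (source transcript, code label).
coded_segments = [
    ("interview-1", "manual regression testing"),
    ("interview-1", "slow build pipeline"),
    ("interview-2", "manual regression testing"),
    ("interview-2", "customer approval needed"),
    ("interview-3", "flaky automated tests"),
]

# Hypothetical mapping from codes to themes (phases three to five).
code_to_theme = {
    "manual regression testing": "testing effort",
    "flaky automated tests": "testing effort",
    "slow build pipeline": "tooling limitations",
    "customer approval needed": "customer constraints",
}

# Hypothetical mapping from themes to higher-order themes.
theme_to_higher_order = {
    "testing effort": "technical obstacles",
    "tooling limitations": "technical obstacles",
    "customer constraints": "organizational obstacles",
}


def build_thematic_map(segments, code_theme, theme_parent):
    """Group coded segments under their theme and higher-order theme."""
    thematic_map = defaultdict(lambda: defaultdict(list))
    for source, code in segments:
        theme = code_theme[code]
        parent = theme_parent[theme]
        thematic_map[parent][theme].append((source, code))
    return thematic_map


thematic_map = build_thematic_map(
    coded_segments, code_to_theme, theme_to_higher_order
)

for higher_order, themes in thematic_map.items():
    print(higher_order)
    for theme, segments in themes.items():
        print(f"  {theme}: {len(segments)} coded segment(s)")
```

Counting the segments under each theme gives a rough indication of which themes recur within and across cases, although the frequency of a code alone does not determine the importance of a theme.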

Thematic analysis as an analysis method has been extended by the method of thematic synthesis. Thematic synthesis is similar to thematic analysis but the method emphasizes the importance of theme development in the analysis of data (Cruzes and Dybå, 2011). Thematic synthesis in software engineering has been noted to be of use with systematic literature reviews in classifying primary studies (Cruzes and Dybå, 2011) but the suggested guidelines for coding text and developing themes should be applicable for other qualitative data such as interviews as well.

All studies in this thesis took advantage of thematic analysis in various forms. Thematic analysis in the studies reported in Publication I and Publication II was applied in a similar manner, as the studies shared much of the same interview data. The interviews were recorded, partially transcribed on-site by researchers acting as scribes, and later checked against the recordings to keep the chain of evidence intact. Besides the partial transcripts, interview summaries made by the researchers were taken into account in the data analysis. An integrated approach to coding was used in these two studies. The higher-order themes with which the themes and codes were associated originated from the layout of the interview questions, e.g. the rationale for why projects should move to continuous deployment as seen by the respondents. The process analysis reported in Publication II applied thematic analysis by coding the process descriptions and technologies listed in the process diagrams drawn by the respondents. Software process phases were the higher-order themes used in coding to determine the software development practices in each case and project.

Although the study reported in Publication III used quantitative data from a survey, the study also had a qualitative nature. The survey had an open feedback section used to gather feedback on the usage of the maturity model survey in the company. Open feedback from both the 2015 and 2017 surveys was grouped and analyzed using thematic analysis. Since the format of the feedback was flexible, the data analysis was mostly inductive, drawing codes and themes from the data. Themes that appeared often in the text were highlighted in the report. Thematic analysis of the open feedback gave an idea of how the respondents perceived maturity models in the context of their company beyond the quantitative results.

As for the thematic analysis carried out in the study reported in Publication IV, the approach resembled that of the first two studies. Transcripts from interview recordings were coded using an integrated approach to determine how the respondents understood refactoring and how they saw refactoring taking place in their projects. The interview recordings for the study were mostly transcribed by professionals from third-party transcription service providers, which added more detail to the transcripts compared to researcher notes. The themes of the questions in the semi-structured interview provided the framework for the thematic analysis, but the coded responses gave form to the content of the analysis. Several researchers took part in the coding, so the same transcript or parts of it could be coded by multiple researchers. Clarification requests were occasionally sent to the respondents if the original interview and the interview transcript failed to provide a sufficient answer to the overall questions posed.

Publication V provided results from a study that also took advantage of thematic analysis. The pattern of analysis was similar to the other studies in that the recorded interviews were transcribed by professionals and later coded by the researchers. Coding was facilitated by a qualitative research analysis tool named Atlas.ti, into which the detailed transcripts were loaded. Searching for codes and themes related to the adoption of DevOps was an inductive effort, relying on the interview data. Codes and themes were grouped into a thematic map that highlighted the main themes of the study. The discovered themes were finally used to organize the results in the study report.

Besides qualitative data analysis and thematic analysis, quantitative data analysis methods were used for the survey study results reported in Publication III. Statistical analysis of quantitative data can be used to compare means between samples (Blaikie, 2011). The null hypothesis in this case is that there is no difference in the continuous deployment maturity of the case company projects as measured in 2015 and 2017. Because the survey questions used an ordinal scale, where the respondents selected the most suitable maturity level for a process area, only non-parametric statistical tests are applicable. Parametric tests such as the t-test require the data to be normally distributed, and thus they cannot be applied to non-metric data (Blaikie, 2011).

Instead, the non-parametric Mann-Whitney U test, which fits ordinal data (Blaikie, 2011), was used to test for statistical significance in the survey responses. The statistical test was applied to the response means in the process areas of test automation, quality, build and deployment, and running and monitoring, taking into account the response means of both 2015 and 2017. The result of the Mann-Whitney U test is the U value, which was in the end used to determine statistical significance based on lookup values for the metric.
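As an illustration of how a U value is obtained, the following is a minimal sketch of the standard rank-sum computation with average ranks for ties. The maturity levels below are hypothetical and are not the survey data of Publication III; the function names are likewise chosen for this example only.

```python
def midranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U, U_a, U_b), where U is the smaller of the two U values."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b), u_a, u_b


# Hypothetical ordinal maturity levels for one process area.
levels_2015 = [1, 2, 2, 3, 1, 2]
levels_2017 = [2, 3, 3, 4, 2, 3, 4]

u, u_2015, u_2017 = mann_whitney_u(levels_2015, levels_2017)
print(u, u_2015, u_2017)  # 6.5 6.5 35.5
```

The smaller of the two U values is the one compared against a critical value from a lookup table for the given sample sizes and significance level; the null hypothesis is rejected when U is at or below that critical value. With many ties or larger samples, a normal approximation with a tie correction is commonly used instead of table lookup.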
