Visual Programming Language for Orchestration with Docker

Faculdade de Engenharia da Universidade do Porto


Bruno Piedade

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Filipe Figueiredo Correia, Assistant Professor

Second Supervisor: João Pedro Dias, Guest Assistant


Abstract

With the widespread adoption of cloud-based infrastructure and microservice architectures, along with DevOps practices, development and operations have become fundamentally more intertwined. In response to ever-changing requirements, a variety of tools emerged; among them, containers were fundamental in providing a more lightweight alternative to virtualization. Docker is currently one of the most widely adopted solutions for container implementation and management in the software industry, and Docker Compose supports the orchestration of multi-container applications via text-based configuration files.

Even though the orchestration of simple architectures may be straightforward, more advanced concepts such as volumes for persistent storage of data can appear daunting to inexperienced developers. Furthermore, the text-based nature of existing solutions allows naive mistakes and decreases the readability of orchestration configurations as their complexity increases, either in number or heterogeneity of services.

Visual programming approaches have been used to handle working with abstractions for domain-specific or general-purpose programming. We can already find numerous instances of visual programming approaches in the operations field, including some that support Docker Compose orchestration. However, these approaches are incomplete, since the adopted visual notations do not fully capture the underlying concepts of Docker Compose and may therefore fail to maximize the potential gains of such an approach. Furthermore, there seems to be a lack of adoption of these alternative approaches, as suggested by the results of a survey we conducted to gauge what challenges developers face when working with these technologies.

Thus, we believe that the definition of a complete visual programming approach for specifying and visualizing Docker-based architectures may provide a higher degree of abstraction, with the potential of easing the developer’s efforts and reducing the error rate and development time.

To this end, we developed a prototype named Docker Composer, functioning as the programming environment for the complete visual approach, by leveraging knowledge from visual programming and model-driven engineering. The prototype addresses the limitations of state-of-the-art solutions by featuring rich visual notations that encompass all of the Docker Compose artifacts, and it simplifies the management of stacks in a single environment.

The prototype was then used as a means to validate the approach in a controlled experiment conducted among novice developers. The study considered two treatments and aimed to compare the prototype with the conventional toolchain with regard to performance and the perception of ease of use, usefulness, and intention to use. The results indicate that the prototype presents some benefits in reducing the development time and error-proneness, primarily for stack definition activities, and provides a more streamlined development experience, supporting our hypothesis. Furthermore, the participants found the prototype easier to use, considered it useful, and manifested willingness to use it in the future.


Resumo

With the spread of cloud-based architecture and microservice architectures, together with DevOps practices, development and operations have become fundamentally more intertwined. In response to constantly changing requirements, several tools have emerged and, among them, containers were fundamental in providing a more lightweight alternative to virtualization. Docker is currently one of the most widely adopted and used solutions for container implementation and management in the software industry, and Docker Compose supports the orchestration of multi-container applications through textual configuration files.

Although the orchestration of low-complexity architectures can be simple, more advanced concepts, such as volumes for persistent data storage, can seem daunting to inexperienced programmers. In addition, the text-based nature of existing solutions allows naive mistakes and reduces the readability of orchestration configurations as their complexity increases, both in number and in heterogeneity of services.

Visual programming approaches have been used to work with abstractions for domain-specific or general-purpose programming. Numerous instances of visual programming approaches can already be found in the operations field, including some that support orchestration with Docker Compose. However, these approaches are incomplete, since the visual notations adopted do not fully capture the underlying concepts of Docker Compose and may therefore fail to maximize the potential gains of such an approach. Moreover, there appears to be a lack of adoption of these alternative approaches, as suggested by the results of a survey we conducted to assess the challenges programmers face when working with these technologies.

We therefore believe that defining a complete visual programming approach for specifying and visualizing Docker-based architectures can provide a higher degree of abstraction, with the potential to ease the programmer's efforts and reduce the error rate and development time.

To this end, we developed a prototype named Docker Composer, which serves as the programming environment for the complete visual approach, drawing on knowledge from the areas of visual programming and model-driven engineering. The prototype addresses the limitations of state-of-the-art solutions by presenting rich visual notations that cover all Docker Compose artifacts, and it simplifies the management of stacks in a single environment.

The prototype was then used as a means of validating the approach in a controlled experiment carried out with novice programmers. The study considered two treatments and aimed to compare the prototype with the conventional toolset with respect to performance and the perception of ease of use, usefulness, and intention to use. The results indicate that the prototype offers some benefits in reducing development time and error-proneness, mainly for stack definition activities, and provides a more fluid experience, which supports our hypothesis. In addition, the participants found the prototype easier to use and useful, and expressed willingness to use it in the future.


Acknowledgements

I thank my supervisor, Professor Filipe Correia, and my second supervisor, Professor João Dias, for their guidance, collaboration, and criticism. Without their knowledge and ideas, this dissertation would not have been the same.

My deepest gratitude to my family for their patience and support in my decisions through all the years. I am who I am today because of them.

Last but not least, I thank everyone, colleagues and friends, who accompanied me throughout this academic journey.

Bruno Piedade


“Imagination means nothing without doing”

Charlie Chaplin


Contents

4.8 Conclusions
5 Problem Statement
5.1 Current Issues
5.2 Research Statement
5.3 Target Audience
5.4 Solution Perspective
5.5 Methodology
6 Solution Prototype
6.1 Overview
6.2 Architecture
6.3 Technological Decisions
6.4 Feature Design
6.4.1 Visual Map
6.4.2 Static Validation
6.4.3 Supported Versions
6.4.4 File Management and Serialization
6.4.5 Executing Commands from the UI
6.4.6 Visual Feedback
6.4.7 Docker Hub Integration
6.5 Practical Example
6.6 Availability
6.7 Discussion
7 Empirical Study
7.1 Goals
7.2 Design
7.2.1 Participants
7.2.2 Data Sources
7.2.3 Environment
7.2.4 Task Definition
7.2.5 Procedure
7.2.6 Data Collection
7.2.7 Data Analysis
7.2.8 Pilot Experiments
7.2.9 Replication
7.3 Data Analysis
7.3.1 Background
7.3.2 Task Performance
7.3.3 Assessment Questionnaire
7.4 Validation Threats
7.5 Summary
8 Conclusions and Future Work
8.1 Hypothesis Revisited and Contributions
8.2 Future Work
B Preliminary Work Questionnaire
C User Study Materials
C.1 Control
C.2 Experimental

List of Figures

1.1 High-level architectural overview
2.1 Comparison of hypervisor and container-based deployments
2.2 Visual programming and software visualization
2.3 A program and its dataflow equivalent
2.4 Sample of the Scratch programming environment UI
2.5 Sample of the FlowHub programming environment UI
2.6 Example of a Node-RED flow
2.7 Diagram of models in software engineering
3.1 Venn diagram of technology groups in Operations
3.2 Sample of Grafana’s dashboard
3.3 Sample of Portainer’s main dashboard
3.4 Sample from Admiral’s visual orchestrator
3.5 Sample from Admiral’s template visual orchestrator
3.6 Sample of CloudSoft Visual Composer’s interface
3.7 Sample application map from CloudMap’s interface
4.1 Distribution of answers to the Personal Context question group
4.2 Distribution of answers to the Personal Context question group
4.5 Distribution of participants who use plugins/tools for Docker Compose
6.1 Deployment diagram of the prototype
6.2 High-level architecture of the prototype
6.3 Layout of the prototype’s main view
6.4 Visual representation of a service artifact node
6.5 Visual representation of a volume, network, config and secret artifact nodes
6.6 Static validation notation example
6.7 Comparison of the textual and visual representation of a stack
7.1 Distribution of Docker Compose YAML files on Github by size
7.2 Distribution of used orchestration frameworks by group
7.3 Distribution of configured Docker Compose artifacts by group
7.4 Distribution of completed tasks by group
7.5 Distribution of the global times for each subject by context, by group
7.6 Distribution of times to completion for each subject by task, by group
7.7 Distribution of execution attempts for each subject by task, by group
7.8 Distribution of global context switches for each subject by group
7.9 Percentage of subjects who answered the questions of task T1 correctly
7.10 Mean of answers related to environment factors for each subject, by question
7.11 Mean of the answers to the PPU items of the assessment questionnaire
7.12 Mean of answers to the PEOU items for each subject
7.13 Mean of answers to the feature usefulness items for each subject
7.14 Mean of answers to the PU items for each subject


List of Tables

3.1 Monitoring tools comparative overview
3.2 Service level visual tools comparative overview
3.3 Infrastructure level visual tools comparative overview
4.1 Summary of the answers to the Personal Context question group
4.2 Distribution of the answers to the Personal Context question group
4.3 Distribution of the reading related answers to the WwDT question group
4.4 Distribution of the writing related answers to the WwDT question group
6.1 Summary of supported features by drawing framework
6.2 Summary of supported visual notations for Docker Compose artifacts by tool
7.1 Summary of the answers to background questionnaire
7.2 Summary of the number of orchestration frameworks used
7.3 Results of the McNemar test for used Docker Compose artifacts
7.4 Summary of the global task times by activity
7.7 Summary of the times for each task across both groups
7.8 Summary of the execution attempts for each task across both groups
7.9 Results of MW-U test for global context changes
7.10 McNemar test for comparison of the answers of task T1
7.11 Summary of the answers to the ENV items of the assessment questionnaire
7.12 Summary of the answers to the PPU items of the assessment questionnaire
7.13 Summary of the answers to the PEOU items of the assessment questionnaire
7.14 Summary of the answers to the feature items of the assessment questionnaire
7.15 Summary of the answers to the PU items of the assessment questionnaire
7.16 Summary of the answers to the ITU items of the assessment questionnaire
A.1 Monitoring tools source
A.2 Service tools sources
A.3 Infrastructure tools sources


Abbreviations

AI Artificial Intelligence
API Application Programming Interface
CLI Command Line Interface
CaaS Containers as a Service
DSL Domain Specific Language
DSML Domain Specific Modeling Language
GUI Graphical User Interface
IaC Infrastructure as Code
IaaS Infrastructure as a Service
IoT Internet of Things
LHS Left Hand Side
M2M Model-to-model
M2C Model-to-code
MDA Model Driven Architecture
MDE Model Driven Engineering
MDSE Model Driven Software Engineering
MW-U Mann-Whitney U
PIM Platform Independent Model
PLC Programmable Logic Controllers
PSM Platform Specific Model
PaaS Platform as a Service
QVT Query/View/Transformation
RHS Right Hand Side
SaaS Software as a Service
UML Unified Modeling Language
VM Virtual Machine
VMF Visual Programming Framework
VP Visual Programming
VPL Visual Programming Language
YAML YAML Ain’t Markup Language


Chapter 1

Introduction

1.1 Context
1.2 Problem Definition
1.3 Motivation
1.4 Main Goals
1.5 Contributions
1.6 Dissertation Structure

The goal of this chapter is to present an overview of this dissertation. It starts with a description of the context in which this dissertation fits (Section 1.1), followed by the definition of the problem that it tries to tackle (Section 1.2), the motivation behind the solution (Section 1.3), the proposed main goals (Section 1.4), and the contributions (Section 1.5). Lastly, an outline of how the remainder of the dissertation is structured is given (Section 1.6).

1.1 Context

As the user base and complexity of applications grow, the infrastructure upon which they are built grows accordingly. This, in turn, results in huge challenges in managing and scaling such infrastructure. DevOps has emerged in the software engineering ecosystem as a set of practices with the goal of combining development and operations seamlessly, shortening development life cycles, and providing continuous deployment, all the while ensuring high software quality [26]. At the same time, cloud technologies have become commonplace following the advent of cloud service providers such as Amazon Web Services (AWS)¹, Microsoft Azure², and Google Cloud Platform³, which offer unprecedented flexibility and deployment velocity through diverse hosting options suitable to satisfy a broad variety of consumer requirements and needs [7]. Automation is a core principle of DevOps, targeting, among other challenges, infrastructure management [26]. Infrastructure as Code (IaC) is the practice of managing infrastructure through configuration files as part of the code base [34]. Although initially designed for configuring and managing bare-metal infrastructure (popularized by configuration management tools such as Chef⁴ and Puppet⁵), this practice is also used nowadays for managing and provisioning infrastructure resources in cloud environments.

¹ More information available at https://aws.amazon.com/
² More information available at https://azure.microsoft.com/
³ More information available at https://cloud.google.com/

Alongside these developments, another key aspect was the appearance of containers, which revolutionized the way systems are structured and enabled microservices architectures [3]. Containers allow the virtualization of services and applications in a more lightweight way when compared to virtual machines [38]. One of the key container implementation and management technologies is Docker, which currently plays a massive role in this context [51]. With the ever-growing scale and complexity of the software engineering world, there is a critical need for efficient and useful tools that support the developers whose daily job is to manage such systems and that optimize these tasks.

1.2 Problem Definition

Docker Compose is a tool for defining and running multi-container Docker applications. Usually, the developer defines the intended configuration by editing a YAML file containing all the information regarding the services that constitute the application, the corresponding images, and how they are related to each other, as well as volumes for data persistence and networks for the connections between services. The stack can then be run, conventionally through the command-line interface (CLI), resulting in the creation or execution of the declared resources, including the container or set of containers for each of the declared services.
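As a rough illustration of the kind of file described above, the following is a minimal sketch of a Compose configuration. The service names (web, db), the chosen images, the volume db-data, and the network backend are hypothetical and serve only as an example; they are not taken from this work.

```yaml
# docker-compose.yml: hypothetical two-service stack (all names are illustrative)
version: "3.8"   # expected by older docker-compose releases; newer Compose versions ignore it

services:
  web:
    image: nginx:alpine          # image from which the service's containers are created
    depends_on:
      - db                       # relation between services: web is started after db
    networks:
      - backend
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example # placeholder value, not suitable for real use
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume for data persistence
    networks:
      - backend

volumes:
  db-data: {}                    # top-level declaration of the named volume

networks:
  backend: {}                    # top-level declaration of the shared network
```

A stack like this is conventionally started with `docker-compose up` from the directory that contains the file, which creates the declared network and volume and one container per service.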

Although this process may be fairly straightforward for the setup of low-complexity service stacks, the textual nature of the configuration files may present some challenges as the complexity increases, be it in number or heterogeneity of services. In such cases, understanding dependencies between services becomes difficult as definitions begin to get scattered within the file. Furthermore, some advanced aspects of the configuration, such as port mapping and volume management, might be overwhelming and confusing for inexperienced users.
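To make the two advanced aspects mentioned above more concrete, the fragment below sketches a port mapping and two kinds of volume mounts using the short syntax. The service name, paths, and port numbers are hypothetical and chosen only for illustration.

```yaml
# Hypothetical fragment: port mapping and volume mounts in short syntax
services:
  app:
    image: nginx:alpine
    ports:
      - "8080:80"                          # HOST:CONTAINER, host port 8080 maps to container port 80
    volumes:
      - ./site:/usr/share/nginx/html:ro    # bind mount: host path, container path, read-only flag
      - app-logs:/var/log/nginx            # named volume, declared at the top level below

volumes:
  app-logs: {}
```

Both entries follow a source-then-target order, yet nothing in the text itself signals which side belongs to the host and which to the container, and a named volume that is referenced by a service but not declared under the top-level `volumes` key is rejected by Compose. These are the kinds of details that can trip up inexperienced users.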

A complete visual approach for editing and visualizing such configurations might be valuable in aiding the developer in successfully setting up comprehensive and complex applications by providing a higher degree of abstraction in a friendlier and more intuitive environment. Thus, we believe the increased understandability provided by a complete visual approach [14] may result in higher efficiency, particularly by speeding up development time and reducing error proneness. In this sense, a complete visual approach has the potential to prove useful for a broad audience of end-users, ranging from first-time developers who wish to understand how the technology works to more experienced users who might take advantage of the visualization aspects to have a clearer overview of the configuration.

⁴ Chef, available at https://www.chef.io/
⁵ Puppet, available at https://puppet.com/

1.3 Motivation

Historically, visual approaches have been used for numerous purposes. Examples include manufacturing industries, where programmable logic controllers (PLC) are configured via ladder and sequential function charts, and software engineering, with visual notations such as the Unified Modeling Language (UML). More recent applications can be found for educational purposes and in the Internet of Things (IoT) area [23]. In fact, there are already numerous examples of visual approaches in the operations field for multiple purposes and tasks, for instance, the management of cloud and container resources, including some that focus on Docker technologies [46]. In this sense, there is evidence supporting the viability and usefulness of such an approach.

Additionally, there is currently a high concern with misconfigurations resulting from IaC scripts, following the widespread adoption of this practice to automate infrastructure provisioning and management [53]. Although Docker Compose configuration files do not fit precisely in this definition, they are very similar to this notion, albeit applied to service orchestration. The previously identified benefits of visual approaches can be crucial in alleviating such misconfigurations.

1.4 Main Goals

Figure 1.1: High-level architectural overview.

The overall purpose of this dissertation is to explore and research the benefits of a visual programming approach applied to the orchestration of service stacks based on Docker Compose by evolving and expanding the efforts already made in this field. With this objective in mind, we propose to design a more complete visual programming approach, including the definition of the associated graphical elements and visual notations for the artifacts and relations inherent to Docker Compose. Next, we propose the development of a prototype that leverages the language. Finally, we propose an experimental design for the validation and assessment of the practical usefulness of the proposed visual approach, using the prototype to conduct experiments with end-users and comparing the efficiency of the solutions (e.g. time of development and number of errors made by the developers).

Figure 1.1 shows a high-level perspective of the envisioned architecture. As can be observed, the solution will leverage knowledge mainly from two fields: Visual Programming (VP) and Model-Driven Engineering (MDE). For this reason, these areas are thoroughly explored in the following two chapters.

1.5 Contributions

The contributions of this dissertation are threefold:

• A study of the challenges of working with Docker Compose. A study conducted with students to identify practical issues developers find when working with Docker technologies;

• A visual programming environment for orchestration with Docker Compose. A prototype serving as a visual programming environment was developed leveraging the proposed complete visual approach for orchestration with Docker Compose;

• An experimental design to validate the approach. We have designed and conducted a user study among students to empirically validate the complete visual approach.

1.6 Dissertation Structure

In this section, the remainder of the dissertation structure is outlined as follows:

Chapter 2, Background, compiles a collection of the most relevant concepts and issues that are essential in understanding the rest of the dissertation.

Chapter 3, State of the Art, provides a review of the state of the art related to the context of this work.

Chapter 4, Preliminary Work, describes the research conducted to identify the issues developers face when working with Docker Compose technologies.

Chapter 5, Problem Statement, presents a focused and detailed view of the problem addressed by this dissertation and solution prospects including the hypothesis and research questions.

Chapter 6, Solution Prototype, details the implemented prototype which illustrates the proposed solution to the problem.

Chapter 7, Empirical Study, documents the experimental procedure and showcases the findings for the user study conducted to validate the solution prototype.

Chapter 8, Conclusions and Future Work, briefly summarizes the main conclusions derived from the research along with the main contributions and proposed future work.


Chapter 2

Background

2.1 Cloud Computing and Infrastructure
2.2 Virtualization and Containerization
2.3 Software and Program Visualization
2.4 Visual Programming Languages
2.5 Model-driven Software Engineering

The main purpose of this chapter is to define the key concepts that are essential to the understanding of this work and upon which it is built. The concepts are as follows: cloud computing and infrastructure (Section 2.1), virtualization and containerization (Section 2.2), software visualization (Section 2.3), visual programming languages (Section 2.4), and model-driven software engineering (Section 2.5).

2.1 Cloud Computing and Infrastructure

Cloud computing is a broad concept that has risen as a paradigm for hosting and delivering services over the Internet [1]. In more detail, it can be defined as the combination of the Software as a Service (SaaS) delivered over the Internet and the hardware and systems software in the data centers that provide those services, known as a cloud [7]. According to Abbasov [1], cloud computing has five fundamental characteristics:

• On-demand self-service: A user should be able to acquire resources automatically without human interaction with the service provider.

• Broad network access: The resources of the cloud should be available through the Internet and accessed through diverse terminals.


• Resource pooling: The physical and virtual computing resources (e.g. storage, processing) are abstracted from the end-user and assigned as needed. The end-user has no control and knowledge over the resources and may only be able to specify a location.

• Rapid elasticity: Resources scale elastically on demand giving the illusion of being infinite and can be provisioned in any quantity at any time.

• Measured service: Resources are automatically controlled and optimized using appropriate metering capabilities, and their utilization is transparent for both the provider and consumer.

Another important characteristic of cloud computing is the type of services provided which range in a large spectrum based on resource abstraction [1]. These include the following:

Software as a Service (SaaS). The infrastructure is abstracted from the consumer removing control at this level. It aims to provide a hosting environment suitable for the deployment of complete applications accessible through the Internet from diverse terminals. Management is done in a single virtual environment for the optimization of resources in regards to availability, speed, security, maintenance, and disaster recovery.

Platform as a Service (PaaS). Development platform for the full "Software Lifecycle". It differs from SaaS because it provides support for both complete and in-progress applications.

Infrastructure as a Service (IaaS). Provides the consumer with direct control of resources at the infrastructure level (e.g. processing and storage). Relies heavily on virtualization for integration and decomposition of physical resources to accommodate demand.

Finally, in regards to the deployment model, whose definition is based on the location of the infrastructure and the entity responsible for managing it, this may be one of four [1]:

Private cloud. Operated by a single organization regardless of location. Reasons for adoption include, among others, utilizing and optimizing in-house resources and ensuring data privacy.

Community cloud. Shared between multiple organizations under the same infrastructure and core values. Can be hosted by a third-party or a member of the community.

Public cloud. Most common model. Allows consumers to pay on-demand based on the provider’s policies and charging-model.

Hybrid cloud. A combination of two or more of the previous clouds connected by technologies that enable data and application portability.

One of the biggest benefits of cloud computing is its elasticity and pay-by-usage model, which adapts to the needs of its consumers. This gives companies of all sizes the opportunity to experiment and take risks without the heavy burden of managing all the required infrastructure and hardware behind it, shifting the focus from management and maintenance to the core business.

Along with the widespread adoption of cloud computing technologies and concerns nourished by the DevOps community, the Infrastructure as Code (IaC) practice emerged. It is the practice of maintaining system configurations and provisioning deployment environments using source code [34]. IaC scripts are handled much like the rest of the code base and promote the participation of developers in operations.

Figure 2.1: Comparison of (a) hypervisor and (b) container-based deployments. From [10]

2.2 Virtualization and Containerization

As established in the previous section, cloud computing presents several difficult challenges for the providers of these technologies. For instance, achieving the proposed scaling elasticity and seemingly infinite capacity requires the virtualization of the resources to hide the implementation of how they are multiplexed and shared [7]. In fact, major aspects are heavily dependent on virtualization. Current solutions include virtual machines (VMs) and containers.

Containers are lightweight and executable standard units of software which package up code and all its dependencies [59]. Although containers and VMs try to solve an identical issue and are indeed very similar, the main difference lies in the virtualization level utilized [48]. VMs aim to emulate hardware while containers emulate operating systems, making them more lightweight and portable all the while keeping the same resource isolation and allocation benefits [38]. In fact, some sources claim the benefits of using containers in comparison to VMs, most notably, in regards to the throughput and response time [40,38].

Figure 2.1 displays the differences in the deployment of an application between hypervisor (VM) and containers.

It is in this context that Docker currently stands as one of the most adopted container implementation and management technologies [51]. Containers play an important role nowadays from development to production and are crucial in enabling microservices architectures [3].

For this reason, namely its current high adoption and relevance in the software development and engineering space, Docker, and more precisely Docker Compose, was chosen as the container technology for which to develop the visual orchestrator proposed in this work. Among existing container technologies, Docker stands as the most pertinent.

Figure 2.2: Visual programming and software visualization. From [24]

2.3 Software and Program Visualization

A substantial amount of research is found in the area of program visualization and the broader area of software visualization. The distinction between both lies in what artifacts are considered. In software visualization, besides the source code itself, other artifacts, namely requirements, architectural design, and bug reports are also taken into account [24].

Software visualization and visual programming (discussed in the next section) are strongly related and complement each other, as shown in Figure 2.2. While the former usually produces static visualizations for software systems, the latter allows visual manipulation of elements to generate software systems. When applied simultaneously, round-trip visualization is achieved [24].

In general, these visualization approaches have the overall intent of increasing the comprehensibility of software systems by providing a method to visualize the code and its relationships, usually following some specific visual metaphor, resulting in a higher abstraction level by translating difficult concepts into more comprehensible equivalents. Furthermore, some explore liveness concepts [62,2] in an effort to improve the feedback provided to the developer. This approach is particularly useful when applied to complex and large-scale systems for which a global view may be valuable in understanding the system as well as debugging.

One of the most widespread visualization approaches is the Unified Modeling Language (UML). UML has been the de facto standard for, among other purposes, visualizing software architectural designs and artifacts [11].


Another example is CloudCity, which aims to explore the benefits of model-driven engineering techniques and of a combined live programming and software visualization approach applied to cloud management through a city metaphor [43].

Other software visualization approaches have also adopted a city metaphor, following in the footsteps of CodeCity, originally developed by Wettel et al. The city metaphor was chosen after several empirical studies validated its adequacy and efficiency in software visualizations. According to this metaphor, software elements are mapped to city elements such as buildings and districts. Additional efforts include CityVR, which experimented with virtual reality (VR) integration to provide a higher engagement level [43, 44, 4].

2.4 Visual Programming Languages

Although the definition of a visual programming language (VPL) can be broad, vague, and even somewhat contradictory among distinct authors throughout its evolution, the main agreed-upon characteristic is the reliance on graphical elements and visual notations, such as icons and diagrams, both to convey information and to serve as the interaction medium with the developer, with the overall purpose of providing an abstraction over some programming task. The major distinction between visual and text-based programming languages lies in the former's exploration of multiple dimensions for semantic expression [14, 15, 50]. Examples of such added dimensions include time relationships, which express "before-after" relations, and spatial relationships.

Even though this definition may apparently imply the elimination of text on the surface, in reality, this is a misconception as the overall goal is to strive for improvements in programming language design in a multidimensional context. In fact, most VPLs include some textual elements as an auxiliary dimension to provide a more complete and comprehensible view. Nevertheless, to be classified as such, VPLs require significant parts of the program structure to be represented graphically [14,25].

2.4.1 Key Concepts

We start by presenting and discussing some key concepts, characteristics, and features that are shared across the literature.

Purpose. VPLs are split between two purposes: general-purpose, providing a visual approach to software development that is suitable for producing executable programs of reasonable size with freedom equivalent to that of a high-level text-based programming language; and domain-specific, when applied to a specific area for a single task or a set of tasks (e.g. software engineering or scientific visualization).

Visualization metaphor. A crucial characteristic of a VPL is its adopted visualization metaphor, that is, how the domain concepts are mapped from the internal model to their corresponding visual representations [8, 24]. The overall goal of applying a visualization metaphor is to increase the comprehensibility of otherwise foreign and difficult notions by transposing these concepts into analogous concepts that are more familiar to the end user. One example is the city metaphor used for code visualization, mentioned previously.

Representation model. This is a broad topic, ranging from dimensionality, whether 2D or 3D, to how the elements are distributed, organized, and connected on the screen, such as graph-based and box-based diagrams.

Control flow. As in traditional text-based programming languages, VPLs follow one of two concepts for flow of control: imperative and declarative. In an imperative approach, one or more control-flow or dataflow diagrams are used to convey how the thread of control flows through the program, which is particularly useful for representing parallelism. In a declarative approach, on the other hand, the developer is only concerned with what computations are performed and not how the operations are carried out, avoiding explicit state modifications through the use of single assignment.

Abstraction. Two abstraction types are widely adopted: procedural abstraction and data abstraction. Procedural abstraction can be further subdivided into two levels, high and low. High-level VPLs are commonly found in domain-specific applications but are not complete programming languages, meaning that it is not possible to write and maintain an entire program from the ground up with these languages without the combined use of additional underlying non-visual modules. In contrast, low-level languages do not allow the developer to combine fine-grained logic into procedural modules and are useful in domain-specific applications as well. General-purpose VPLs usually feature both levels of abstraction to best suit the distinct concepts of a general programming language. Data abstraction, on the other hand, is exclusively applied in general-purpose VPLs, and its definition is very similar to the one used for conventional programming languages, that is, the idea of simplifying a body of data into a reduced, yet more comprehensible, representation. However, when applied to a VPL, this type of abstraction carries the added restrictions of being defined visually, having a visual representation, and providing interactive behavior.

As previously established, the main purpose of VPLs is to provide an abstraction over some programming tasks and allow the developer to work at this higher abstraction level, theoretically presenting benefits when compared to traditional text-based programming. In particular, according to Burnett [14], the most common goals include increasing the understandability for a certain target audience, reducing error proneness, and increasing the development speed. This can be achieved by exploring four common strategies:

• Concreteness. Allow the direct and visual exploration of data. One example is the effects of some portion of a program being automatically displayed on a specific object or value.

• Directness. Reducing the number of steps between the intention and the goal. In practice, it equates to minimizing the sequence of interactions required for the user to achieve some objective. One example is the difference between having to navigate through multiple menus in comparison to using a keyboard shortcut for the same task.

• Explicitness. Explicitly represent more data without the need to infer it. For instance, considering a graph-based visual notation and three elements a, b, and c which follow a transitive relation R such that if a R b and b R c, then a R c, the distinction lies between explicitly displaying the relation between a and c or not.

Figure 2.3: A program and its dataflow equivalent. From [37]

• Immediate Visual Feedback. Automatically display up-to-date information whenever some change occurs. This refers to the feedback loop between the programming environment and the developer and, according to Johnston et al. [37], can therefore be directly associated with the liveness levels defined by Tanimoto [62].

2.4.2 Categories

A very important subset of VPLs consists of those which follow the Dataflow Model [31, 37]. Its origin dates back to the 1970s, and it was originally intended to exploit parallelism [37]. In this model, a program is seen as a directed graph in which nodes correspond to primitive instructions or functions (e.g. arithmetic operations) and arcs represent data dependencies between the instructions. Units of data called tokens flow along the arcs, which behave as unbounded queues. Inbound arcs correspond to the input of a function, while outbound arcs correspond to its output. When all of its inputs contain data, a function node is considered fireable; it is executed some undefined time afterward and the output is placed in some or all of its outbound arcs. The main advantage of this model lies precisely in its parallelism potential, as multiple nodes can be fireable simultaneously. Programming languages which follow this paradigm usually adopt a graph-like representation. Figure 2.3 showcases a program snippet (a) and its equivalent in a dataflow notation (b).
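The firing rule described above can be illustrated with a minimal sketch in Python. It is not taken from any of the cited works: the `Node` class, the `run` scheduler, and the three-node example graph are assumptions made purely for illustration, and the scheduler fires nodes sequentially even though the model itself would allow fireable nodes to run in parallel.

```python
from collections import deque
import operator


class Node:
    """A function node with one unbounded FIFO queue per inbound arc."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]
        self.targets = []  # (node, input_index) pairs reached by outbound arcs

    def connect(self, target, input_index):
        self.targets.append((target, input_index))

    def fireable(self):
        # A node may fire only when every inbound arc holds a token.
        return all(self.inputs)

    def fire(self):
        # Consume one token per inbound arc and emit the result downstream.
        args = [queue.popleft() for queue in self.inputs]
        result = self.func(*args)
        for target, index in self.targets:
            target.inputs[index].append(result)
        return result


def run(nodes):
    """Fire nodes until none is fireable; return the last result per node."""
    results = {}
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.fireable():
                results[node.name] = node.fire()
                progress = True
    return results


# Hypothetical graph for: x = a + b; y = c * d; z = x - y
add = Node("x", operator.add, 2)
mul = Node("y", operator.mul, 2)
sub = Node("z", operator.sub, 2)
add.connect(sub, 0)
mul.connect(sub, 1)

# Initial tokens placed on the inbound arcs of the two independent nodes.
for node, values in ((add, (1, 2)), (mul, (3, 4))):
    for queue, value in zip(node.inputs, values):
        queue.append(value)

# The listing order does not matter: "z" only fires once both inputs arrive.
results = run([sub, add, mul])
print(results)
```

In this sketch, the nodes `x` and `y` are fireable at the same time, which is where a parallel implementation could exploit the model, while `z` has to wait for both tokens.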

During the 1990s, a surge of VPLs appeared [37], and their impact can still be felt today, as seen in the many recent VPLs built for a wide array of tasks and domains. The graph representation is very flexible and a fitting way of visually representing many concepts, making it useful in many applications, whether in general-purpose or domain-specific languages.


Throughout the evolution of VPLs, multiple classification schemes have been proposed. One of the most recent is the scheme proposed by Boshernitsan et al. [12], which results from the combination and refinement of previous efforts made by important authors in the field, such as Chang [20], Shu [58], and Burnett [16]. This scheme will be adopted henceforth for all classifications of VPLs made in this work, particularly in the state-of-the-art review (Chapter 3). According to these authors, VPLs can be organized, based on their characteristics, including provided features and used paradigms, into five non-mutually exclusive categories. Most VPLs belong to one of the first two categories, and both general-purpose and domain-specific languages can be found in each category.

Purely visual languages. These VPLs are characterized by their strong reliance on visual techniques, that is, interaction through the manipulation of visual elements. Additionally, they allow direct compilation from the visual model, without the need for translation to some intermediate text-based language, and support debugging and execution in the same environment.

Hybrid text and visual languages. A mix of text-based and visual languages. This category covers both languages which are fundamentally text-based but include complementary graphical elements and languages which are fundamentally visual (similarly to purely visual languages) but are afterward converted into a high-level text-based language. Furthermore, some support editing a program while alternating between two views, one for the visual and one for the textual representation, while maintaining consistency between both.

Programming-by-example systems. Follows the programming-by-demonstration paradigm allowing the user to create and manipulate visual objects to teach the system how to perform tasks. This category considers two types: Programming by example systems that try to infer the program from examples of input and output and Programming with examples systems that remember programmer’s commands for later use, without inferring anything [50].

Constraint-oriented systems. Act on constraint scenarios or environments through the definition of a set of rules. Examples include simulation design and graphical user interface (GUI) development.

Form-based systems. Derived from spreadsheets for their visualization and programming paradigm. Programming is done by changing a set of interconnected cells over time. Although not explicitly specified in the definition by Boshernitsan et al., this category can be seen from a broader perspective to include languages which follow the form-based and spreadsheet-based paradigm in general [16]. This means that, besides languages that adopt a spreadsheet metaphor, similar languages that generalize sheets into forms are also considered.

To more clearly illustrate the characteristics and concepts inherent to each VPL category as established in the scheme proposed by Boshernitsan et al. [12], some examples are briefly discussed. Note that, even though much work has gone into compiling surveys of this area over the past decades, to the best of our knowledge, there is currently a lack of up-to-date examples covering more recent advances in this field. As a result, the VPLs presented in this section were chosen based on their popularity and relevance. In particular, we relied on one or a combination of ranking metrics on the hosting website, such as GitHub¹’s star ranking system, when such a metric was available. Otherwise, the criterion for the inclusion of a tool was the novelty of its approach or adopted metaphor. In addition, some grey literature was also considered for review.

Figure 2.4: Sample of the Scratch programming environment UI. From the project Red Cube available at https://scratch.mit.edu/projects/361704410

Purely visual languages. Scratch² is an educational visual programming environment that lets users create interactive and media-rich projects, primarily aimed at teaching young people programming concepts [45]. It was developed by the MIT Media Lab and publicly launched in 2007. It is built upon a block-like interface that allows the connection of command blocks to express the program’s logic. Each block corresponds to some operator. The programming environment consists of a single window split into four main panes: a command palette containing the available command blocks, a pane for the scripts, a staging area for the output, and a pane for all the sprites in the project. It also features additional panels, accessible through folder tabs, to view and edit the costumes and sounds owned by the selected sprite. It achieves level 4 liveness, allowing users to interact with the system at any time while it is running and encouraging tinkerability, in other words, experimentation and self-discovery similar to how one may tinker with mechanical or electronic components [45]. Figure 2.4 displays a sample of the UI.

Hybrid text and visual languages. Blockly³ is a JavaScript library for building VPLs based on interlocking logic blocks, similar to the previously mentioned Scratch. However, it matches more closely the syntax of traditional programming languages, supporting common constructs such as if conditions and loops. It can output the corresponding code in multiple languages, including JavaScript and Python. Many projects are built with Blockly, mostly for educational purposes. One example is Micro:Bit⁴. This project introduces a micro-computer programmable through the block notation or with a traditional scripting language, namely JavaScript or Python. Most notably, the web programming environment offered supports programming in the block-based notation and the traditional text-based language in parallel for the same source code.

¹ https://github.com/
² Scratch, available at https://scratch.mit.edu/
³ Blockly, available at https://developers.google.com/blockly
⁴ Micro:Bit, available at https://microbit.org/

Figure 2.5: Sample of the FlowHub programming environment UI. From the React ToDo example available at https://noflojs.org/example/

Another notable example is NoFlo⁵. NoFlo is an open-source JavaScript implementation of Flow-Based Programming (FBP), a programming approach reminiscent of dataflow programming [37, 49]. It is designed for general-purpose programming, and applications can be found in various domains, ranging from IoT to multimedia and even controlling a drone. Figure 2.5 showcases an example of NoFlo’s FlowHub environment UI. Regarding the UI, some noteworthy design considerations include:

1. Customization of blocks through user-defined icons and names as well as descriptions for inputs and outputs, making the concepts more comprehensible, especially when interpreting an unknown project for the first time. However, even though a more detailed description for a component can be set in its definition, this description is not displayed in the corresponding details panel.

2. Inclusion of a simplified mini-map displaying an overview of the complete layout of the diagram.

Finally, Node-RED⁶ is one of the most widespread tools used in IoT, originally developed by IBM⁷ and currently part of the open-source OpenJS Foundation⁸. It works as a Flow-Based Programming tool for wiring together hardware devices, APIs, and online services. Created flows are stored in JSON format and can be exported. Figure 2.6 displays an example of a flow.

Programming-by-example. Pygmalion is usually considered as one of the first languages in this category [12]. Additional examples include Chimera, Cocoa, and Rehearsal World [12, 14, 50].

⁵ NoFlo, available at https://noflojs.org/
⁶ Node-RED, available at https://nodered.org/
⁷ More information available at https://www.ibm.com/
⁸ More information available at https://openjsf.org/

Figure 2.6: Example of a Node-RED flow. Retrieved from GitHub https://github.com/node-red/node-red

Based on the conducted research and to the best of our knowledge, no recent VPLs can be found in this category.

Constraint-oriented systems. From a historical perspective, these systems have been mostly used for performing simulations, quintessential examples being ThingLab and ARK, and for developing GUIs [12]. A modern example is Bubble⁹, designed to enable non-programmers to build web applications. This tool includes, among other aspects, a GUI builder usable through a combination of drag-and-drop mechanisms and form-based definitions.

Another recent example is IFTTT¹⁰, a web-based service for connecting other web services and some physical devices through simple conditional statements called applets. Common services include Gmail, Facebook, and Instagram. When creating an applet, the user configures a service and the trigger condition as well as the desired action to perform on the target service. Applets can be shared among users of the application.

Form-based. Forms/3 is one of the progenitors in this category [12]. This general-purpose VPL follows the spreadsheet metaphor of cells and formulas to represent data and computation respectively [12].
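The spreadsheet metaphor of cells and formulas can be illustrated with a small sketch in Python. It does not reproduce the syntax or semantics of Forms/3; the `Sheet` class and the cell names are hypothetical and only show how cells hold data while formulas express computation over other cells.

```python
class Sheet:
    """Cells hold either a constant or a formula over other cells."""

    def __init__(self):
        self._cells = {}

    def set(self, name, value_or_formula):
        self._cells[name] = value_or_formula

    def get(self, name):
        content = self._cells[name]
        # A formula is any callable taking the sheet; constants are returned as-is.
        return content(self) if callable(content) else content


sheet = Sheet()
sheet.set("price", 20)
sheet.set("quantity", 3)
sheet.set("total", lambda s: s.get("price") * s.get("quantity"))

first = sheet.get("total")
sheet.set("quantity", 5)  # editing a cell is the only "programming" step
second = sheet.get("total")
print(first, second)
```

Because formulas are re-evaluated on every read, the dependent cell always reflects the current state of the cells it refers to, which is the behavior a spreadsheet user expects.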

To summarize, in the current landscape, among the visual programming language categories, purely visual and hybrid languages constitute the vast majority, and most are domain-specific rather than general-purpose. This is directly in line with practical experiences that support the hypothesis that visual approaches are most useful when applied to a particular field, since they take advantage of a clearer and more direct communication style tailored for specific problems [14]. By a large margin, the most common visual notations were some variation of a graph-based approach. Moreover, recent advances in the education field have thoroughly explored block-based representations, partially due to open-source libraries such as Blockly that facilitate the development of tools in this programming style.

⁹ Bubble, available at https://bubble.io/
¹⁰ IFTTT, available at https://ifttt.com/


2.5 Model-driven Software Engineering

Model-driven Software Engineering (MDSE) is a software engineering paradigm in which the whole development life-cycle is supported by high-level abstractions (models) representing different views of the system, abstracting the underlying infrastructure and code. These models can be converted in a (semi)automatic process (model transformation) between distinct abstraction levels, from an initial high-abstraction model down to its executable equivalent [22]. The potential benefits of this approach include:

• Improved portability by separating the application knowledge from the specific implementation technology.

• Increased productivity through automated mapping.

• Improved quality through the reuse of proven patterns and best practices.

• Improved maintainability due to a better separation of concerns.

• Better consistency and traceability between models and code.

Figure 2.7: Diagram of models in software engineering. From https://researcher.watson.ibm.com/researcher/files/zurich-jku/mdse-01.pdf

Figure 2.7 diagrams the concept of models in software engineering, showcasing how a model abstracts low-level concepts (usually code) but is itself an abstraction of a view of the real world.

Numerous initiatives can be found in the domain of MDSE. One of the most notable is Model Driven Architecture (MDA) by the Object Management Group (OMG)¹¹. The MDA standard specifies the usage of the MetaObject Facility (MOF) language for modeling high-level Platform-Independent Models (PIMs), which are mapped to low-level Platform-Specific Models (PSMs) used to generate artifacts, usually code [28]. MDA considers a set of MOF-compliant transformation languages, such as the MOF Model to Text Transformation (MOFM2T) language to enable model-to-code transformations and the Query/View/Transformation (QVT) standard for model-to-model transformations [39]. The QVT standard specifies three languages: the declarative QVT Relations (QVTr) to design relations between models, the imperative QVT Operational Mappings (QVTo) to write unidirectional transformations, and the lower-level declarative QVT Core (QVTc). QVTr is a higher-level language built on top of QVTc [39].

OMG adopts a four-layered architecture based on the model abstraction level, ranging from layer M0 to layer M3 [42]. M0 corresponds to actual real-world objects, that is, instances of business objects; M1 is the model which abstracts the instances of M0; M2 is the metamodel to which M1 conforms; and, finally, M3 is the meta-metamodel that defines the concepts in M2 [13].

As defined by Kleppe et al. [42], "A transformation definition is a set of transformation rules that together describe how a model in the source language can be transformed into a model in the target language. A transformation rule is a description of how one or more constructs in the source language can be transformed into one or more constructs in the target language."

A transformation rule consists of two parts: a left-hand side (LHS), which accesses the source model, and a right-hand side (RHS), which expands in the target model. Both sides can be represented using any combination of three approaches: (1) variables, sometimes referred to as metavariables, which hold elements from the source and/or target models; (2) patterns, which are model fragments with zero or more variables; and (3) logic, that is, computations and constraints on model elements.
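The anatomy of such a rule can be sketched in Python. This is only an illustration under simplifying assumptions: models are plain lists of dictionaries, variables are written as strings prefixed with `?`, and the `Rule`, `match`, and `transform` names as well as the class-to-table example are hypothetical rather than taken from any transformation language discussed here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """A transformation rule: the LHS reads the source, the RHS builds the target."""
    pattern: dict                    # LHS pattern; "?x" entries are variables
    guard: Callable[[dict], bool]    # logic: a constraint over the bound variables
    produce: Callable[[dict], dict]  # RHS: target fragment built from the bindings


def match(pattern: dict, element: dict) -> Optional[dict]:
    """Bind the pattern variables against one source element, or return None."""
    bindings = {}
    for key, expected in pattern.items():
        if key not in element:
            return None
        if isinstance(expected, str) and expected.startswith("?"):
            bindings[expected[1:]] = element[key]
        elif element[key] != expected:
            return None
    return bindings


def transform(rule: Rule, source: list) -> list:
    """Apply one rule to every element of the source model."""
    target = []
    for element in source:
        bindings = match(rule.pattern, element)
        if bindings is not None and rule.guard(bindings):
            target.append(rule.produce(bindings))
    return target


# Hypothetical rule: every persistent class becomes a table.
class_to_table = Rule(
    pattern={"kind": "Class", "name": "?name", "persistent": "?persistent"},
    guard=lambda b: b["persistent"],
    produce=lambda b: {"kind": "Table", "name": b["name"].lower() + "s"},
)

source_model = [
    {"kind": "Class", "name": "Customer", "persistent": True},
    {"kind": "Class", "name": "Session", "persistent": False},
    {"kind": "Package", "name": "core"},
]
target_model = transform(class_to_table, source_model)
print(target_model)
```

Here the pattern and its variables form the LHS, the guard plays the role of the logic, and the `produce` function is the RHS that expands into the target model.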

At the top level, two types of transformations can be identified: model-to-model (M2M) and model-to-code (M2C). The latter is more accurately called model-to-text, since non-code artifacts may be generated, although both terms are often used interchangeably [22]. Model-to-code transformations, in particular, can be seen as a special case of model-to-model transformations for which a metamodel for the target programming language is provided, and are in general very similar to what compilers perform when translating a high-level programming language into a lower-level equivalent. Although both types are identified, based on the definition of a model, M2C is in fact a subset of M2M transformations, as code can itself be seen as a model, albeit at a lower abstraction level, closer to machine operations [47].

Furthermore, Cabot et al. [18] identified a set of properties to determine whether a transformation behaves as a mathematical function, in particular, whether a transformation is executable, total, deterministic, functional, exhaustive, injective, and bijective. These properties, along with others, are also applicable to individual rules.

Czarnecki et al. [22] proposed a taxonomy for the classification of model transformation approaches, along with some applications exemplifying practical instances of each in the context of MDA. This taxonomy will be adopted in this work for the state-of-the-art review of these techniques and is briefly described below.

In regards to model-to-model, five approaches were identified: direct-manipulation approaches, relational approaches, graph-transformation-based approaches, structure-driven approaches, and hybrid approaches.

• Direct-manipulation. The most low-level approach. Offers an internal model representation and some API to manipulate it, and requires users to implement transformation rules and scheduling mostly from scratch using a programming language such as Java.

• Relational. Defines declarative constraints with executable semantics between the source and target models, similar to logic-programming.

• Graph-transformation-based. Operates on graphs specifically designed to represent UML-like models. Rules are defined using graph patterns for both the LHS and RHS.

• Structure-driven. Split into two phases: (1) creating a hierarchical structure of the target model and (2) setting attributes and references in the target. Scheduling and application strategy are defined by the framework and rules cannot have side-effects.

• Hybrid. Combines techniques from previous approaches.

In summary, and from a less granular view, we find three main approaches: declarative and imperative, which function similarly to how traditional programming languages classified in these paradigms behave, and graph-based, which operates on graph-based models through graph algebra [39]. Graph-based approaches are in general more complex but provide more flexibility, which is useful in solving some complex problems such as bidirectional transformations.

In regards to model-to-code transformations, two main approaches were identified: visitor-based and template-based.

• Visitor-based. Similar to direct-manipulation approaches in the sense that a simple visitor mechanism is provided to access and traverse the internal representation of the model to generate text for the source model components.

• Template-based. Templates are used, usually consisting of the target text containing splices of metacode to access information from the source and to perform code selection and iterative expansion. The LHS logic can either be direct manipulation of an API or declarative queries. Usually offers user-defined scheduling. Because textual templates are independent of the target language, the generation of any textual artifact is simplified. Additionally, templates promote reuse and are in general more intuitive since their definition closely matches the structure of the resulting artifact.
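The difference between the two approaches can be made concrete with a short Python sketch that generates the same Compose-like text twice. Everything in it is an assumption made for illustration: the `Service` and `Application` classes form a toy model rather than the internal model of the proposed solution, the service names and images are placeholders, and the standard-library `string.Template` stands in for a full template engine.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Service:
    name: str
    image: str
    ports: list = field(default_factory=list)

    def accept(self, visitor):
        return visitor.visit_service(self)


@dataclass
class Application:
    services: list

    def accept(self, visitor):
        return visitor.visit_application(self)


class YamlVisitor:
    """Visitor-based generation: the traversal itself emits the text."""

    def visit_application(self, app):
        lines = ["services:"]
        for service in app.services:
            lines.extend(service.accept(self))
        return "\n".join(lines) + "\n"

    def visit_service(self, service):
        lines = [f"  {service.name}:", f"    image: {service.image}"]
        if service.ports:
            lines.append("    ports:")
            lines.extend(f'      - "{port}"' for port in service.ports)
        return lines


# Template-based generation: target text with placeholders acting as metacode.
SERVICE_TEMPLATE = Template("  $name:\n    image: $image\n$ports")
PORT_TEMPLATE = Template('      - "$port"\n')


def render_with_templates(app):
    chunks = ["services:\n"]
    for service in app.services:
        # Iterative expansion of the port template, once per port.
        ports = "".join(PORT_TEMPLATE.substitute(port=p) for p in service.ports)
        if ports:
            ports = "    ports:\n" + ports
        chunks.append(
            SERVICE_TEMPLATE.substitute(name=service.name, image=service.image, ports=ports)
        )
    return "".join(chunks)


app = Application([
    Service("web", "nginx:latest", ["8080:80"]),
    Service("db", "postgres:13"),
])
visitor_output = app.accept(YamlVisitor())
template_output = render_with_templates(app)
print(visitor_output)
```

Both functions produce identical text, but in the template version the shape of the output is visible in the templates themselves, whereas in the visitor version it is spread across the traversal code.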

The majority of the available MDA tools support template-based code generation [22]. Code generation is usually unidirectional, meaning that no way to reverse-engineer the code and synchronize it with the model is provided. As a result, tools frequently warn the developer that code is overwritten on generation [33]. Moreover, some provide custom code protection measures, such as the separation between implementation edited by the developer and interfaces exclusively generated by the tool [39]. In comparison, visitor-based approaches are more basic and rely heavily on the programmer to formulate the complete logic behind code generation.


Chapter 3

State of the Art

3.1 Model-driven Software Engineering
3.2 Visual Approaches in Operations

This chapter is focused on reviewing and describing the existing state of the art relevant to the context of the problem. The goal is to grasp the current landscape, not only to understand what solutions and approaches currently exist in the domain but also to extract knowledge and inspiration for the proposed solution.

More precisely, this review is focused on achieving two primary goals: examining some transformation techniques utilized in MDSE as well as their practical applications (Section 3.1), and surveying existing visual solutions mainly within the Information Technology (IT) operations field and, in a broader sense, DevOps, with a heavy focus on those concerned with service management, particularly those which rely on or are built for Docker technologies (Section 3.2). Both objectives contribute towards a clearer vision and understanding of the current landscape for the two core areas from which knowledge will be leveraged for the solution. A summary is included at the end of each section, presenting an overview, discussion, and main conclusions regarding the preceding analysis.

Both peer-reviewed publications and gray literature were considered for review.

3.1 Model-driven Software Engineering

This review covers the state of the art in regards to MDSE transformation techniques and how these techniques are employed in practical cases. The former is useful to support informed implementation decisions by leveraging the accumulated experience in the field, while the latter offers insight into what the latest relevant practical contributions in the software industry have been as well as what the current concerns among the academic community are.


3.1.1 MDSE Transformations

In this section, we explore some practical examples of transformation techniques utilized in MDSE. The purpose of this review is to understand what the most appropriate approach or combination of techniques may be for the bidirectional transformation between the internal source-code model of a Docker Compose configuration and its YAML equivalent. Therefore, this review tries to answer the following question:

What approaches exist that can support a round-trip model to code transformation?

This issue is related to the implementation of the import/export feature as well as the syncing mechanism between the two representations expected in the proposed solution. With this purpose in mind, we will primarily focus on M2C transformation approaches, since this is where the central issue lies.

We begin by reviewing practical tools for M2C approaches as identified by Czarnecki and Helsen [22] to more clearly illustrate how tools cope with these issues. Unfortunately, many of the examples have been discontinued and their documentation is no longer publicly available. Therefore, at least for those, the only reference material left is what is presented in the original work and other contemporary papers, since no method is available to empirically test the tools. However, in 2015, Kahani and Cordy [39] performed a comprehensive survey of MDE-enabling tools, outlining the major differences between them.

Visitor-based. An example is Jamda¹. This framework provides a set of classes to represent UML models, an API for manipulating models, and a visitor mechanism to generate code. Additional model element types can be introduced by subclassing existing Java classes [22]. In a regular workflow, users work with UML models, usually in the XMI format, most frequently manipulated with some UML modeling tool such as MagicDraw² or Modelio³. Currently, the only supported language for the generated output is Java.

Template-based. Many tools adopt this approach. One example is AndroMDA⁴, which uses it for code generation. It supports both .NET and Java as output languages.

We will now focus on reviewing transformation approaches to enable round-trip synchronization through, among other tactics, bidirectional transformations (BX). The study of this subject expands beyond the scope of MDSE, since these mechanisms are useful in solving problems found in numerous other areas, including databases and programming languages [21].

Initial efforts include work by Stevens [61]. The author discussed how bidirectional transfor-mations could be achieved using the QVT standard, more specifically QVT Relational, including the proposition of a framework for "coherent transformation". In the study, the author considers transformations that are bidirectional but not necessarily bijective. Transformations are specified

1_{Jamda, available at}_{http://jamda.sourceforge.net/}

2_{MagicDraw, available at}_{https://www.nomagic.com/products/magicdraw} 3_{Modelio, available at}_{https://www.modelio.org/}

(43)

3.1 Model-driven Software Engineering 21

in an appropriate language (e.g. QVTr) and can be interpreted both as the relation between two models and as forwards or backwards transformations. A set of assertions are considered: (1) transformations should be deterministic, (2) a transformation may depend on the current value of the target and source models which will be replaced, reinforcing the notion that transforma-tions may not be bijective and (3) transformatransforma-tions have to be total. This resulted in the following definition:

"Let $R$ be a transformation between metamodels $M$ and $N$, consisting of a relation $R \subseteq M \times N$ and transformation functions $\overrightarrow{R} : M \times N \to N$ and $\overleftarrow{R} : M \times N \to M$. Then $R$ is a coherent transformation if it is correct, hippocratic and undoable."
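To make the shape of this definition concrete, the following minimal sketch models a transformation as a consistency relation plus a forward and a backward function, each taking both models as input. Everything in it is an assumption made for illustration: the models are plain dictionaries, the names `SHARED`, `consistent`, `forward`, and `backward` are hypothetical, and the checks reflect our reading of two of the three properties (correctness: the result of a transformation is consistent with its source; hippocraticness: models that are already consistent are left untouched). Undoability is left out of the sketch.

```python
# Fields that both models must agree on; every other field is private to one side.
SHARED = ("name", "image")


def consistent(m, n):
    """The relation R: m and n agree on all shared fields."""
    return all(m[key] == n[key] for key in SHARED)


def forward(m, n):
    """Forward function M x N -> N: update n from m, keeping n's private fields."""
    return {**n, **{key: m[key] for key in SHARED}}


def backward(m, n):
    """Backward function M x N -> M: update m from n, keeping m's private fields."""
    return {**m, **{key: n[key] for key in SHARED}}


def is_correct(m, n):
    """Both directions must restore consistency."""
    return consistent(m, forward(m, n)) and consistent(backward(m, n), n)


def is_hippocratic(m, n):
    """Already consistent models must not be modified."""
    if not consistent(m, n):
        return True
    return forward(m, n) == n and backward(m, n) == m


# A visual model with layout data and a textual model with a comment.
visual = {"name": "web", "image": "nginx:1.25", "position": (10, 20)}
code = {"name": "web", "image": "nginx:1.24", "comment": "front end"}

updated_code = forward(visual, code)
```

Because `position` exists only on one side and `comment` only on the other, neither model can be rebuilt from the other alone. This is why both functions take the current value of the target as a second argument, and why the transformation is bidirectional without being bijective.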

In 2008, Angyal et al. [5] proposed a mechanism to maintain round-trip synchronization for M2M and M2C transformations. Our interest lies primarily in M2C, that is, in the context of the proposed solution, the transformation between the PSM, stored in memory as an Abstract Syntax Tree (AST), and code (e.g. a docker-compose.yml file, which also has to be periodically parsed and converted to an AST). Two ASTs have to remain in memory at all times: the internal PSM (M) and the last synchronized code state (C0). The synchronization process, for a given synchronization step, is achieved as follows:

1. The difference between C0 and the current code AST (C1), generated from the current state of the code file, is computed (ε1).

2. The difference between C0 and M is also computed (ε2).

3. ε1 is atomically propagated to M and ε2 is atomically propagated to C1, achieving consistency.

4. C0 is updated to store C1.
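The synchronization step can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of Angyal et al.: ASTs are stood in for by flat dictionaries, the helper names `diff`, `apply_changes`, and `synchronize` are hypothetical, C0 is read as the last synchronized code state and C1 as the AST parsed from the current code file, and conflicting edits (the same key changed on both sides) are not handled.

```python
_REMOVED = object()  # sentinel marking a key deleted between two states


def diff(old, new):
    """Return the changes that turn `old` into `new` as a key -> value map."""
    changes = {k: v for k, v in new.items() if old.get(k, _REMOVED) != v}
    changes.update({k: _REMOVED for k in old if k not in new})
    return changes


def apply_changes(target, changes):
    """Return a copy of `target` with `changes` applied."""
    result = dict(target)
    for key, value in changes.items():
        if value is _REMOVED:
            result.pop(key, None)
        else:
            result[key] = value
    return result


def synchronize(last_synced, current_code, model):
    """One synchronization step over C0 (last_synced), C1 (current_code) and M."""
    eps1 = diff(last_synced, current_code)   # step 1: edits made in the code
    eps2 = diff(last_synced, model)          # step 2: edits made in the model
    new_model = apply_changes(model, eps1)   # step 3: propagate code edits to M
    new_code = apply_changes(current_code, eps2)  # step 3: propagate model edits to C1
    return new_model, new_code, dict(new_code)    # step 4: C0 now stores C1


c0 = {"web.image": "nginx:1.24", "web.ports": "80:80"}
c1 = {"web.image": "nginx:1.25", "web.ports": "80:80"}    # image edited in the file
m = {"web.image": "nginx:1.24", "web.ports": "8080:80"}   # ports edited in the model

new_model, new_code, new_c0 = synchronize(c0, c1, m)
```

After the step, the model and the code agree on both the image edited in the file and the ports edited in the model, and the stored state matches the code.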

In 2010, Hidaka et al. [30] were among the first to propose a solution for bidirectional transformation applied to graphs based on UnCAL, a graph algebra. The solution explores trace information to achieve well-behaved transformations (i.e. consistent transformations in both directions). A year later, the authors expanded upon their work by developing the framework GRoundTram, leveraging the achieved solution [29].

Later, in 2015, Hoisl et al. [33] explored the concept of higher-order transformations (HOTs) to address the BX problem. The result was a framework based on bidirectional higher-order transformations (B-HOTs). The framework is independent of the transformation language used because it relies on binding specifications.

3.1.2 Model-driven Cloud Infrastructure

Cloud computing has been the target of numerous model-driven approaches following its massive proliferation and evolution which promoted the need for new and better solutions. Such approaches have surfaced in an effort to improve the way resources are managed in cloud environments by working at a higher abstraction level provided by high-level models as an alternative or complement to conventional low-level configurations.

In 2012, Ardagna et al. [6] proposed the initial vision of MODAClouds, a framework to tackle the challenges inherent to the lack of interoperability between cloud service providers and the resulting lock-in to a single provider. The goal was to achieve a service-agnostic approach fit for running applications in cross-provider multi-cloud settings, suited to strict non-functional and business requirements on high availability and flexibility. To this end, a model-driven approach was proposed, allowing the design of software system model artifacts at distinct abstraction levels which are ultimately transformed into code and automatically deployed in the target cloud platforms. Furthermore, it leveraged common cloud design patterns and best practices to support the users' decision making during the modeling process, offering mechanisms to measure and check whether the implementation satisfies the requirements and to optimize the selection of the target cloud environment based on the characteristics of the application.

In 2014, Ferry et al. [27] tackled the challenges of interoperability in multi-cloud settings as part of the same programme as MODAClouds. The result was the Cloud Modelling Framework (CloudMF), a model-driven approach for managing applications in multi-cloud environments tailored with DevOps principles in mind. The approach was built on two components: the DSML CloudML to model the applications and the run-time environment models@run-time to manage the systems.

In the end, this effort became part of the MODAClouds initiative along with two other sibling projects, PaaSage and ARTIST. PaaSage adopted the Cloud Application Modelling and Execution Language (CAMEL), which integrates and extends existing Domain Specific Languages (DSLs) including CloudML [55].

Also in 2014, Bergmayr et al. [9] explored a similar idea, although less ambitious in scope, when proposing a cloud-specific extension to UML's deployment language named Cloud Application Modeling Language (CAML). The extension is tailored towards representing concepts in this domain and was developed in the context of the previously mentioned ARTIST project. It follows MDA's principles by separating cloud provider-independent from cloud provider-specific deployment models, matching the PIM and PSM respectively. Working prototypes were developed for Eclipse⁵ and Enterprise Architect⁶. The authors also argue in favor of the usefulness of CAML for blueprint definition, taking advantage of UML's template reusability. When compared to CloudML, CAML does not include a run-time component to effectively deploy the modeled applications.

Regarding more recent advancements, Sandobalin et al. [56] proposed ARGON (An infRastructure modellinG tool for clOud provisioNing) in 2017, a DevOps support tool leveraging IaC practices and its benefits for infrastructure provisioning while simultaneously minimizing its drawbacks through a model-driven approach. The project defines a DSL to model infrastructure inde-

⁵ More information available at https://www.eclipse.org/
⁶ More information available at https://sparxsystems.com/
