Platform to support the development of IoT solutions


Universidade de Aveiro, Departamento de Eletrónica, Telecomunicações e Informática

2019

SÉRGIO FILIPE

MARQUES MANSO

Plataforma de apoio ao desenvolvimento de

soluções IoT


“What is dangerous is not to evolve”


Dissertation presented to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Computer and Telematics Engineering, carried out under the scientific supervision of Doctor Diogo Gomes, Assistant Professor at the Departamento de Eletrónica, Telecomunicações e Informática of the Universidade de Aveiro, and of Doctor João Paulo


The jury

President: Professor Doctor Paulo Monteiro

Associate Professor at the Universidade de Aveiro

Examiners committee: Doctor Alfredo Matos


Acknowledgements

I want to thank my parents and my brother for all the strength and trust they placed in me over these years; without them none of this would have been possible. I also thank my girlfriend for her tireless support, for all the patience she had during the most troubled phases of my journey, and for always being by my side. To my friend José P.M. Cerca who, however far away he may be and however many football club disagreements we may have, was always willing to offer a "special" word of support. To my friends Sérgio Cascão, Tiago Marques and Pedro Rocha for all the motivation throughout the degree and for all the adventures they brought. To my cousin Tiago for the words of motivation and the trust placed in me. To my friends from C.D.B. for always being present at every moment.

I thank my friends Diogo Sousa, Dzianis Bartashevich and Ana Rita Santiago for proving to be exceptional people who were always willing to help.

I also thank Professor Diogo Gomes and Professor João Paulo Barraca for all their availability and the help provided in the development of this dissertation.


Keywords DevOps, IoT, Cloud Computing, Containerization, Orchestration, Software Configuration Management

Summary The Internet of Things is a rising paradigm that has sparked the interest of companies in optimizing their IoT platforms. The establishment of Cloud Computing, combined with the evolution of Containerization and Orchestration solutions and the complementary role of Software Configuration Management tools, has improved the way software solutions are designed and deployed. This dissertation proposes a possible solution based on the implementation of these technologies, which supports the development of new software solutions as well as their deployment. With this set of technologies, it is possible to introduce new features that did not exist in older deployments. The proposed solution combines services such as Cloud Orchestration with the features of technologies such as container Orchestration and Software Configuration Management tools to build from scratch, and automatically, a platform capable of offering a robust, scalable


Keywords DevOps, IoT, Cloud Computing, Containerization, Orchestration, Software Configuration Management

Abstract The Internet of Things is a fast-growing paradigm that has raised the interest of companies in optimizing their IoT platforms. The establishment of Cloud Computing, together with the evolution of Containerization and Orchestration solutions and the complementary role of Software Configuration Management, has improved the way software solutions are designed, shipped and deployed. Based upon the integration of these technologies, this dissertation proposes a potential solution that helps with the development of new software solutions and their deployment. With this set of technologies, it is possible to introduce new features that were non-existent in traditional deployments. The potential solution brings together services such as Cloud Orchestration and features offered by Container Orchestration and Software Configuration Management for building from scratch,


List of Figures

2.1 The three main cloud computing service models
2.2 Relation between cloud services and cloud deployment models
2.3 Nova service architecture
2.4 Neutron service architecture
2.5 Graphical representation of Keystone concepts
2.6 General overview of an OpenStack service architecture [27]
2.7 OpenStack Heat architecture and workflow
2.8 Full virtualization (Hypervisor) vs Containerization
2.9 Docker Architecture
2.10 Docker execution environment
2.11 Docker container architecture
2.12 Docker Swarm cluster
2.13 Docker Swarm overview [39]
2.14 Docker Swarm services and tasks relation diagram [40]
2.15 Docker Swarm service scheduling [40]
2.16 Docker Swarm load balancing [41]
2.17 Docker Swarm encrypted communications [42]
2.18 Kubernetes cluster architecture overview [54]
2.19 Workflow of Chef
2.20 Workflow of Ansible playbook deployment
3.1 Architecture of the application proposed
3.2 Common infrastructure of a platform using virtual machines
3.3 Abstract overview of the final stage of proposed solution
3.5 Setup of the Configuration Agent
3.6 "Raw" Stack configured to serve an orchestration cluster
3.7 Example of service scheduling in an orchestration cluster
3.8 Application deployed in an Orchestration cluster
4.1 Platform deployment sequence diagram
4.2 Resources created via Heat showing in Horizon
4.3 Example of local Docker Registry usage (CLI)
4.4 Listing available nodes from the Docker Swarm cluster from CLI
4.5 Eclipse Hono internal architecture
4.6 Generic overview of the IoT Platform deployed
4.7 Example of the distribution of the containers in the cluster (Portainer)
4.8 Final stage of the implemented solution
4.9 Example of the Swarm services listing using Portainer
5.1 Illustration of the deployment scenario
5.2 Impact of the cluster's size in deployment time using Swarm
5.3 Impact of the cluster's size in deployment time using Kubernetes
5.4 Comparison of the platform's deployment time using Swarm and Kubernetes
5.5 Impact of the cluster's size in the deployment time using Swarm
5.6 Application deployment time in the platform
5.7 Impact of the number of replicas in the application deployment time and cluster size of 3,5
5.8 Impact of the number of replicas on the application deployment time and cluster size of 5,7


List of Tables

2.1 Comparison between the AWS S3 storage classes as of September 2019 [9]
2.2 Comparison between features provided by Heat, CloudFormation and Terraform
2.3 Comparison between Ansible and Chef Infra features
5.1 Variation of flavors for testing the platform
5.2 Results of the impact of the number of replicas in a cluster size of 3 Managers and 5 Workers
5.3 Results of the impact of the number of replicas in a cluster size of 5 Managers and 7 Workers
5.4 Results of the rescheduling time for a failing service
5.5 Node assignment for failing services


Acronyms

ACL Access Control List

AKS Azure Kubernetes Service

AMI Amazon Machine Image

AMQP Advanced Message Queuing Protocol

API Application Programming Interface

AWS Amazon Web Services

CA Certificate Authority

CD Continuous Delivery

CIFS Common Internet File System

CI Continuous Integration

CLI Command-line Interface

CN Common Name

CPU Central Processing Unit

CRI Container Runtime Interface

DNS Domain Name System

EBS Elastic Block Store

EC2 Elastic Compute Cloud

EKS Elastic Kubernetes Service

FWaaS Firewall as a Service

GCE Google Compute Engine

GKE Google Kubernetes Engine

GPU Graphics Processing Unit

GUI Graphical User Interface

HCL HashiCorp Configuration Language

HDD Hard Disk Drive

HOT Heat Orchestration Template

HTTP Hypertext Transfer Protocol

IaaS Infrastructure as a Service

IAM Identity and Access Management

IBM International Business Machines Corporation

INI Initialization

IOPS Input/Output Operations Per Second

IoT Internet of Things

IP Internet Protocol

iSCSI Internet Small Computer System Interface

IT Information Technology

JSON JavaScript Object Notation

k8s Kubernetes

KVM Kernel-based Virtual Machine

LVM Logical Volume Manager

LXC Linux Containers

MB/s Megabytes per second

MQTT Message Queuing Telemetry Transport

NAT Network Address Translation

NFS Network File System

NIST National Institute of Standards and Technology

OKD OpenShift Origin

OS Operating System

OU Organizational Unit

PaaS Platform as a Service

PKI Public Key Infrastructure

RAM Random Access Memory

REST Representational State Transfer


RPC Remote Procedure Call

S3 Simple Storage Service

SaaS Software as a Service

SCM Software Configuration Management

SLA Service Level Agreement

SOAP Simple Object Access Protocol

SOA Service-Oriented Architecture

SSD Solid State Drive

SSH Secure Shell

TLS Transport Layer Security

vCPU Virtual Central Processing Unit

VM Virtual Machine

VPNaaS Virtual Private Network as a Service

WinRM Windows Remote Management

XML Extensible Markup Language

XMPP Extensible Messaging and Presence Protocol

YAML YAML Ain’t Markup Language

Chapter 1 - Introduction

New software development methodologies have recently taken a central role in the lifecycle of newly developed software solutions, with clear gains in the improvement and automation of tasks. DevOps is one such example. It brings software development and operations management together, aiming to shorten software development lifecycles. The concept behind DevOps is to simplify and improve a significant number of tasks by automating the processes associated with them. This makes it possible, for instance, to reduce the time spent on tasks such as the deployment of software solutions in production environments. DevOps also allows for creating mechanisms capable of automating a vast number of operations, ranging from the continuous deployment of software to the management of an IT cloud infrastructure.

The Internet of Things is built on the paradigm of integrating the Internet into a variety of objects present in our everyday life. It has grown fast, which translates into a considerable number of devices generating data at a very high pace. Such complexity demands high-performance platforms capable of handling a significant number of connections and storing all generated data.

The convergence between DevOps and Cloud computing brings new possibilities for building and serving efficient platforms capable of meeting IoT demands. It is essential for these platforms to have robust underlying infrastructures and runtime environments so that they can perform the most complex and demanding tasks. Cloud computing offers more advantages than the traditional infrastructure of one or more servers, where each server handles a given number of components of a platform. Another traditional approach consists in provisioning virtual machines that run the components individually or in groups. Traditional approaches typically lead to points of failure without recovery backups and to long debugging and maintenance times, and they provide neither redundancy nor scalability.


1.1 Motivation

Using newly available software solutions such as Cloud Computing, Containerization, Container Orchestration and Software Configuration Management tools eliminates most of the limitations associated with traditional deployments and improves IoT applications. Cloud computing delivers a cheaper and easier way to set up, maintain and scale infrastructures. Allied to this, Containerization and Orchestration solutions enhance these platforms with features such as application isolation and fault tolerance. Configuration management software further adds the possibility of replicating platforms consistently and makes it easier to deploy and manage every component.

Integrating the aforementioned solutions potentially allows deploying IoT platforms quickly and with new and improved features. The main goal of this dissertation is to establish a platform capable of supporting an IoT solution while delivering excellent performance. The second goal is to develop a mechanism that makes it possible to customize the resources used by the platform, such as the number of virtual machines and the number of CPUs and amount of RAM for each of them. Importantly, this mechanism should handle each step of the platform's deployment automatically and make it easy to replicate the same infrastructure.

1.2 Objectives

This dissertation aims to provide a software solution capable of creating, from scratch, a platform that supports an IoT application. This software solution should be able to implement its underlying infrastructure, supported by a cloud computing service, and to establish an orchestration cluster from these infrastructure resources. It should also be capable of deploying the IoT application automatically as part of the deployed platform. This last step is optional, since the solution also aims to provide a clean orchestration environment for supporting the development and testing of new software solutions. This dissertation provides an in-depth analysis of the relevant solutions available in these areas and an overview of the decisions made during the implementation process.

1.3 Document structure

This dissertation comprises 6 chapters. The first chapter is the one presented so far, and the remaining chapters are as follows:

• Chapter 2 - State of the art in cloud computing, including the services available in the cloud computing area, the available containerization and orchestration solutions, and the most popular software configuration management tools used alongside the previous solutions;

• Chapter 3 - Overview of the implemented architecture, describing every layer constituting the platform;

• Chapter 4 - Detailed explanation about the services and technologies selected to support the implementation of this software solution;

• Chapter 5 - Description of test scenarios, the test cases used and an analysis of the obtained results;

Chapter 2 - State of the Art

Due to the constant evolution of technology, companies want to keep up with these advances in order to provide the best possible performance to their software solutions. The establishment of Cloud computing and the evolution and optimization of containerization solutions, alongside container orchestration, have offered a new perspective on already deployed software solutions, leading companies to reflect on the way their software solutions are designed and deployed. Most of the deployed software solutions have already been, or are currently planned to be, migrated to more robust and better-performing platforms. By integrating these solutions and designing new software solutions based on them, it is possible to offer new features and eliminate limitations found in older deployments. Software Configuration Management solutions are commonly linked to the technologies above, typically having an essential role in their integration.

This chapter presents related work in the area of this dissertation, namely Cloud computing models, Cloud computing platforms, containerization and orchestration solutions and Software Configuration Management solutions.

2.1 Cloud Computing

Cloud Computing has become increasingly important in the IT industry over the last few years. The National Institute of Standards and Technology (NIST) has defined cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction" [1]. This model provides computing power, storage solutions, and applications, among other resources, through the Internet with a pay-per-use pricing strategy, both to enterprises and regular consumers.

Cloud services are deployed in three main service models, which are known as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

2.1.1 Service models

The Infrastructure as a Service (IaaS) model delivers computing, networking, storage, and other resources to users. This model makes it possible to provision virtual resources as required to develop, run and deploy applications or data, therefore creating an abstraction of the underlying infrastructure. To perform these operations, the cloud provider exposes tools for orchestration, monitoring, backup, security, and log access. Some of the most well-known players that offer this model are Amazon (Amazon Web Services), Microsoft (Microsoft Azure), IBM (IBM Cloud) and Google (Google Cloud Platform).

Another model is Platform as a Service (PaaS). This model provides a cloud environment that enables users to develop, test and host software solutions while eliminating the complexity of configuring the underlying application infrastructure, such as the operating system, middleware, and development tools. Cloud providers typically offer more than one cloud computing model. This is the case of Amazon (AWS), Microsoft (Azure), and Google (Google Cloud Platform), which offer both IaaS and PaaS. Oracle (Oracle Cloud Platform) and Salesforce (Salesforce Platform) are also big providers of PaaS in the industry.

The final model for cloud computing is known as Software as a Service (SaaS). It enables users to access cloud-based software through a web browser or an API, and is not device dependent. SaaS is currently the most common way of interacting with the cloud for the regular user, since a significant part of consumer software is provided in this model. As an example, Google provides a set of tools named G Suite (Gmail, Google Drive, Google Docs), and Microsoft provides Office 365. [2]

Figure 2.1 presents the aforementioned examples for each model and relates them to the typical user. It also shows the abstraction level of each model (e.g., SaaS offers applications while abstracting the underlying layers, such as the OS and virtualization layers), as well as examples of services available in the industry that provide it.

Figure 2.1: The three main cloud computing service models

2.1.2 Deployment models

It is also possible to classify cloud environments based on how they operate. This categorization results from the trade-off between outsourcing third-party services (e.g., an IaaS such as AWS or Azure) and investing funds, human resources and time to set up and manage a private infrastructure.

In the first case, the organization outsources services to a cloud provider that serves an infrastructure over the Internet (i.e., the public cloud model) using a pay-per-use pricing model. The provider owns all of the hardware and software resources, and is responsible for the management, maintenance and security of the whole infrastructure. This means that the organization does not have to assign or hire new system administrators. In a public cloud model, several organizations share the same underlying infrastructure (hardware, software, networking devices). Importantly, there is a degree of isolation and abstraction between the organizations: a multi-tenancy architecture is implemented, associating a cloud tenant with each organization.

In the second case, the organization sets up a private cloud that is only available internally, or contracts a third-party provider that supplies a private infrastructure. All operating costs and management are entirely the responsibility of the organization. This type of cloud facilitates scaling resources and allows meeting individual requirements, since the hardware is dedicated to the organization itself.

The public cloud might not meet all of an organization's requirements, since migrating all internal services and applications to the cloud can cause considerable disruption and impose specific IT requirements. In some cases, organizations choose to combine private and public clouds, leading to another cloud model, the hybrid cloud. The hybrid cloud is a combination of the two previously described models, and seeks to take the best of both private and public clouds. A model like this can be applied, for example, to an organization that already has a private storage solution to retain data (e.g., data collected from IoT devices) and needs computing power for data processing, which can be outsourced to a public cloud provider. In addition, it offers higher scalability and availability without adding maintenance and security concerns. A transition phase from a private cloud to a public cloud, or vice versa, also results in a hybrid cloud. In such a case, the organization can have some services hosted in its private cloud while others run in a public cloud. [3] [2]

Figure 2.2: Relation between cloud services and cloud deployment models

Figure 2.2 relates cloud service models and cloud deployment models, taking into consideration the level of abstraction, the level of data control and flexibility. It shows that SaaS has a higher level of abstraction (i.e., the user is not aware of the underlying structure) and that, when hosted in a public cloud, control over stored data is lower than when hosted in a private cloud. On the other hand, an IaaS hosted in a private cloud brings more flexibility than one in a public cloud.

Cloud computing brings many benefits to companies. One of the most important and most frequently cited advantages is its affordable pricing. An organization that is taking its first steps might not be able to increase its capital expenditure (i.e., invest in new hardware and infrastructure). Because cloud providers practice a competitive pay-per-use model, organizations can pay for a service instead of investing in new equipment (e.g., servers, air conditioning, power distribution, power redundancy, power backup). The organization might also lack the physical space for this new equipment, making it wiser to contract a cloud service. These services are usually advertised with a high level of availability: providers commit to a Service Level Agreement (SLA) that offers service 24 hours a day, 7 days a week, 365 days a year, with an availability higher than 99%, which is why organizations often prefer them. Providers also offer a high level of reliability based on failover mechanisms: for example, when a server fails, everything hosted on it can be reallocated to another available server. Near-unlimited resources are also available, which can be beneficial in situations of unexpected traffic. In the case of under-provisioning, cloud computing gives the company the ability to scale resources to meet the new needs. Mobility is another advantage. Companies often use services hosted in the cloud on a daily basis, which allows employees to access them wherever they are; e-mail is a well-known example. Lastly, increased data security is another factor to take into consideration, since cloud providers offer data replication with backup solutions. [2] [3] [4]
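To make the availability figures quoted in SLAs more concrete, the following minimal sketch converts an availability percentage into the maximum downtime it permits per year. The function name and the sample levels are illustrative assumptions, not values taken from any provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # follows the "365 days a year" convention used in the text


def yearly_downtime_hours(availability_percent: float) -> float:
    """Return the maximum downtime per year, in hours, allowed by an availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Illustrative levels only; real SLAs define their own measurement windows.
    for level in (99.0, 99.9, 99.99):
        hours = yearly_downtime_hours(level)
        print(f"{level}% availability allows up to {hours:.2f} hours of downtime per year")
```

Under this reading, an availability of 99% still allows roughly 87.6 hours (about 3.65 days) of downtime per year, which is why each additional "nine" matters when comparing offers.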

Despite these advantages, using cloud services also has disadvantages. Firstly, when adopting or migrating to the cloud, companies lose some control. An example of this occurs when data is stored with a third-party provider, which becomes responsible for the data's security. The co-existence of different tenants using the same service instance makes these infrastructures attractive targets for attackers. In addition, issues such as bandwidth and latency have also been shown to affect availability and the user experience. [3] [4] [5]

2.2 Cloud Solutions

Recent data suggests that companies have preferred private clouds over public clouds, owing to privacy concerns about public clouds. In addition, recent developments in cloud technology have led to cheaper and easier solutions for companies to set up and maintain private clouds. In order to build a private cloud, companies have to assemble the underlying infrastructure and own the tools to deploy, provision, and monitor this environment. The development of cloud management platforms has provided the tools needed for these tasks, enabling infrastructure administrators to monitor and control the elements that comprise a cloud system, such as computing, networking, storage, and power resources, and thus allowing companies to set up and maintain their clouds independently. In brief, these platforms aim to offer functionalities such as high availability, multi-tenancy, resource allocation, monitoring, migration, and dynamic scalability. Examples of open source solutions include OpenStack, OpenNebula, Eucalyptus and CloudStack.


2.2.1 Amazon Web Services

Amazon Web Services (AWS) is a public cloud service provided by Amazon. It was established in 2002, when the first closed beta version of AWS was released. It was originally named Amazon.com Web Service, and at that time it was a very limited service, since it only offered a SOAP interface to the Amazon product catalog. [6] With the first release of its Simple Storage Service (S3), Amazon entered cloud computing and became the first and foremost player in this area.

AWS is a closed source platform that provides cloud services (i.e., IaaS, PaaS and SaaS) to individuals and companies using a pay-per-use business model. Some of the services available include the Simple Storage Service (S3), the Elastic Compute Cloud (EC2), the Elastic Block Store (EBS), and CloudFormation.

AWS offers its services from different locations around the globe, which allows deploying the desired resources in the most convenient locations. For instance, if a company has its headquarters in Germany, it is reasonable to deploy its services as close as possible (e.g., Frankfurt), so that connections have the lowest possible latency.

Simple Storage Service

AWS offers a scalable object storage solution called Simple Storage Service (S3), which provides the tools and resources required to manage data, such as APIs, a GUI, and 'buckets'. A 'bucket' is an AWS resource that holds data together with its respective metadata and permissions. A bucket is created in a specific location within a region. For instance, Europe has different locations where a bucket can be created, such as London, Ireland or Frankfurt. [7] Versioning can also be enabled for buckets, so that when an operation is performed, the previous state of the changed data is preserved, making it possible to roll back and recover. For example, versioning protects data from accidental deletion. Furthermore, security, ACLs, access logs, and object-level logging can be enabled.
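The versioning behaviour described above can be illustrated with a toy in-memory model. This is only a sketch of the idea (keep every previous state so that a change or deletion can be rolled back); the class and method names are hypothetical and do not correspond to the S3 API.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of a bucket with versioning enabled (not the S3 API)."""

    def __init__(self) -> None:
        # Each key maps to its full history; None marks a deletion.
        self._history: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, data: bytes) -> None:
        """Store a new version of the object without discarding older ones."""
        self._history.setdefault(key, []).append(data)

    def delete(self, key: str) -> None:
        """Record a deletion marker, so earlier versions survive the delete."""
        self._history.setdefault(key, []).append(None)

    def get(self, key: str) -> Optional[bytes]:
        """Return the latest version, or None if the key is deleted or unknown."""
        versions = self._history.get(key, [])
        return versions[-1] if versions else None

    def rollback(self, key: str) -> None:
        """Discard the latest version (or deletion marker) of a key."""
        versions = self._history.get(key, [])
        if versions:
            versions.pop()


if __name__ == "__main__":
    bucket = VersionedBucket()
    bucket.put("report.csv", b"v1")
    bucket.delete("report.csv")   # accidental deletion
    bucket.rollback("report.csv") # undo it
    print(bucket.get("report.csv"))
```

The design choice worth noting is that a deletion is stored as just another entry in the history, which is what makes recovery from accidental deletion possible in this model.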

Operations on buckets are available via an API (REST and SOAP) or through the AWS GUI, which allow storing and retrieving data, exchanging data between buckets, or making buckets publicly available.

Amazon S3 offers different storage classes according to the intended use of the data. S3 Standard is a storage class for general-purpose, frequently accessed data. For data with unknown or changing access patterns, AWS provides S3 Intelligent-Tiering. S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-IA) are designed to store data that is less frequently accessed. Finally, S3 Glacier and S3 Glacier Deep Archive are reliable and secure low-cost storage classes aimed at data archiving. [8]

| | Standard | Intelligent-Tiering | Standard-IA | One Zone-IA | Glacier | Glacier Deep Archive |
|---|---|---|---|---|---|---|
| Durability | 11 9's | 11 9's | 11 9's | 11 9's | 11 9's | 11 9's |
| Availability | 99.99% | 99.9% | 99.9% | 99.5% | 99.99% | 99.99% |
| Availability SLA | 99.9% | 99% | 99% | 99% | 99.9% | 99.9% |
| Availability Zones | ≥ 3 | ≥ 3 | ≥ 3 | 1 | ≥ 3 | ≥ 3 |
| Minimum capacity charge per object | N/A | N/A | 128KB | 128KB | 40KB | 40KB |
| Minimum storage duration charge | N/A | 30 days | 30 days | 30 days | 90 days | 180 days |
| Retrieval fee | N/A | N/A | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved |
| First byte latency | ms | ms | ms | ms | minutes or hours | hours |

Table 2.1: Comparison between the AWS S3 storage classes as of September 2019 [9]

Table 2.1 compares the different S3 storage classes, focusing on the service provided and its performance. As mentioned above, S3 Standard is a solution for storing common data that is accessed regularly. Unlike the other classes, S3 Standard applies no minimum capacity or minimum storage duration charges. For instance, S3 Intelligent-Tiering has a minimum storage duration of 30 days: if data is stored for less than the stipulated minimum, charges still apply. Additionally, the S3 Glacier classes are designed for archiving and have a higher latency than the rest of the classes. For example, each time data is requested from a Glacier bucket, access to the information takes minutes or hours, whereas in S3 Standard the same operation takes only milliseconds. Another complementary way to evaluate these services is durability, which translates to the share of objects that could be lost every year. For instance, a service with a durability of 99.999999999% (or 11 9's) means that an average loss of 0.000000001% of objects can be expected per year. In other words, if 10,000,000 objects are stored, one object can be expected to be lost every 10,000 years. [8]
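The durability arithmetic above can be checked with a few lines of code. The sketch below uses exact fractions to avoid floating-point rounding; the function names are illustrative, and the model simply treats durability as an average annual loss rate per object.

```python
from fractions import Fraction


def expected_annual_loss(num_objects: int, nines: int) -> Fraction:
    """Expected number of objects lost per year for a durability of `nines` 9's."""
    # 11 nines (99.999999999%) leaves an annual loss rate of 1 / 10**11 per object.
    annual_loss_rate = Fraction(1, 10 ** nines)
    return num_objects * annual_loss_rate


def years_per_lost_object(num_objects: int, nines: int) -> Fraction:
    """Average number of years between the loss of a single object."""
    return 1 / expected_annual_loss(num_objects, nines)


if __name__ == "__main__":
    stored = 10_000_000
    print(f"Expected loss per year: {float(expected_annual_loss(stored, 11))} objects")
    print(f"One object lost every {years_per_lost_object(stored, 11)} years on average")
```

With 10,000,000 stored objects and 11 9's of durability, the expected loss is 0.0001 objects per year, which matches the figure of one object every 10,000 years quoted in the text.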


Elastic Compute Cloud

Elastic Compute Cloud (EC2) is an AWS service that offers computing power. In other words, it allows users to deploy virtual machines (instances) with a wide range of specifications. When an EC2 instance is created, the user specifies its requirements, such as the number of CPUs, memory, GPUs, operating system, network, and geographic location, among others. [10] Operating systems for EC2 are pre-configured and templated in an Amazon Machine Image (AMI). AWS also provides tools so that users can create their own custom images. Well-known software such as WordPress, Drupal or Jenkins is provided in pre-configured images, ultimately saving the user from having to configure the software from scratch.

Another handy feature of Amazon EC2 is Auto Scaling. It allows setting scaling conditions that are triggered when an EC2 instance is found to be underperforming unexpectedly. This feature allows not only scaling up the number of instances to maintain performance, but also scaling them down automatically after the operation is done, which prevents additional and unnecessary costs. When EC2 was first launched in 2006 [11], for beta testing, the only AWS storage service available for persistent storage was S3. Since S3 cannot be used as a file system, data from EC2 instances needed to be synchronized to buckets via the API. Today, Amazon offers a service to persist data from instances called Elastic Block Store.
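The scale-up and scale-down behaviour described for Auto Scaling can be sketched as a simple threshold rule. This is a toy illustration only: the function name, the CPU thresholds and the instance limits are hypothetical and are not taken from the AWS API or its defaults.

```python
def desired_instances(current: int, cpu_percent: float,
                      scale_up_at: float = 80.0, scale_down_at: float = 20.0,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Return the instance count a threshold-based scaling rule would ask for.

    All thresholds and limits are hypothetical example values.
    """
    if cpu_percent > scale_up_at:
        target = current + 1      # overloaded: add one instance
    elif cpu_percent < scale_down_at:
        target = current - 1      # mostly idle: remove one instance to save cost
    else:
        target = current          # within the comfortable range: keep as is
    # Never leave the configured bounds.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    for load in (95.0, 50.0, 5.0):
        print(f"CPU at {load}% with 2 instances -> {desired_instances(2, load)} instances")
```

Keeping a gap between the two thresholds avoids repeatedly adding and removing instances when the load hovers around a single value.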

Elastic Block Store

Amazon Elastic Block Store (EBS) provides persistent block-level storage. EBS is designed to work with EC2. Data persistence for an EC2 instance can be achieved by creating an EBS volume and attaching it to the corresponding instance. There are two categories of EBS volume types: SSD-backed and HDD-backed. SSD-backed volumes are designed for latency-sensitive transactional workloads. In other words, they are optimized for small read/write operations. On the other hand, HDD-backed volumes are optimized for large sequential workloads. AWS measures the performance of both types of EBS volumes in their dominant performance attribute. For SSD-backed volumes, performance is measured in Input/Output operations per second (IOPS), while for HDD-backed volumes it is measured in terms of throughput, expressed in Megabytes per second (MB/s). [12] Like other AWS services, EBS also offers resource allocation in different regions around the globe. EBS provides important features such as creating snapshots of volumes to Amazon S3 (EBS Snapshots), encrypting volumes without needing to build a key management infrastructure from scratch (EBS encryption and AWS Identity and Access Management (IAM)), and dynamically adapting the size of volumes according to the requirements of each application at a given time.
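IOPS and throughput are two views of the same workload, related through the size of each I/O operation (throughput = IOPS × I/O size). The sketch below illustrates why small random operations are better described by IOPS and large sequential ones by throughput. The figures are hypothetical, chosen only for illustration, and are not AWS limits.

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput implied by a rate of I/O operations of a given size."""
    return iops * io_size_kib / 1024  # KiB/s converted to MiB/s


# Hypothetical workloads (not AWS figures).
small_random = throughput_mib_per_s(iops=3000, io_size_kib=16)       # many small operations
large_sequential = throughput_mib_per_s(iops=500, io_size_kib=1024)  # few large operations

print(small_random)      # 46.875
print(large_sequential)  # 500.0
```

Although the first workload performs six times more operations per second, it moves roughly a tenth of the data, which is why each volume category is rated by a different attribute.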

2.2.2 OpenStack

OpenStack is an open-source cloud computing software platform that was initially launched by NASA and Rackspace in 2010, combining the Nebula platform from NASA and the Cloud Files platform from Rackspace. At the time of writing this document, the nonprofit OpenStack Foundation is the entity responsible for managing the project. [13]

OpenStack implements a modular architecture and consists of a group of interrelated sub-projects developed independently. Each of these sub-projects aims to control a specific pool of resources such as computing, networking or storage.

Nova, or OpenStack Compute, is the service responsible for managing pools of computing resources, providing a way to provision compute instances. It enables the creation of virtual machines and bare-metal servers, and supports containerization within virtual machines or directly on bare metal. Nova supports many hypervisors including KVM, Xen, Hyper-V and VMware, and supports virtualization at the operating system level such as LXC and Docker. Before launching a new instance, it is mandatory to specify the resources for it. These specifications are defined as flavors, which are managed by this service. Nova exposes all these features through a REST API, but it can also be accessed through tools such as the OpenStack Client command-line tool and Horizon, a graphical user interface that is presented later. Figure 2.3 presents the main sub-components of the Nova service: API, Scheduler, Conductor and Compute. The components communicate with each other through a global message broker for OpenStack, and each one has a distinct task. The API component handles the Nova API calls, Scheduler determines on which host an instance should run, Compute is responsible for handling the life-cycle of instances, and Conductor mediates the communication between the Compute component and the OpenStack database.

Figure 2.3: Nova service architecture

Neutron is the OpenStack service responsible for everything regarding networking connectivity and the addressing of the virtual networking infrastructure. It provides an API that allows users and administrators to create and manage networking components, such as networks, switches, subnets and routers, for elements provisioned by the Nova service. [14] Neutron additionally provides advanced services including firewalls or FWaaS (which implement a notion of firewall policies and rules applied at the port level of routers) and virtual private networks or VPNaaS (which can be used to link two projects from two distinct OpenStack deployments). [15] [16]

Figure 2.4 presents the internal architecture of Neutron, indicating its services and the relations among them. All these services communicate through the global message broker to accomplish their tasks. Neutron Server exposes the networking API and dispatches the requests to the available Neutron plugin agents. The DHCP Agent provides DHCP to the networks, and the L3 Agent provides NAT for the running instances, allowing them to be accessed from outside OpenStack.

Figure 2.4: Neutron service architecture

OpenStack makes a distinction between two types of storage: ephemeral and persistent. When Nova is deployed without any storage service, there is no data persistence associated with the provisioned instances, meaning that the data they contain is lost once the instances are terminated. On the other hand, persistent storage guarantees that all data is safeguarded regardless of the state of the instance. Block storage, object storage and file-based storage are three types of persistent storage, implemented in three distinct services: Cinder, Swift and Manila, respectively. [17]

Cinder implements the block storage service and provides block devices, or Cinder volumes, to instances, bare-metal hosts and containers. Since persistence is guaranteed on these volumes, they can be detached and re-attached to different devices while the data remains untouched. It supports multiple different back-ends in the form of volume drivers, which means that one specific device can access the block storage using different drivers. LVM is the default back-end configuration, but it is possible to add new back-ends to Cinder, such as iSCSI, NFS or Ceph. [18]

Object storage is implemented by Swift. This service provides a highly available, scalable and redundant blob storage over HTTP, where data is stored and retrieved through a REST API. The actual data objects are stored with their metadata in groups of objects called containers. Containers can in turn be grouped in a collection of containers, representing an account. Each level of this hierarchy has its own access-control lists (ACLs) that define who has a particular type of access. All data is stored in partitions, which can be distributed and replicated to different storage servers defined as Zones. Finally, rings are the entities that map all relations between structures and physical locations. Object storage is particularly beneficial in cases such as storing a large dataset that can grow without bound. In this scenario, retrieval operations perform best with object storage since it takes advantage of the object’s metadata when searching for the desired information. [19] [20]

The Shared File Systems service, or Manila, offers a set of services for file-based storage management, delivering storage to users in the form of shares. Just like Cinder, Manila supports multiple back-ends and can provide shared file systems from one or more of them. Some of the protocols supported include NFS, CIFS and GlusterFS. File-based storage has similar characteristics and advantages to block storage: both can be attached to and detached from devices without data loss, and both permit attaching the same volume/share to multiple devices. Their differences stem from user access and size limitation (i.e. quotas). These quotas can be specified by users to administrators in an initial request. Limitations such as rate limits, quotas, access rules or security services can also be applied to a specific share. [21]

Glance is the image service, and it provides a catalog to manage data assets used by other services, such as images for Nova instances and metadata definitions. [22] Metadata definitions are defined by a group of key-value pairs. Each definition includes the property’s name, its description, its value constraints, and the resource that it can be associated with. For instance, a definition of a disk property such as the I/O write limit per second (e.g., disk_write_bytes_sec) would have a description and value constraints associated. Users would be able to search through the set of available properties and add one to a Nova flavor with the respective name and value constraints (e.g., integer). [23] The image service provides image discovery, registration and retrieval. Images can be stored in different storage types, from a filesystem like NFS to an object storage like Swift. Glance makes these operations available via a REST API.


users with the platform. Keystone, the identity service of OpenStack, is responsible for user and service authentication and management. It implements an API that provides client authentication, service discovery and distributed multi-tenant authorization. OpenStack establishes several concepts for its identity service, which include tenants, roles, users, groups and domains. The relations between them are represented in Figure 2.5. A tenant, or project, is used to group resources provided by other services such as users (Keystone), networks (Neutron), servers (Nova) or images (Glance). A user can have access to one or more tenants. A role associated with the user determines the access and level of permissions granted to that user within a tenant. For example, the user "foo" can have access to tenants A and B, being an "admin" in tenant A and a "user" in tenant B. A domain is a collection of users and tenants, enabling OpenStack to isolate resources. This concept is useful when two distinct organizations use the same platform because it eliminates user name conflicts and gives the ability to use different backends for user authentication. [24] [25]

Figure 2.5: Graphical representation of Keystone concepts
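The relation between users, tenants and roles can be illustrated with a toy data model. The sketch below only mirrors the "foo" example from the text; the dictionary layout and function names are illustrative assumptions and do not correspond to the Keystone API.

```python
# (user, tenant) -> role; the names mirror the example in the text.
role_assignments = {
    ("foo", "A"): "admin",
    ("foo", "B"): "user",
}


def role_of(user: str, tenant: str):
    """Return the user's role in the tenant, or None when there is no access."""
    return role_assignments.get((user, tenant))


def tenants_of(user: str) -> list:
    """List the tenants a user has access to."""
    return sorted(t for (u, t) in role_assignments if u == user)


print(role_of("foo", "A"))  # admin
print(role_of("foo", "C"))  # None
print(tenants_of("foo"))    # ['A', 'B']
```

Keying the role on the (user, tenant) pair captures the idea that a role has no meaning by itself: the same user can hold different permissions in different tenants.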

OpenStack has a web-based user interface that allows users and administrators to interact with the services described. Horizon, which started as a simple application to manage Nova services, rapidly evolved to support multiple OpenStack projects. Its current architecture includes a core that supports the primary services of OpenStack, offering three dashboards for Users, System Administration and Settings. It is possible to extend the core functionalities of the dashboard to support services not included by default. This is achieved by using official or third-party plugins, or by developing custom functionalities according to the needs (e.g., adding a monitoring panel). [26]

OpenStack offers a cloud orchestration service called Heat, which is capable of coordinating multiple tasks such as delivering instances, networks and other resources efficiently.

Collecting usage data of physical and virtual resources is very important in cloud environments. The Telemetry project comprises a set of sub-projects, each responsible for a different task. Ceilometer collects and normalizes data from OpenStack resources, Panko is responsible for event and metadata indexing, and Aodh triggers actions based on pre-defined conditions or events.

Figure 2.6 represents a general overview of an OpenStack deployment and some relations between the services, for instance, core services authenticating against the identity service, or the telemetry service collecting data from other services such as the networking, shared file systems, block storage, compute, image storage and object storage services.

Figure 2.6: General overview of an OpenStack service architecture [27]

2.3 Cloud Orchestration

Cloud platforms provide services capable of coordinating the multiple tasks required to deploy a set of virtual resources based on templates. Cloud Orchestration accelerates and simplifies the delivery of instances, networks and storage. These template deployment technologies interact with the API of each service of the platform and follow a specific workflow with the purpose of building and delivering a stack of resources. OpenStack Heat, AWS CloudFormation and Terraform are examples of cloud orchestration solutions.

This section describes the AWS and OpenStack orchestration solutions.

2.3.1 CloudFormation

AWS CloudFormation is an orchestration service that allows describing and provisioning AWS infrastructure resources. CloudFormation opens the possibility of having an entire infrastructure described in one file, making it possible to establish resource standards within a project or company, and to comply with the configurations shared among the resources used. These templates are reusable, thus making it possible to replicate the same infrastructure in multiple AWS projects. CloudFormation uses human-readable data-serialization languages such as JSON and YAML to describe resources within a CloudFormation file. [28]

CloudFormation supports a wide variety of AWS resources including EC2, EBS and S3 resources. [29] It allows not only describing resources but also establishing connections among them, defining conditions and adding macros. For instance, it is possible to describe an EC2 instance with an EBS volume attached and mounted in a specific path inside the instance. Both the creation of the volume and the mount point can have a condition associated, which means that they are only created in certain circumstances. Parameters can be defined in the template, making it possible to enter custom values when the template is deployed. CloudFormation also allows using macros. Since macros can be used to transform parts of the template before the actual deployment, they act as pre-processors of the templates. For instance, a macro can be called to substitute variables with custom values, instead of using the parameters functionality of CloudFormation. [30] After the template is deployed, the resulting ‘stack’ includes a collection of the resources described in it. Since CloudFormation treats a ‘stack’ as a single unit, it ensures that all the resources described in the template are created successfully before the stack is considered successfully created. Likewise, all the resources of a stack must be deleted successfully for the stack to be deleted. If a resource fails to be created, CloudFormation rolls the stack back, deleting every resource created up to that point. If a resource fails to be deleted as part of the deletion process, the remaining resources that belong to the stack are retained until they can finally be deleted.
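As an illustration of parameters, conditions and connections between resources, the following sketch builds a small template as a Python dictionary and serializes it to JSON, one of the formats accepted by CloudFormation. The logical names (Instance, DataVolume, CreateVolume, WithVolume), the instance type, the volume size and the device name are assumptions made for this example. Mounting the volume inside the instance would additionally require instance initialization data, which is omitted here.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Custom values supplied when the template is deployed.
        "ImageId": {"Type": "AWS::EC2::Image::Id"},
        "CreateVolume": {
            "Type": "String",
            "AllowedValues": ["true", "false"],
            "Default": "true",
        },
    },
    "Conditions": {
        # The volume and its attachment only exist when this evaluates to true.
        "WithVolume": {"Fn::Equals": [{"Ref": "CreateVolume"}, "true"]},
    },
    "Resources": {
        "Instance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": "t2.micro",  # illustrative choice
            },
        },
        "DataVolume": {
            "Type": "AWS::EC2::Volume",
            "Condition": "WithVolume",
            "Properties": {
                "Size": 10,  # GiB, illustrative
                # The volume must live in the same zone as the instance.
                "AvailabilityZone": {"Fn::GetAtt": ["Instance", "AvailabilityZone"]},
            },
        },
        "DataVolumeAttachment": {
            "Type": "AWS::EC2::VolumeAttachment",
            "Condition": "WithVolume",
            "Properties": {
                "InstanceId": {"Ref": "Instance"},
                "VolumeId": {"Ref": "DataVolume"},
                "Device": "/dev/sdf",
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

The references (Ref and Fn::GetAtt) are what establish the connections among resources: they also let the service infer that the instance has to exist before the volume and its attachment are created.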

2.3.2 OpenStack Heat

Heat is OpenStack’s implementation of an orchestration engine and it is based on HOT templates (Heat Orchestration Templates). HOT is a new template format which is meant to replace the CloudFormation-compatible format offered by OpenStack. It defines a topological infrastructure for a Cloud application. These templates contain a description of a set of virtual resources that comprise instances, routers, volumes and containers, among others. Heat operates on textual templates in the YAML format, which enable specifying the intended virtual resources and their configurations, as well as establishing relations between them. In these templates, it is possible to define flavors and images for instances, network subnets for routers, firewall rules (security groups), and volume and container sizes, as well as to attach volumes to instances, connect an instance to a router and specify a security group for it. Heat introduces some concepts to define the template’s description. Resources are one of these concepts, and they specify the objects that are going to be created or modified during the orchestration. After the deployment ends successfully, Heat creates a stack that has associated all the resources described in the Heat orchestration template. [31] [32]
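A minimal HOT description of an instance with an attached volume could look as follows. The sketch expresses the template as a Python dictionary and prints it as JSON, which YAML parsers generally accept. The resource names, the default flavor name and the volume size are assumptions made for this example, and the template version shown is one of several that Heat accepts.

```python
import json

hot_template = {
    "heat_template_version": "2018-08-31",
    "parameters": {
        "image": {"type": "string"},
        "flavor": {"type": "string", "default": "m1.small"},  # assumed flavor name
    },
    "resources": {
        "server": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": {"get_param": "image"},
                "flavor": {"get_param": "flavor"},
            },
        },
        "volume": {
            "type": "OS::Cinder::Volume",
            "properties": {"size": 10},  # GB, illustrative
        },
        "attachment": {
            "type": "OS::Cinder::VolumeAttachment",
            "properties": {
                # Relations between resources are expressed with get_resource.
                "instance_uuid": {"get_resource": "server"},
                "volume_id": {"get_resource": "volume"},
            },
        },
    },
    "outputs": {
        # Value returned to the user once the stack is created.
        "server_ip": {"value": {"get_attr": ["server", "first_address"]}},
    },
}

print(json.dumps(hot_template, indent=2))
```

The structure is deliberately close to a CloudFormation template: parameters provide the custom values, resources describe what is created, and outputs expose information about the resulting stack.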

Figure 2.7: OpenStack Heat architecture and workflow

Heat sits between the user interface and the APIs of the core services, and it has four main components that perform unique functions: heat-cli, heat-api, heat-api-cfn and heat-engine. The heat-cli component provides a command-line interface client that processes HOT and interacts with Heat’s API. Heat-api and heat-api-cfn are the REST APIs that process the requests from the CLI. Heat-api is OpenStack’s native API, while heat-api-cfn is a query API compatible with AWS CloudFormation. The requests received by these APIs are then sent over RPC to heat-engine, which is the core component of the OpenStack orchestration service. It is responsible for interacting with the OpenStack APIs to deploy the stack that was described in the HOT provided in the first place. It also provides events back to the API consumer, which can be configured in the HOT to be returned to the user (e.g., outputting the IP address of a deployed instance). Figure 2.7 represents the workflow described and the Heat architecture.

Although the OpenStack Orchestration project has deprecated CloudFormation template compatibility in the Icehouse release, one can still use CloudFormation templates. While this compatibility still exists, not all the resources available in AWS are available in OpenStack. Hence, it might not be possible to use all of them in Heat because some have no OpenStack equivalent.


Feature              Heat               CloudFormation     Terraform
Providers supported  OpenStack          AWS                AWS, GCE, Azure, OpenStack, ..
GUI                  Horizon Heat       AWS Designer       Terraform GUI
Contributions        Yes                No                 Yes
Template syntax      YAML               JSON or YAML       HCL or JSON
State management     No                 by AWS             Within Terraform
Execution control    No                 No                 Yes
Failure handling     Optional rollback  Optional rollback  Fix and retry

Table 2.2: Comparison between features provided by Heat, CloudFormation and Terraform

Table 2.2 presents a comparison between OpenStack Heat, AWS CloudFormation and Terraform, another cloud orchestration solution that was not described above. Heat only supports OpenStack cloud environments and is entirely open-source. With Heat it is possible to contribute to the project, while the other solutions either are closed source or require a license to access some components, as is the case of Terraform, which requires a license for its dashboard.

Software configuration management tools such as Ansible, Chef and Puppet can be embedded into templates. This improves the integration between infrastructure and software. In other words, it enriches the interaction between OpenStack Heat and the instances by allowing the software and settings of an instance to be configured at different stages of its life cycle. Such integration is obtained via Heat Agents, which are Python hooks installed on the images, thus enabling the provided software configurations (playbooks, recipes, etc.) to be loaded when the instance is launched.

2.4 Containerization

Containerization is a lightweight virtualization technique. It offers an encapsulation mechanism that allows applications to be abstracted from each other and from the actual host machine environment. Running software in containerized environments isolates each application while it shares the same operating system with other containerized applications, still obeying OS-level virtualization principles. In a full virtualization scenario, the same environment can be obtained by provisioning one virtual machine for all the applications or one VM per application. Such provisioning increases the overhead compared to a container solution, since each VM runs its own operating system on top of the host and containers do not. This specific use case is represented in Figure 2.8.

Figure 2.8: Full virtualization (Hypervisor) vs Containerization

2.4.1 Docker

Docker is an open-source platform that facilitates the process of developing, shipping and running software applications by using containers. It allows applications to be run separately from the host system and to have their own execution environment, therefore creating an abstraction and isolation for each shipped application. This level of isolation avoids conflicts between applications and concerns about dependencies.

Internally, Docker has a client-server architecture which is implemented by three components: a Docker client, a Docker daemon and a Docker registry. The Docker daemon is responsible for handling all the operations related to containers. The Docker client can interact with the Docker daemon via CLI or API to build, run and distribute containers. Containers are created from images, and once an image is built it needs to be stored so it can be reused when needed. The Docker registry is the component responsible for storing Docker images. These components and their interactions are illustrated in Figure 2.9. The Docker client interacts with the Docker daemon through operations such as build, push, pull and run, which make it possible to create Docker images (build), send them to the Docker registry (push), and later retrieve (pull) and execute (run) them in the intended environment.


Figure 2.9: Docker Architecture [33]
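The build, push, pull and run cycle maps directly onto four invocations of the Docker command-line client. The sketch below only assembles the commands as argument lists, so it can be inspected without a Docker installation. The helper name and the image name are hypothetical.

```python
def docker_workflow(image: str, context_dir: str = ".") -> list:
    """Return the CLI invocations of a build/push/pull/run cycle as argv lists."""
    return [
        ["docker", "build", "-t", image, context_dir],  # create the image
        ["docker", "push", image],                      # send it to the registry
        ["docker", "pull", image],                      # retrieve it on the target host
        ["docker", "run", "--rm", image],               # execute it in a container
    ]


# "registry.example.com/demo/app:1.0" is a hypothetical image name.
for argv in docker_workflow("registry.example.com/demo/app:1.0"):
    print(" ".join(argv))
```

On a host with Docker installed, each list could be passed to subprocess.run(argv, check=True). In practice the first two commands would run on the build machine and the last two in the target environment.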

Docker is an extension of Linux Containers (LXC) and takes advantage of features from the Linux kernel, namely control groups (cgroups) and kernel namespaces. These features allow independent containers to run on the same host while sharing a single kernel/OS. Cgroups are a feature of the Linux kernel that allows establishing limits on resources such as CPU, memory and I/O for groups of processes. Namespaces wrap a global system resource, making it visible only to the group of processes that belong to a specific namespace while hiding it from other namespaces.

Another technique found in container operation is the isolation of the working directories of the running applications. Change root, or chroot, is a system call (syscall) used to change the apparent root directory of an application. This isolation is called a chroot jail, and it is essential in scenarios such as testing an application that could be harmful to the host system, or testing a 32-bit application on a 64-bit system.

Docker containers are built by tying together namespaces, cgroups and chroot. Nevertheless, an abstraction layer is needed to make syscalls and kernel features work together. This task is achieved by libcontainerd, which acts as an interface enabling Docker to access these features and making it possible to manipulate namespaces, control groups, SELinux policies or AppArmor profiles, network interfaces and firewalling rules, all consistently provided as a single library. As described in Figure 2.10, Docker can still take advantage of these tools indirectly via libvirt, LXC or systemd-nspawn. Libcontainerd provides a client layer type that enables platforms to operate containers consistently, including methods for "transferring container images, container execution and supervision, low-level local storage and network interfaces, across both Linux and Windows." [34]

Figure 2.10: Docker execution environment

Similar to a typical Unix system, Docker containers have a boot filesystem (bootfs) as a base layer that is mounted at the start using an initial ramdisk (initrd). After a container is started, the bootfs is unmounted, freeing up memory. Right after this, Docker adds a new layer for the root filesystem (rootfs) that corresponds to the operating system. This can be, for example, a Debian filesystem as illustrated in Figure 2.11. The rootfs remains in read-only mode during the life of the container. This is opposed to a traditional Linux boot, where the rootfs is mounted in read-only mode and switches to read-write mode after integrity checks are completed. This happens because Docker takes advantage of union mounts, which make it possible to attach more filesystems to the rootfs and have them appear as if they were just one filesystem. Filesystems are layered on top of each other and the result is a single filesystem that contains the directories and files from every layer. Docker gives the name of image to each filesystem layer. Images are read-only snapshots of the configurations defined in Dockerfiles. They are incrementally stacked on top of each other, meaning that each image depends on a parent image. On top of these read-only layers, a new writable layer is added as the top-most layer. Figure 2.11 represents a container with a Debian filesystem as the parent image and, on top of it, custom configurations with MySQL packages and configuration files.


Figure 2.11: Docker container architecture
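The behaviour of stacked read-only layers with a writable layer on top can be mimicked with Python's ChainMap, which looks keys up from the first mapping to the last and sends every write to the first one. This is only an analogy for the union mount, not how Docker implements it, and the paths and contents are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only image layers (hypothetical paths and contents).
debian = MappingProxyType({"/etc/os-release": "Debian", "/bin/sh": "dash"})
mysql = MappingProxyType({"/usr/sbin/mysqld": "mysqld binary", "/etc/mysql/my.cnf": "defaults"})

writable = {}  # top-most layer, private to one container
container_fs = ChainMap(writable, mysql, debian)  # lookups go from top to bottom

# The write lands in the top layer only and shadows the image's copy.
container_fs["/etc/mysql/my.cnf"] = "tuned"

print(container_fs["/etc/mysql/my.cnf"])  # tuned
print(mysql["/etc/mysql/my.cnf"])         # defaults
print(container_fs["/bin/sh"])            # dash
```

Because the lower layers are never modified, several containers can share the same image layers while each one keeps its own changes in a separate writable layer.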

It is fundamental to establish an abstraction between a Docker subnet and the underlying physical network, since it facilitates the migration from one platform to another without any modification or concerns about dependencies or network configurations. Whenever a container is started, it is connected to a virtual ethernet device and receives a unique network address. A virtual ethernet device (veth) is a Linux networking interface that establishes a connection between namespaces (i.e. it is used to establish connectivity between containers). These devices are attached to a virtual bridge that Docker names “docker0” by default. All the containers keep connectivity among them by sending packets to docker0, which in turn forwards every packet through the subnet.

2.5 Container orchestration

Big IT companies have largely adopted container orchestration technologies in production environments since they address many problems that have emerged from modern distributed systems. These technologies give companies the ability to control the workflow of their software solutions as well as to manage the life cycles of containers in dynamic environments. Many of them are distributed as open-source projects based on custom implementations that companies developed to fulfill specific needs. Such an approach led to the release of different container orchestration solutions to the market, such as Docker Swarm, Kubernetes, Apache Mesos and Marathon. The main purpose of these solutions is to create dynamic environments from a large number of hosts, aiming to automate tasks such as provisioning and deploying containers, adding redundancy and improving availability, scaling the infrastructure quickly, distributing containers evenly according to the load of each host, migrating containers when a host or a container fails, allocating resources for each deployed container, and monitoring the health of containers.


2.5.1 Docker Swarm

Docker Swarm is a tool that enables clustering and scheduling in container orchestration environments. It allows the deployment of multi-container applications on several hosts configured in swarm mode, instead of the deployment of several containers on a single host. Starting with version 1.12, Docker has built-in clustering and orchestration layers based on SwarmKit. [35] [36] Such integration enables creating a cluster from a set of already established Docker hosts, therefore achieving the so-called “swarm”. Figure 2.12 represents a generic overview of a swarm cluster built from hosts running Docker Engine by setting them to run in Swarm mode with the respective roles of managers and workers. These concepts are explained later. [37]

Figure 2.12: Docker Swarm cluster

Nodes can have different roles. They can perform the roles of a manager, a worker, or both. Managers are responsible for keeping the integrity of the swarm by managing the memberships of the participating nodes and by delegating tasks to workers. By default, a manager node is simultaneously a worker, so tasks can also be dispatched to managers. Tasks are defined in terms of services in the swarm and can run in two distinct modes: replicated and global. In replicated mode, a service has tasks running on several nodes according to the number of replica tasks specified, whereas in global mode it runs one task of the service on each available node.

Nodes

As aforementioned, there are two types of nodes in a Docker Swarm environment: managers and workers. The priority task of manager nodes is to handle the management of the cluster by maintaining its state and dispatching services to the other nodes. They also serve an HTTP API for the Docker swarm mode. Manager nodes implement a distributed consensus algorithm called Raft to manage the global state of a swarm cluster. Its implementation in Docker Swarm enables a cluster to elect a leader among the available managers, and guarantees that each node agrees on a shared state after a certain number of iterations (i.e., managers of the swarm share the same state during their life cycle). All the information and configurations needed to maintain such integrity are stored locally in a database on each manager. For instance, if the leader fails, any other manager belonging to the swarm can take over and restore the desired state by rescheduling services across the nodes of the cluster, keeping it up and running with no downtime. [38]

Swarm mode offers a fault tolerance feature, which is directly related to the number of managers implemented. To take advantage of this feature, it is recommended to implement an odd number of managers, which allows the cluster to recover from a failure and reduces the downtime to close to zero. With N managers, the cluster should be able to tolerate a loss of at most (N-1)/2 managers, meaning that a cluster with 3 managers only tolerates the loss of one manager. The number of managers of a cluster should always be chosen carefully since it affects both fault tolerance and performance. More managers might not mean increased performance, and fewer managers might not tolerate losses in the management quorum. Docker recommends a maximum of seven managers per cluster.
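The fault tolerance formula can be tabulated for the manager counts discussed above. The sketch assumes that (N-1)/2 is rounded down for even values of N, which follows from the requirement that a majority of managers must remain available. The function names are illustrative.

```python
def tolerated_manager_failures(managers: int) -> int:
    """Managers that can be lost while a majority (quorum) remains."""
    return (managers - 1) // 2


def quorum(managers: int) -> int:
    """Smallest majority of the manager set."""
    return managers // 2 + 1


print("managers quorum tolerated")
for n in range(1, 8):
    print(n, quorum(n), tolerated_manager_failures(n))
```

The table makes the recommendation of an odd number of managers visible: four managers tolerate the same single failure as three, so the extra manager adds coordination overhead without adding fault tolerance.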

Worker nodes are subordinate to managers. They do not need a leader to operate, which means that they do not participate in the Raft algorithm. Their sole purpose is to receive tasks from managers and execute services. At the time of creation, each manager can also act as a worker, but this option can be changed so that managers do not accept tasks. Unlike a manager in Active mode, a manager in Drain mode does not schedule tasks on itself and does not accept them. Therefore, if the managers of a cluster should not accept tasks, they need to be set to Drain mode, so that workers are the only nodes capable of running the tasks assigned by managers. Figure 2.13 establishes the relations between the two groups of a Swarm cluster. Manager nodes form the group responsible for managing the cluster, which ensures its integrity by sharing information among the nodes through the Raft consensus algorithm. Worker nodes form the working group, which receives the scheduled workloads from the management group. When one node receives a workload, it notifies the remaining worker nodes through a network dedicated to this group called the Gossip network.


Figure 2.13: Docker Swarm overview [39]

Just as a manager can play two roles, both as a manager and as a worker, it is also possible to promote a worker to the role of manager. This is useful when a manager fails, since a worker can be promoted to take its position. Nodes can also be demoted in swarm mode. For example, in the previous scenario, after the maintenance of the old manager is complete, the newly promoted manager can be demoted to its old role, so that the former manager can re-join the quorum of managers. [39]

Services and tasks

A service is an abstraction used by Docker to define tasks that are distributed among the available nodes of the cluster. Each service contains a specification for a task, indicating which image it should be based on and which options it should include, such as the ports to be exposed outside the swarm, the network to be connected to, the number of replicas, mount points or even resource limits.

At the definition of the service, a manager starts to accept a service definition to the swarm, and then it schedules it to the available nodes depending on the number of replicas defined.

Figure 2.14 shows a service running with 3 replicas, meaning that 3 tasks run independently on the available nodes. A task is the mechanism swarm uses to start containers on the assigned node. Tasks progress through different states: a typical startup cycle begins in the NEW state, passes through several intermediate states and, if everything goes as expected, settles in the RUNNING state.

Figure 2.14: Docker Swarm services and tasks relation diagram [40]

Figure 2.15 presents the workflow of a service creation and the corresponding task scheduling on a worker node. A service specification is sent to a manager through the swarm API, which creates an object for it. This operation can be performed with CLI tools or the HTTP API. The orchestrator then compares the current state against the desired state in a reconciliation loop and creates the tasks for the service object. Finally, an IP address is associated with each task and the task is assigned to an available node. Workers continually report to managers and check on their assigned tasks, while managers continually hand out new tasks and compare the states of the existing ones. If the states diverge, for instance because a worker goes down or a manager receives a new service, the orchestrator starts a new reconciliation loop to reach the desired state.

Figure 2.15: Docker Swarm service scheduling [40]
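The reconciliation loop described above can be sketched as a single function that compares the desired number of replicas with the tasks still running on healthy nodes and creates the missing ones. This is a simplified model under stated assumptions, not the orchestrator's actual code: the `reconcile` function, the task dictionaries and the "least loaded node first" placement are hypothetical choices made for illustration.

```python
from itertools import count

_task_ids = count(1)  # hypothetical source of unique task identifiers


def reconcile(desired_replicas, tasks, healthy_nodes):
    """One pass of the loop: move the current state towards the desired state."""
    # Current state: only tasks on nodes that are still healthy count as running.
    running = [t for t in tasks if t["node"] in healthy_nodes]

    load = {node: 0 for node in healthy_nodes}
    for t in running:
        load[t["node"]] += 1

    # Create tasks until the desired replica count is reached.
    while len(running) < desired_replicas and healthy_nodes:
        # Assumption: place each new task on the least loaded node (ties by name).
        node = min(sorted(load), key=load.get)
        running.append({"id": next(_task_ids), "node": node})
        load[node] += 1

    # Scaling down: keep only as many tasks as desired.
    return running[:desired_replicas]


nodes = ["worker-1", "worker-2", "worker-3"]
tasks = reconcile(3, [], nodes)                          # initial scheduling
tasks = reconcile(3, tasks, ["worker-1", "worker-3"])    # worker-2 went down
print(tasks)
```

After the second pass the service again has three tasks, none of them on the failed node, which is the behaviour the workflow above describes.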

Services can be deployed across the available nodes in one of two modes: replicated or global. In replicated mode, a service has a defined number of replicas, and one task per replica is distributed among the available nodes. In global mode, the service runs one task on every node of the cluster (i.e., if the cluster has 10 nodes, the manager assigns 10 tasks for the global service). [40]
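The difference between the two modes can be made concrete with a short sketch. The functions below are hypothetical and use a simple round-robin placement for the replicated case, which is an assumption made for illustration rather than the scheduler's actual strategy.

```python
def place_replicated(replicas, nodes):
    """Distribute `replicas` tasks over the nodes; returns {node: task count}."""
    placement = {node: 0 for node in nodes}
    for i in range(replicas):
        # Assumption: round-robin placement, for illustration only.
        placement[nodes[i % len(nodes)]] += 1
    return placement


def place_global(nodes):
    """Global mode: exactly one task on every node of the cluster."""
    return {node: 1 for node in nodes}


cluster = [f"node-{i}" for i in range(1, 11)]  # a cluster with 10 nodes

replicated = place_replicated(3, cluster)
print(sum(replicated.values()))   # number of tasks of a service with 3 replicas

global_service = place_global(cluster)
print(sum(global_service.values()))  # number of tasks of a global service
```

With 10 nodes, the replicated service runs 3 tasks regardless of the cluster size, while the global service runs 10, one per node.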

Some services must be publicly accessible, that is, available from outside the swarm. Swarm allows services to be exposed on specific ports through ingress load balancing: a swarm manager assigns the service a published port, either chosen by default or specified manually, and every request arriving from outside the swarm on that port is load balanced and forwarded to the respective service. As illustrated in Figure 2.16, a published service does not need to be running on a particular node to be reached through it. Every node in the ingress routing mesh is aware of the port on which the service is published, so the service can be accessed through any node of the swarm, managers included.

Figure 2.16: Docker Swarm load balancing [41]
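The routing mesh behaviour described above can be modelled in a few lines: a request may enter through any node, and it is forwarded to one of the nodes actually running a task of the published service. The `RoutingMesh` class, its methods and the round-robin forwarding policy are hypothetical and serve only to illustrate the idea; they are not part of Docker.

```python
from itertools import cycle


class RoutingMesh:
    """Toy model of an ingress routing mesh (illustration only)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.published = {}  # published port -> rotation over nodes running tasks

    def publish(self, port, task_nodes):
        # Assumption: requests are balanced round-robin over the service's tasks.
        self.published[port] = cycle(task_nodes)

    def request(self, entry_node, port):
        """Return the node that ends up serving a request received by `entry_node`."""
        if entry_node not in self.nodes:
            raise ValueError(f"{entry_node} is not part of the swarm")
        if port not in self.published:
            raise LookupError(f"no service published on port {port}")
        return next(self.published[port])


mesh = RoutingMesh(["node-1", "node-2", "node-3"])
mesh.publish(8080, ["node-1", "node-2"])  # the service has no task on node-3

# The request enters through node-3, which runs no task of the service.
print(mesh.request("node-3", 8080))
print(mesh.request("node-3", 8080))
```

Even though `node-3` runs no task of the service, requests sent to it on the published port are still served, here alternately by `node-1` and `node-2`.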

Security

Docker has a built-in public key infrastructure (PKI) that secures the transfer of information within a swarm cluster. Nodes authenticate, authorize and establish encrypted communications with one another using Transport Layer Security (TLS). When the swarm is initialized, Docker issues a new root Certificate Authority (CA) along with a key pair to secure communications with the nodes that join the swarm. Docker also supports custom CAs, which can be specified in this step. After the manager election through Raft consensus, the leader generates two tokens, one for the managers' group and another for the workers' group. These tokens are used when a new node joins one of these groups. A token is composed of a specific signature
