
SmartEdge: fog computing cloud extensions to support latency-sensitive IoT applications



Federal University of Rio Grande do Norte
Center of Exact and Earth Sciences
Department of Informatics and Applied Mathematics
Graduate Program in Systems and Computing
Master's in Systems and Computing

SmartEdge: Fog Computing Cloud Extensions to Support Latency-Sensitive IoT Applications

Flávio de Sousa Ramalho

Natal-RN
December, 2016

Publication cataloging record. UFRN / SISBI / Sectional Library of the Center of Exact and Earth Sciences – CCET.

Ramalho, Flávio de Sousa. SmartEdge: fog computing cloud extensions to support latency-sensitive IoT applications / Flávio de Sousa Ramalho. – Natal, 2016. 110 p.: ill. Supervisor: Prof. Dr. Augusto José Venâncio Neto. Dissertation (Master's) – Federal University of Rio Grande do Norte, Center of Exact and Earth Sciences, Graduate Program in Systems and Computing. 1. Fog Computing. 2. Cloud Computing. 3. Internet of Things. 4. Container-based virtualization. 5. Software-defined networking. I. Venâncio Neto, Augusto José. II. Title. RN/UF/BSE-CCET. CDU: 004.2.

Flávio de Sousa Ramalho. SmartEdge: Fog Computing Cloud Extensions to Support Latency-Sensitive IoT Applications. Master's Defense Examination submitted to the Graduate Program in Systems and Computing, Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, as a requirement for the Master's Degree in Computer Systems. Research Area: Integrated and Distributed Systems. Supervisor: Prof. Dr. Augusto José Venâncio Neto.

PPgSC – Graduate Program in Computer Systems
DIMAp – Department of Informatics and Applied Mathematics
CCET – Center of Exact and Earth Sciences
UFRN – Federal University of Rio Grande do Norte

Natal-RN
December, 2016

Acknowledgments

This dissertation was supervised by Professor Augusto Neto, whom I thank for his many teachings, for hours upon hours of meetings, for his demands, and for his enormous patience. I also thank my colleagues at the Ubiquitous and Pervasive Systems Laboratory (UPLab) for the countless conversations and wise advice that were essential to the conception of this work. I deeply thank my family, especially my parents, Francisco de Sousa Ramalho and Deolinda Maria de Sousa Ramalho. I thank my friends, even though their help was not directly related to the scientific content of this work; they were always with me, celebrating the moments of victory and giving support in the difficult and sad moments. This work is dedicated to my parents, who awakened in me the ambition that led me to pursue a graduate degree and who provided for my studies through much love and hard work. They are most responsible for who I am and for the person I dream of one day becoming.

SmartEdge: Fog Computing Cloud Extensions to Support Latency-Sensitive IoT Applications

Author: Flávio de Sousa Ramalho
Supervisor: Prof. Dr. Augusto José Venâncio Neto

Abstract

The rapid growth in the number of Internet-connected devices, together with the increasing popularity of and demand for real-time, latency-constrained cloud application services, makes it challenging for traditional cloud computing frameworks to support such an environment. More specifically, the centralized approach traditionally adopted by current Data Centers (DCs) poses performance issues when serving a high density of cloud applications, mainly in terms of responsiveness and scalability. Our irreplaceable dependency on cloud computing demands DC infrastructures that are always available while keeping, at the same time, enough performance capacity to respond to a huge number of cloud application requests. In this work, the applicability of the emerging fog computing paradigm is exploited to enhance performance in supporting latency-sensitive cloud applications tailored for the Internet of Things (IoT). With this goal in mind, we introduce a new service model named Edge Infrastructure as a Service (EIaaS), which seeks to offer an edge-computing-tailored cloud service delivery model that efficiently suits the requirements of real-time, latency-sensitive IoT applications. With the EIaaS approach, cloud providers can dynamically deploy IoT applications/services on edge computing infrastructures and manage cloud/network resources at runtime, so as to keep IoT applications always best connected and best served. The resulting approach is modeled as a modular architecture, leveraging container and Software-Defined Networking technologies to handle edge computing resources (CPU, memory, etc.) and network resources (paths, bandwidth, etc.), respectively.

Preliminary results show how the virtualization technique affects the performance of applications on the network edge infrastructure. Container-based virtualization has an advantage over the hypervisor-based technique for deploying applications on the edge computing infrastructure, as it offers a great deal of flexibility in the presence of resource constraints.

Keywords: Fog Computing, Cloud Computing, Internet of Things, Container-based Virtualization, Software-Defined Networking.

SmartEdge: Cloud Extensions for Fog Computing to Support Latency-Sensitive IoT Applications

Author: Flávio de Sousa Ramalho
Supervisor: Prof. Dr. Augusto José Venâncio Neto

Resumo

The rapid growth in the number of Internet-connected devices, together with the rising popularity of and demand for real-time, latency-constrained cloud applications and services, makes it very difficult for traditional cloud computing frameworks to accommodate them efficiently. More specifically, the centralized approach traditionally adopted by current Data Centers (DCs) presents performance problems in serving high-density cloud applications, mainly regarding responsiveness and scalability. Our irreplaceable dependency on cloud computing demands DC infrastructures that are always available while keeping, at the same time, enough performance capacity to respond to a huge number of cloud application requests. In this work, the applicability of the emerging fog computing paradigm is explored to improve performance in supporting latency-sensitive cloud applications aimed at the Internet of Things (IoT). With this goal in mind, we present a new model named Edge Infrastructure as a Service (EIaaS), which seeks to offer a new cloud service delivery model based on edge computing, aimed at efficiently meeting the requirements of real-time, latency-sensitive IoT applications. With the EIaaS approach, cloud providers can dynamically deploy IoT applications/services directly on edge computing infrastructures, as well as manage their cloud/network resources at runtime, as a way to keep IoT applications always best connected and best served.

The resulting approach is designed as a modular architecture, technologically based on container and Software-Defined Networking (SDN) tools to handle edge computing resources (CPU, memory, etc.) and network resources (paths, bandwidth, etc.), respectively. The preliminary results show how the main virtualization techniques used in the scope of this work affect the performance of applications on the network edge infrastructure. Container virtualization has an advantage over virtual machine virtualization for deploying applications at the network edge, since it offers great flexibility even in the presence of resource demands.

Keywords: Fog Computing, Cloud Computing, Internet of Things, Container-based Virtualization, Software-Defined Networking.

List of Figures

1. Cloud-computing service models and responsibilities — p. 27
2. Cloud-computing deployment models representation — p. 29
3. Fog-computing architecture — p. 32
4. (a) Virtual Machine and (b) Container isolation layers — p. 36
5. Docker app scheduling steps — p. 41
6. Software-Defined Networking APIs — p. 43
7. SmartEdge stack deployed on an OpenStack cloud infrastructure — p. 55
8. SmartEdge modular architecture — p. 58
9. SmartEdge implementation — p. 61
10. SmartEdge usage workflow — p. 69
11. Sequence diagram of adding a node to the SmartEdge cluster — p. 71
12. Sequence diagram of deploying an application on SmartEdge — p. 72
13. Linpack results on each platform over 15 runs, with N=2000 — p. 78
14. Disk throughput results from running Bonnie++ with a file size of 3 GiB — p. 79
15. Disk rnrw from SysBench with a file size of 3 GiB — p. 80
16. Disk throughput from dd with a file size of 3 GiB and a block size of 1024 bytes — p. 81
17. Network throughput results from running netperf for 600 seconds — p. 82
18. Network request/response results from running netperf for 600 seconds — p. 83
19. SmartEdge evaluation testbed — p. 85
20. Impact of latency on a simple request/response of 1 byte — p. 90
21. Impact of latency on the application FPS by its provisioning platform; only the application deployed using SmartEdge presented a high QoE — p. 92

22. CPU usage over the different deployment scenarios — p. 93
23. Impact of CPU allocation on the application performance; confidence interval for the mean of the values, with a confidence level of 95% — p. 95

List of Tables

1. Docker Swarm API (Container Operations) — p. 39
2. IoT Platforms Comparison — p. 52
3. SmartEdge Authentication API — p. 66
4. SmartEdge's authentication method format — p. 66
5. SmartEdge's account management API — p. 66
6. SmartEdge's network API — p. 67
7. SmartEdge's node management API — p. 67
8. SmartEdge's registry management API — p. 67
9. SmartEdge's event management API — p. 68
10. CPU Benchmark: NBench — p. 76
11. CPU/Scheduler Benchmark: SysBench — p. 77
12. Disk rnrw Benchmark: Bonnie++ — p. 79
13. Memory Benchmark: STREAM — p. 81
14. Application Provisioning Time — p. 89

List of Abbreviations and Acronyms

3G – Third Generation
4G – Fourth Generation
API – Application Programming Interface
AWS – Amazon Web Services
BLE – Bluetooth Low Energy
CapEx – Capital Expenses
CDN – Content Delivery Networks
CM – Configuration Management
CSP – Cloud Service Provider
DC – Data Center
DCN – Data Center Networks
EC2 – Elastic Compute Cloud
EIaaS – Edge Infrastructure as a Service
EPA – Environmental Protection Agency
HOT – Heat Orchestration Template
HTTP – Hypertext Transfer Protocol
IaaS – Infrastructure as a Service
ICT – Information and Communications Technology
IoT – Internet of Things
IP – Internet Protocol
IT – Information Technology
ITU – International Telecommunication Union
KVM – Kernel-based Virtual Machine
LTE – Long-Term Evolution
LXC – Linux Containers
M2M – Machine to Machine
MPLS – Multiprotocol Label Switching
NAT – Network Address Translation
NIC – National Intelligence Council
ODL – OpenDaylight
ONF – Open Networking Foundation
OpEx – Operating Expenses
OS – Operating System
OSI – Open Source Initiative
PaaS – Platform as a Service
QoS – Quality of Service
REST – Representational State Transfer
RFID – Radio Frequency Identification
SaaS – Software as a Service
SDN – Software-Defined Networking
SLA – Service Level Agreement
SNS – Simple Notification Service
SOA – Service Oriented Architecture
SoC – System on Chip
TCO – Total Cost of Ownership
U.S. – United States
VM – Virtual Machine
VPC – Virtual Private Cloud
VPN – Virtual Private Network
WiFi – Wireless Fidelity
WSN – Wireless Sensor Networks

Contents

1 Introduction — p. 15
  1.1 Objectives — p. 17
    1.1.1 Specific Objectives — p. 18
  1.2 Motivation — p. 19
  1.3 Contribution — p. 20
  1.4 Work Organization — p. 20
2 Theoretical Background — p. 21
  2.1 Smart Cities — p. 21
  2.2 Internet of Things — p. 22
  2.3 Cloud Computing — p. 24
    2.3.1 Service Models — p. 26
    2.3.2 Deployment Models — p. 27
    2.3.3 Data Center Bottleneck — p. 28
  2.4 Fog Computing — p. 29
    2.4.1 Architecture — p. 31
    2.4.2 Applicability — p. 32
  2.5 Virtualization at the Network Edge — p. 33
    2.5.1 Hypervisor-Based Virtualization — p. 33
    2.5.2 Container-Based Virtualization — p. 34
    2.5.3 Virtual Machines vs. Containers — p. 35
    2.5.4 Resource Allocation in Virtual Machines and Containers — p. 36
    2.5.5 Docker — p. 37
      2.5.5.1 Clustering — p. 38
      2.5.5.2 Networking — p. 40
      2.5.5.3 Scheduling, Cluster Management, and Orchestration — p. 40
  2.6 Software-Defined Networking — p. 42
    2.6.1 API Standardization — p. 43
    2.6.2 Flow-Based Control — p. 44
    2.6.3 Benefits — p. 44
    2.6.4 Software-Defined Networking and Internet of Things — p. 45
3 Related Works — p. 47
  3.1 Container-based Softwarized Control Plane — p. 47
  3.2 Network-based Softwarized Control Plane — p. 49
  3.3 Key Requirements — p. 50
4 Work Proposal — p. 54
  4.1 SmartEdge Key Design Principles — p. 54
    4.1.1 Overview — p. 54
    4.1.2 Architecture — p. 57
  4.2 Design and Implementation — p. 59
    4.2.1 Application Programming Interface — p. 65
      4.2.1.1 Authentication — p. 65
      4.2.1.2 Accounts — p. 66
      4.2.1.3 Network — p. 66
      4.2.1.4 Nodes — p. 67
      4.2.1.5 Registries — p. 67
      4.2.1.6 Events — p. 67
    4.2.2 Usage Workflow — p. 68
  4.3 Edge-Infrastructure-as-a-Service — p. 72
5 Evaluation — p. 74
  5.1 Preliminary Results — p. 74
    5.1.1 Methodology — p. 74
    5.1.2 Benchmark Results — p. 75
      5.1.2.1 CPU Benchmark — p. 75
      5.1.2.2 Disk I/O Benchmark — p. 78
      5.1.2.3 Memory Benchmark — p. 81
      5.1.2.4 Network Benchmark — p. 82
    5.1.3 Conclusion — p. 83
  5.2 SmartEdge Evaluation — p. 84
    5.2.1 Evaluation Scenario — p. 84
    5.2.2 Methodology — p. 87
    5.2.3 Results — p. 89
      5.2.3.1 Provisioning time — p. 89
      5.2.3.2 Latency Impact on Request/Response — p. 90
      5.2.3.3 Impact of latency on the application QoE and server CPU — p. 91
      5.2.3.4 Impact of resource allocation on the client application QoE — p. 93
6 Conclusion and Future Work — p. 96
References — p. 99

Chapter 1

Introduction

The coming era of computing intends to change the way humans interact with technology. The technological revolution of recent decades, driven by advances and developments in Information and Communication Technologies (ICT), has transformed the way people communicate, work, travel, and live. Cities are evolving towards intelligent, dynamic infrastructures that serve citizens while fulfilling criteria of energy efficiency and sustainability [1]. In these highly populated towns and new urban areas, society needs to address new challenges in order to minimize the consumption of natural energy resources, promote renewable energy, and reduce CO2 emissions. The Smart City [2] concept is a powerful tool to address this urban change: a smart city must be able to efficiently manage infrastructure and services while meeting the needs of the city and its citizens.

Focusing on the technology needed to build smart cities, developments in the field of ICT play a key role in their creation and evolution. The analysis by Forrester Research in [3] describes a smart city as a system that "uses information and communication technologies in order to make critical components of the infrastructure and services of a city, more interactive, accessible and effective". Therefore, creating a smart city is not restricted to providing services independently and individually; it requires deploying a whole infrastructure for efficient collection, transmission, storage and analysis of city data in order to supply services to citizens. That is where the Internet of Things paradigm, together with Cloud Computing, comes into play. The Internet of Things (IoT) paradigm [4] offers an environment where many of the objects around us are seamlessly connected to the network.

Technologies like Wireless Sensor Networks (WSN) [5] and Radio Frequency Identification (RFID) [6] are already rising to overcome these challenges, embedding information and communication systems invisibly in the environment around us. The enormous amounts of data generated by these devices need to be stored, processed and presented efficiently, and in an easily interpretable way. To accommodate this type of computing, Cloud Computing [7] is an asset: a well-known and widely used virtual infrastructure that provides the sensation of infinite resource availability and a varied hub of services. For instance, a traditional Cloud Computing service infrastructure is likely to provision monitoring devices, storage devices, analytic tools, visualization platforms and client delivery. The cost-based model that Cloud Computing offers enables end-to-end service provisioning, allowing businesses and users to access applications on demand from anywhere [8].

The general model of cloud computing is based on centralized Data Center (DC) architectures, which are treated as the monopolized hub of services hosting all computation and storage capabilities. In current cloud-based frameworks, all application requests and resource demands are processed and handled inside central server farms. However, the increasing popularity and penetration of Internet-connected devices, driven by the innovation of the IoT paradigm, has imposed many challenges on the performance of cloud DCs in handling the huge amount of information that is expected to be generated. In 2012, global commercialization of IoT-based application systems generated a revenue of $4.8 trillion [9]; Cisco estimates that global corporate profits will increase by approximately 21% just because of the adoption of IoT [10]; and it is estimated that by 2020 around 80 billion devices will be connected to the Internet [11]. Thus, to provide computing and storage to these devices, cloud DCs need to adopt new approaches to manage such heterogeneous scenarios.

The innovation of IoT depends on advances in cloud computing. Data from the billions of Internet-connected devices are voluminous and need to be processed and stored within cloud DCs. However, most IoT applications, such as smart vehicular traffic management systems, smart driving and car parking systems, and smart grids, demand real-time, low-latency services from the service providers [12]. As the traditional cloud computing system carries out both processing and storage of data in a centralized way within the DC's infrastructure, the increasing amount of IoT data traffic is likely to severely degrade the performance of cloud service applications, mainly in regard to high latency and poor Quality of Service (QoS) caused by network bottlenecks.

The aforementioned performance issues of centralized Cloud Computing infrastructures in supporting latency-sensitive IoT service applications motivated the research community to search for alternative solutions. In this scope, the Fog Computing [12] concept arises as an asset, enabling applications on billions of connected devices, already connected in the Internet of Things (IoT), to run directly at the network edge [13]. As fog computing is implemented at the edge of the network, it provides low latency and location awareness, and improves Quality of Service (QoS) for streaming and real-time applications. Typical examples include transportation

and networks of sensors and actuators. Moreover, this new infrastructure supports heterogeneity, as fog devices include end-user devices, access points, edge routers and switches. The fog paradigm is well positioned for real-time big data analytics, supports densely distributed data collection points, and provides advantages in entertainment, advertising, personal computing and other applications [14].

Despite the high capability of Fog Computing to provide a low-latency infrastructure, its deployment is not trivial. Most works on fog computing have focused primarily on its principles, basic notions, and doctrines; few have contributed to the technical aspects of the paradigm from an implementation perspective. In this work, we propose a platform for provisioning resources at the network edge, named SmartEdge. The platform provides a new service model, named Edge Infrastructure as a Service (EIaaS), based on the recent fog computing paradigm, to meet the demands of real-time, latency-sensitive applications in the context of IoT. Fog computing was first introduced in [12] as a new paradigm focused on the infrastructure for Internet of Things applications. The idea of fog computing emerged to enable data distribution and placement closer to the end-user, thus reducing service latency, improving Quality of Service and removing other possible obstacles between the user and the service. For the sake of completeness, it is important to state that the logic that decides when an application should use cloud- or fog-provided resources is out of the scope of this work. The scope of SmartEdge is on the infrastructure level only; it is expected that such decision-making functionality is handled at the application logic level.

1.1 Objectives

The main objective of the SmartEdge platform is to offer on-demand access to resources provided at the network edge, to serve real-time applications demanding low response time (e.g. live streaming, smart traffic monitoring, smart parking). For requests that demand semi-permanent or permanent storage, or that require extensive analytics involving historical data sets (e.g. social media data, photos, videos, medical history, data backups), and thus still require the standard IaaS provided by the cloud, the edge devices act as routers or gateways, redirecting the requests to the core cloud computing framework.
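The edge-versus-core dispatch behavior described above can be sketched as follows. This is an illustrative model only, not part of the SmartEdge implementation; the application classes, RTT figures, and function names are all hypothetical:

```python
# Hypothetical sketch of the dispatch idea: latency-sensitive requests are served
# at the edge, while storage-heavy or analytics requests are redirected to the
# core cloud IaaS. All names and thresholds here are illustrative.

from dataclasses import dataclass

EDGE = "edge"
CLOUD = "cloud"

# Illustrative request classes taken from the examples in the text.
LATENCY_SENSITIVE = {"live-streaming", "smart-traffic", "smart-parking"}
STORAGE_HEAVY = {"media-backup", "medical-history", "batch-analytics"}

@dataclass
class Request:
    app_class: str          # e.g. "smart-parking"
    max_latency_ms: float   # deadline the application can tolerate

def dispatch(req: Request, cloud_rtt_ms: float = 80.0) -> str:
    """Decide where a request should be served.

    Storage-heavy requests always go to the core cloud; otherwise the request
    stays at the edge when it is latency-sensitive or when the cloud round-trip
    alone would already break its deadline.
    """
    if req.app_class in STORAGE_HEAVY:
        return CLOUD
    if req.app_class in LATENCY_SENSITIVE or cloud_rtt_ms > req.max_latency_ms:
        return EDGE
    return CLOUD

print(dispatch(Request("smart-parking", 50.0)))   # latency-sensitive class -> edge
print(dispatch(Request("media-backup", 1000.0)))  # storage-heavy class -> cloud
```

Note that in this sketch the decision is made by the gateway; in SmartEdge itself such decision-making is explicitly left to the application logic level, as stated above.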

1.1.1 Specific Objectives

On the basis of the aforementioned main objective of this dissertation, the list of specific objectives required to fully carry out this work is provided in the following:

1. Propose new strategies to deal with the deployment of latency-sensitive applications;
2. Implement a Heat Orchestration Template (HOT) to enable easy deployment of the proposed solution on a cloud provider;
3. Design and implement a mechanism to enable the deployment of applications, in the form of containers, on any compatible device (typically network edge devices), in order to extend cloud services to the edge of the network;
4. Design and implement a mechanism that provides network functions to interconnect, in a controlled and efficient way, the deployed containers and the resources available in the cloud;
5. Propose the EIaaS model, a new service model that takes advantage of the SmartEdge platform by enabling operators to use their network edge devices as an extension of their cloud infrastructure;
6. Integrate all the above mechanisms into the SmartEdge framework;
7. Design and develop a reference software architecture that instantiates the SmartEdge framework proposal;
8. Implement a prototype on a real testbed featuring the SmartEdge reference architecture and its EIaaS model, in order to benchmark suitability and performance aspects in the context of IoT applications. The benchmarking analyzes runtime statistics collected while running IoT applications in the testbed, over typical cloud-enabled frameworks.

Given the different types of application constraints in the IoT environment, and the fact that fog computing is a subfield of cloud computing, it is important to highlight that EIaaS interworks with IaaS: the two complement one another rather than one replacing the other. The complementary functions of EIaaS along with IaaS offer end-users a new service delivery model able to fulfill the requirements of real-time, low-latency IoT applications running at the network edges, while complex analytics and long-term data storage can still be afforded by the core cloud framework.
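Specific objective 2 above mentions a Heat Orchestration Template (HOT). As a rough sketch of what such a template looks like (this is not the thesis's actual template; the image, flavor, and resource names are hypothetical), a minimal HOT that boots a single controller VM on OpenStack could be:

```yaml
heat_template_version: 2015-04-30

description: Hypothetical minimal template that boots a SmartEdge controller VM.

parameters:
  image:
    type: string
    default: ubuntu-14.04      # hypothetical image name
  flavor:
    type: string
    default: m1.small

resources:
  smartedge_controller:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      user_data: |
        #!/bin/bash
        # placeholder: install Docker and the SmartEdge services here

outputs:
  controller_ip:
    description: IP address of the controller instance
    value: { get_attr: [smartedge_controller, first_address] }
```

Such a template would be deployed with `heat stack-create` (or `openstack stack create` in later releases), letting the whole solution be instantiated in one step on a cloud provider.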

1.2 Motivation

The main sources of storage and computing in the cloud computing architecture are the dispersed DCs, which communicate among themselves through their Data Center Networks (DCN). In this way, most Internet traffic is concentrated in these DCNs, creating a huge bottleneck for latency-sensitive applications. Motivated by the concepts of fog computing proposed by Bonomi et al. [12], in this work we propose and develop a platform to provide EIaaS for the fog paradigm and assess its performance in supporting IoT requirements. With the increase in the number of IoT devices demanding real-time services from providers, the traditional cloud computing framework is expected to face the following key challenges:

1. Connected devices have already reached 9 billion and are expected to grow rapidly, reaching 80 billion by 2020 [11]. With this tremendous increase in the number of envisioned IoT devices, the DCNs will be subjected to heavy network traffic demand, affecting their capability to suit low-latency application requirements. As a consequence, real-time applications will experience quality degradation in the network transport service, and thus in QoE, especially over wireless communications.

2. According to a report [15] from the U.S. Environmental Protection Agency (EPA), in 2006 DCs were identified as one of the fastest growing consumers of energy: the DCs of the U.S. consumed about 61 billion kilowatt-hours of power, which represented 1.5% of all power consumed in the U.S. and resulted in a total financial expenditure of $4.5 billion. It was also observed that in 2007, 30 million servers worldwide accounted for 100 TWh of the world's energy consumption, at a cost of $9 billion, a figure expected to rise to 200 TWh in the next few years [15], [16]. Therefore, it is important to relieve the cloud DCs from being bombarded with service requests and to serve a part of those requests from the network edge. This would relax the load experienced by the DCs and would also serve latency-sensitive application requests in a better way, with increased QoS.

3. Many works have contributed to establishing the principles, basic notions, and doctrines of Fog Computing. However, its deployment, suitability and technical aspects still need to be experimented with.

Considering the motivations listed in this section, the next one highlights the main contributions envisioned through carrying out this work.

(22) 20. 1.3. Contribution. As mentioned before, EIaaS is an extension to the standard IaaS offered by the cloud computing providers. Rather, in this work, we analyze the development and suitability of EIaaS combined with the traditional IaaS in supporting the ever-increasing demands of the latencyhungry IoT-based applications. The expected contributions of this work are listed below. 1. Initially, this work constructs the architectural model of the proposed SmartEdge platform, based on the concepts of fog computing – one of the first attempts of its kind in this direction. We define the different modules and links within the cloud computing architecture and explain the communication exchange pattern between them. 2. Based on this model, a prototype will be developed and deployed on a real test-bed, using devices such as set-top-box, desktops and notebooks as edge devices. 3. The prototype will characterizes the performance metrics of the proposed EIaaS in terms of the service latency, performance and scalability. Also, will provide a technical view of the challenges involved on the deployment of the fog computing paradigm. 4. The work also performs a fair and equitable comparative study for both IaaS and EIaaS models. We analyze the suitability of the EIaaS service type to support the demands of IoT devices while serving latency-sensitive applications.. 1.4. Work Organization. This work is organized as follows. Firstly, Chapter 2 provides a Theoretical Background over technologies concerned in the proposed work as well as how they relate with each other and with the proposed work. Chapter 3 presents related works where each solution is described and compared according to some elicited features. Chapter 4 introduces the proposed work and its architecture, as well as describe the new EIaaS service type proposed. Preliminary results about virtualization techniques are provided at Chapter 5. 
Chapter 5 also presents the benchmarking results and an analysis of the solution. Finally, Chapter 6 provides the conclusions and future work.

Chapter 2 Theoretical Background

This chapter provides the concepts related to the main fields of research considered in this dissertation, with emphasis on their foundations, architectures, and relations to the proposed work.

2.1. Smart Cities

A Smart City [2] is an urban system that uses information and communication technology (ICT) to make both its infrastructure and its public services more interactive, more accessible, and more efficient. A Smart City is a city committed to its environment, both environmentally and in terms of its cultural and historical elements, and whose infrastructure is equipped with the most advanced technological solutions to facilitate citizen interaction with urban elements.

The origin of Smart Cities is mainly based on two factors. Firstly, the increase in world population and its growing migration from rural areas to urban centers: the urban population is forecast to reach 70% by 2050 [17]. Secondly, there is a concern about the shortage of natural resources, which may compromise the global supply to the world population in the coming years, along with concerns about the environment and climate change [18]. Society needs to address new challenges in order to minimize the consumption of natural energy resources, promote renewable energy, and reduce CO2 emissions to the atmosphere in these highly populated towns and new urban areas [19]. The Smart City concept is a powerful tool to address this urban change, and it must be able to efficiently manage infrastructure and services while meeting the needs of the city and its citizens.

Focusing on the technological scenario, ICTs, together with local governments and private companies, play a key role in implementing the innovative solutions, services, and applications that will make smart cities a reality. The Internet of Things paradigm is playing a primary role as an enabler of a broad range of applications, both for industries and for the general population [20]. The increasing popularity of the IoT concept is also due to the constantly growing number of very powerful devices, like smartphones, tablets, and laptops, and less powerful devices, like sensors, that are able to join the Internet. In the context of Smart Cities, it makes sense to consider the scenario of various heterogeneous devices and Wireless Sensor Networks (WSN) interconnected to each other, and to exploit these "interconnections" to enable new types of services. ICT trends suggest that sensing and actuation resources can be incorporated into the Cloud, and solutions for the convergence and evolution of IoT and cloud computing infrastructures are arising [20]. Nevertheless, some challenges need to be faced, such as: 1) the interoperability among different ICT systems; 2) the huge amount of data to be processed and provided in real-time by the IoT devices deployed in smart systems; 3) the significant fragmentation deriving from the multiplicity of IoT devices; and 4) heterogeneous resource mashup, i.e., how to orchestrate resources in this heterogeneous environment. Concerning these items, the proposed work is a valid starting point to overcome these challenges.

2.2. Internet of Things

The concept of the Internet of Things (IoT) was first introduced in 1999 by Kevin Ashton [21], who referred to IoT as uniquely identifiable, interoperable connected objects based on radio-frequency identification technology. IoT was generally defined as a "dynamic global network infrastructure with self-configuring capabilities based on standards and interoperable communication protocols. In an IoT environment physical and virtual 'things' have identities and attributes and are capable of using intelligent interfaces and being integrated as an information network" [22]. Basically, the IoT can be treated as a superset of connected devices that are uniquely identifiable by existing communication technologies. The words "Internet" and "Things" together mean an interconnected world-wide network based on sensory, communication, networking, and information processing technologies, which might be seen as the next version of information and communications technology.

According to [4], the introduction of IoT will affect users in both the domestic and working fields. Some examples of this domestic influence include application scenarios in assisted living,

e-health, and enhanced learning, where the new paradigm will play a leading role in the near future. In the working field, the most apparent consequences will be equally visible in scenarios such as automation and industrial manufacturing, logistics, business/process management, and intelligent transportation of people and goods.

The U.S. National Intelligence Council (NIC) included IoT in the list of six "Disruptive Civil Technologies" with potential impacts on U.S. national power [23]. NIC also foresees that "by 2025 Internet nodes may reside in everyday things – food packages, furniture, paper documents, and more". It highlights future opportunities that will arise, starting from the idea that "popular demand combined with technology advances could drive a widespread diffusion of an Internet of Things that could, like the present Internet, contribute invaluably to economic development". The International Telecommunication Union (ITU) has also discussed the enabling technologies, potential markets, emerging challenges, and implications of the IoT [24].

The IoT describes the next generation of the Internet, where physical things can be accessed and identified through the Internet. The definition of IoT varies depending on the technologies used for its implementation. However, the fundamental premise of IoT is that the objects composing an IoT environment can be uniquely identified through their virtual representations. Within an IoT, all things are able to exchange data and, if needed, process data according to predefined schemes.

Recent research has spawned the concept of IoT [25] [26], which connects billions of things across the globe to the Internet and enables Machine-to-Machine (M2M) communication [27] among these devices. Contemporary devices and Internet-based systems are gradually converging towards IoT [28]. According to [11], by 2020, around 80 billion devices will be connected to the Internet.
Thus, by 2020, it is estimated that a large number of applications will need to be processed and served through IoT technology [29] [30]. Analyzing contemporary data trends of large volume, heavy heterogeneity, and high velocity (Big Data), it is also anticipated that the vast majority of these applications are highly latency-sensitive and require real-time processing [31] [32] [33]. Therefore, to provide resource management and heavy computational power to applications, IoT leans heavily on cloud computing [34] [35]. Consequently, the performance of IoT is profoundly dependent on the ability of cloud platforms to serve billions of devices and their applications in real-time [36].
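To make the latency argument concrete, the following Python sketch estimates the round-trip time of an IoT request served by a distant cloud DC versus a node at the network edge, modeling propagation delay plus routing hops and a fixed processing time. All figures (distances, hop counts, per-hop delay) are illustrative assumptions, not measurements from this work:

```python
# Illustrative latency-budget model; all figures are assumptions chosen
# for the sake of the example, not measurements from this dissertation.

SPEED_IN_FIBER_KM_S = 200_000  # ~2/3 of the speed of light, in optical fiber

def round_trip_ms(distance_km, processing_ms, hops=0, per_hop_ms=0.5):
    """Round-trip time: propagation both ways + routing hops + processing."""
    propagation_ms = 2 * distance_km / SPEED_IN_FIBER_KM_S * 1000
    return propagation_ms + hops * per_hop_ms + processing_ms

# Hypothetical deployment: a cloud DC 2,000 km away behind 12 routing hops,
# versus an edge node 1 km away behind a single hop.
cloud_rtt = round_trip_ms(distance_km=2000, processing_ms=5, hops=12)
edge_rtt = round_trip_ms(distance_km=1, processing_ms=5, hops=1)

print(f"cloud: {cloud_rtt:.1f} ms, edge: {edge_rtt:.1f} ms")
```

Under these assumptions the edge deployment answers in roughly 5.5 ms versus 31 ms for the cloud DC, so only the edge stays within, for example, a 10 ms real-time budget; the propagation and hop terms, which the edge removes, dominate the total.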

2.3. Cloud Computing

According to [37] [38], Cloud computing is neither a completely new concept nor a new technology. It is a new business operational model built from existing technologies such as Virtualization, Service-Oriented Architecture (SOA), and Web 2.0. There are already several definitions of Cloud computing in the academic and commercial worlds [39] [40] [7] [41]; these definitions can be summed up as follows: "Cloud Computing is a parallel and distributed system consisting of a shared pool of virtualized resources (e.g. network, server, storage, application, and service) in large-scale data centers. These resources can be dynamically provisioned, reconfigured and exploited by a pay-per-use economic model in which the consumer is charged on the quantity of cloud services usage and the provider guarantees Service Level Agreements (SLAs) through negotiations with consumers. In addition, resources can be rapidly leased and released with minimal management effort or service provider interaction. Hardware management is highly abstracted from the user and infrastructure capacity is highly elastic."

The aim is to concentrate computation and storage in data centers, where high-performance machines are linked by high-bandwidth connections and all of these resources are carefully managed. End-users make the requests that initiate computations and receive the results. In spite of the differences among definitions of Cloud computing, some common characteristics are described as follows.

• Virtualization: Hardware virtualization mediates access to the physical resources, decouples applications from the underlying hardware, and creates a virtualized hardware platform using a software layer (hypervisor). The hypervisor creates and runs virtual machines (VMs). A virtual machine is like a real computer, except that it uses virtual resources, which enables isolation and independence from particular hardware.
It also permits the reassignment of virtual resources to other physical hardware in case of capacity constraints or hardware failures. Through virtualization, the underlying architecture is abstracted from the user while still providing flexible and rapid access to it.

• Multitenancy: Allows several customers (tenants) to share the DC infrastructure without being aware of it and without compromising the privacy and security of each customer's data (through isolation). Even though multitenancy is cost-effective, it may affect performance when shared services are accessed simultaneously (multi-tenant interference).

• Service-oriented architecture: Everything is expressed and exposed as a service, which delivers an integrated and orchestrated suite of functions to an end-user through the composition of both loosely and tightly coupled functions.

• On-demand self-service: Cloud computing allows self-service access so that customers can request, customize, pay for, and use services as needed, automatically, without requiring interaction with providers or any intervention of human operators [7].

• Elasticity: To provide the illusion of infinite resources, more virtual machines (on two or more physical machines) can be quickly provisioned (scale out) in the case of peak demand, and rapidly released (scale in) to keep up with the demand. These scaling operations can be performed automatically according to the user's predefined conditions (auto scaling).

• Network access: Services are available over the network and accessed through standard mechanisms by heterogeneous thin or thick client platforms such as mobile phones and laptops.

• Resource pooling: The Cloud provider offers a pool of computing resources to serve multiple consumers using a multi-tenant model, with different physical and virtual resources. Location transparency in the Cloud, which hides a resource's physical location from the customer (it may be exposed at a higher level of abstraction, such as the country), gives Cloud providers more flexibility in managing their own resource pools.

• Measured service: Cloud systems can transparently monitor, control, and measure service usage for both the provider and the consumer by leveraging a metering capability at some level of abstraction appropriate to the type of service, similar to what is done for utilities such as electricity, gas, water, and telecommunications.

In addition to these characteristics, Cloud computing brought cost savings to consumers by removing capital expenses (CapEx) as well as reducing operating expenses (OpEx). In CapEx, this saving is achieved by eliminating the total cost of the entire infrastructure.
In OpEx, saving is achieved by sharing the cost of electricity, system administrators, hardware engineers, network engineers, facilities management, fire protection, and insurance, as well as local and state taxes on facilities. There are other hidden OpEx costs that a Cloud instance can eliminate, such as purchasing and acquisition overhead, asset insurance, business interruption planning, and software. At least three well-established delivery models exist in the Cloud: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), deployed as public,

private, community, and hybrid Clouds. These service types and boundaries are elaborated in the following sections.

2.3.1. Service Models

Consumers purchase Cloud services in the form of infrastructure, platform, or software. Infrastructure services are considered to be the bottom layer of Cloud computing systems. Infrastructure as a Service offers virtualized resources (such as computation, storage, and communication) on demand to infrastructure specialists (IaaS consumers), who are able to deploy and run arbitrary operating systems and customized applications. IaaS Cloud providers often provide virtual machines with a software stack that can be customized, similar to physical servers, and grant users privileges to perform operations on their virtual servers (such as starting and stopping them). Therefore, an infrastructure specialist does not manage or control the underlying Cloud infrastructure, while having control over operating systems, storage, deployed applications, and possibly limited control of some networking components, e.g. host firewalls. This type of service is particularly useful for start-ups and for small and medium businesses undergoing rapid expansion or dynamic changes that do not want to invest in infrastructure [40].

Platform as a Service is another Cloud model that offers a higher level of abstraction to make the Cloud easily programmable. PaaS Cloud providers offer a scalable platform with a software stack containing all the tools and programming languages supported by the provider. They allow developers (PaaS consumers) to create and deploy applications without the hassle of managing infrastructure, and regardless of concerns about processor and memory capacity.
Therefore, the developer does not manage or control the underlying Cloud infrastructure, including network, servers, operating systems, or storage, while having control over the deployed applications and possibly application-hosting environment configurations [7].

Delivering applications supplied by service providers at the highest level of abstraction in the Cloud to end-users (SaaS consumers) through a thin client interface, such as a web portal, is known as Software as a Service. SaaS Cloud providers supply a software stack containing an operating system, middleware such as database or web servers, and an instance of the Cloud application, all in a virtual machine. Therefore, the end-user does not manage or control the underlying Cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. SaaS alleviates the burden of software maintenance for customers and simplifies development and testing for providers [42]. Figure 1 (adapted from [43]) summarizes the service models and their responsibilities.
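The division of management responsibilities discussed above can be sketched as a small lookup table in Python. The layer names and the exact boundary used here follow the common IaaS/PaaS/SaaS simplification and are not an exact reproduction of Figure 1:

```python
# Illustrative split of who manages each layer under the three service
# models; a common simplification, not an exact copy of Figure 1.
RESPONSIBILITIES = {
    #  layer:             (IaaS,       PaaS,       SaaS)
    "application":        ("consumer", "consumer", "provider"),
    "runtime/middleware": ("consumer", "provider", "provider"),
    "operating system":   ("consumer", "provider", "provider"),
    "virtualization":     ("provider", "provider", "provider"),
    "servers/storage":    ("provider", "provider", "provider"),
    "networking":         ("provider", "provider", "provider"),
}

MODELS = ("IaaS", "PaaS", "SaaS")

def managed_by(model, layer):
    """Return who manages a given layer under a given service model."""
    return RESPONSIBILITIES[layer][MODELS.index(model)]

print(managed_by("IaaS", "operating system"))  # consumer
print(managed_by("SaaS", "application"))       # provider
```

Reading down a column shows the abstraction rising: moving from IaaS to SaaS shifts each successive layer from the consumer's responsibility to the provider's.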

Figure 1: Cloud-computing service models and responsibilities.

2.3.2. Deployment Models

There are four general Cloud deployment models, known as private, community, public, and hybrid Cloud.

In a Private Cloud, the infrastructure is owned and exclusively used by a single organization, is managed by the organization or a third party, and may exist on or off the premises of the organization. Many organizations, particularly governmental or very large organizations, have embraced this model to exploit Cloud benefits like flexibility, reliability, cost reduction, sustainability, elasticity, and so on.

In a Community Cloud, the infrastructure is shared by several organizations and supports a specific community with shared concerns such as mission, security requirements, policy, and compliance considerations. It may be owned and managed by the organizations or by a third party and may exist on or off premises.

In a Public Cloud, the infrastructure exists on the premises of the Cloud provider, is available to the general public or a large industry group, and is owned by an organization selling Cloud services. There are many Public Cloud computing styles based on the underlying resource abstraction technologies, for example Amazon Web Services [44], Google Cloud Platform [45], and Rackspace Public Cloud [46]. A public Cloud can use the same (large-scale) hardware infrastructure as a private one. In contrast with the Private model, a Public Cloud lacks fine-grained control over data, network, and security settings, which hampers its effectiveness in many business scenarios [47]. This model is suitable for small and medium businesses that want to support their growing business without huge investments in infrastructure.

Sometimes the infrastructure that best fits an organization's specific needs requires both Cloud and on-premises environments. In a Hybrid Cloud, the services within the same organization are a composition of two or more Clouds (Private, Community, or Public) that addresses the limitations of each model with more scalability and flexibility, while both saving money and providing additional security. On the downside, Hybrid Cloud environments involve complex management and governance challenges.

Some other deployment models, such as Virtual Private Cloud and Managed Cloud, are well known but not widely used. A Virtual Private Cloud is a Private Cloud that leverages a Public Cloud infrastructure using advanced network capabilities (such as VPNs) in an isolated and secure manner. A Managed Cloud is a type of Private Cloud that is managed by a team of experts in a third-party company. Managed Cloud includes access to a dedicated, 24x7x365 support team via phone, chat, online support, and so on, supporting Cloud servers from the OS up through the application stack. Amazon VPC [48] and Rackspace Managed Cloud [49] are examples of Virtual Private Cloud and Managed Cloud, respectively. There are also Managed Virtual Private Clouds, such as HP Helion Managed Virtual Private Cloud [50]. Figure 2 (adapted from [51]) depicts the Cloud computing deployment models.

2.3.3. Data Center Bottleneck

Over the last few years, several studies [52] [53] [54] on cloud computing have illustrated in detail the underlying process behind the provisioning of cloud services.
Figure 2: Cloud-computing deployment models representation.

Generally, the complete process of provisioning virtualized cloud services involves several cloud DCs, dispersed across multiple geographical locations. Cloud systems are, therefore, DCN-centric, and for every user request, service provisioning involves one or more DCNs. In [55], Xiao et al. addressed the problem of designing and optimally positioning DCs to improve QoS in terms of service latency and cost efficiency. However, that work is strongly affected by the efficiency of the DCNs. In another work, Chen et al. [56] focused on the problem of latency for video streaming services. The work suggests the usage of a single DC under a single Cloud Service Provider (CSP). However, this situation might be hypothetical, as in real-life IoT scenarios a single DC under a single global CSP may hinder the overall service efficiency due to the lack of proper management and a shortage of cloud storage. Tziritas et al. [57] addressed process migration to improve the performance of cloud systems and demonstrated experimental results with 1,000 processes. However, IoT concerns billions of processes, and in such a scenario, process migration within DCs might introduce overhead that degrades performance. Other scheduling techniques focusing on real-time workload scheduling [58] or energy-efficient scheduling [59] have also only been evaluated in low-scale scenarios.

In each of the above works, the DCs form the computing resource hub, and the DCNs are invoked every time an application makes a service request. Therefore, with the increase in the number of IoT consumers, and with every request required to be processed within the DCs, it is likely that the cloud DCNs will encounter serious difficulty in serving IoT applications in real-time. Additionally, with the increase in the number of latency-sensitive applications, the efficiency of service provisioning will also be reduced to a significant extent.

2.4. Fog Computing

The contemporary trends in data volume, velocity, and variety, together with the limitations of cloud computing, point to the need for new techniques of data management

and administration. In this context, Cisco proposed the revolutionary concept of fog computing [12] [60]. Fog computing is defined as a distributed computing infrastructure that is able to handle billions of Internet-connected devices. It is a model in which data, processing, and applications are concentrated in devices at the network edge, rather than existing almost entirely in the Cloud; the aim of fog computing is to offload them from the Cloud systems and place them closer to the end-user. The Fog is organizationally located below the Cloud and serves as an optimized transfer medium for services and data within the Cloud. Fog computing happens outside the Cloud and ensures that Cloud services, such as compute, storage, workloads, applications, and big data, can be provided at any edge of the network (Internet) in a truly distributed way. By controlling data at various edge points, Fog computing integrates with core Cloud services, turning the data center into a distributed Cloud platform for users. In other words, Fog brings computing from the core to the edge of the network. In this context, it may be seen as just another name for Edge computing [61]. Edge computing pushes the frontier of computing applications, data, and services away from centralized nodes to the logical extremes of a network. It enables analytics and knowledge generation to run at the source of the data. This approach requires leveraging resources that may not be continuously connected to a network, such as laptops, smartphones, tablets, and sensors [62]. Two systems that can provide resources for computing near the edge of the network are MediaBroker [63], for live sensor stream analysis, and Cloudlets [64], for interactive applications. However, neither currently supports widely distributed geospatial applications [65].
The idea of Fog computing has emerged to distribute data and place it closer to the end-user, reduce service latency, improve QoS, and remove other possible obstacles connected with data transfer. Because of its wide geographical distribution, the Fog paradigm is well positioned for big data and real-time analytics, and it supports mobile computing and data streaming. Fog computing is not a replacement for Cloud computing; it is an addition that extends the concept of Cloud services. Services are hosted at the network edge or even on end devices such as set-top-boxes or access points. Conceptually, Fog computing builds upon existing and common technologies like Content Delivery Networks (CDN) [66], but, being based on Cloud technologies, it should ensure the delivery of more complex services. However, developing applications using fog computing resources is more complex [65]. Everything-as-a-service models have already been adopted by industry. This means

that future computing paradigms must support the idea of the Internet of Things in order to successfully emerge, wherein sensors and actuators blend seamlessly with the environment around us and information is shared across platforms to build a common operating picture. Fog computing supports emerging Internet of Things applications that demand real-time or predictable latency, such as industrial automation, transportation, and sensor and actuator networks. The concept of Fog computing is not something to be developed in the future; it is already here, and a number of distributed computing and storage start-ups are adopting the term [67]. Many companies have already introduced it, while others are ready for it [68]. In fact, any company that delivers content can start using Fog computing. A good example is Netflix, a provider of media content that must reach its numerous globally distributed customers. With data managed in only one or two central data centers, the delivery of a video-on-demand service would not be efficient enough. Fog computing thus makes it possible to provide very large amounts of streamed data by delivering the data directly into the vicinity of the customer.

2.4.1. Architecture

The Fog Computing architecture is highly based on a virtualized platform that provides compute, storage, and networking services between end devices and traditional Cloud Computing data centers, typically, but not exclusively, located at the edge of the network [69]. Figure 3 presents the architecture and illustrates an implementation of Fog Computing. At the bottom layer, millions of connected devices (smart things, vehicles, wireless or wired machines) can take advantage of the network edge for processing and storage; this layer also enables M2M communication between these devices in real-time.
The next layer represents the network edge, where tens of thousands of devices will compose the infrastructure for deploying the Fog nodes; these devices provide connectivity to the bottom layer through many technologies, such as 3G, 4G, LTE, and WiFi. This layer also provides the connectivity between the Cloud and the Fog nodes: thousands of devices using different technologies (IP/MPLS, multicast, QoS) improve the communication between the Cloud and the network edge while, at the same time, providing security between the two entities. The top layer represents the Cloud, hosted at dedicated data centers; it manages the Fog nodes, hosts the applications, and stores and processes data from sources that do not require real-time information.
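The layered split above implies a simple placement policy: latency-critical requests are served by a nearby fog node, while heavyweight or delay-tolerant work is forwarded to the cloud. The following Python sketch illustrates one such policy; the 20 ms threshold and the node names are hypothetical assumptions, not part of any standard fog architecture:

```python
# Hypothetical request dispatcher for a two-tier fog/cloud deployment.
# The 20 ms threshold and the node names are illustrative assumptions.

LATENCY_BUDGET_MS = 20  # requests stricter than this stay at the edge

def place_request(latency_budget_ms, needs_global_data=False):
    """Choose where a request should be served.

    Latency-sensitive work goes to the fog node unless it needs data
    that only the central cloud holds (e.g., historical analytics).
    """
    if needs_global_data:
        return "cloud-dc"
    if latency_budget_ms <= LATENCY_BUDGET_MS:
        return "fog-node"
    return "cloud-dc"

print(place_request(10))                          # fog-node
print(place_request(500))                         # cloud-dc
print(place_request(10, needs_global_data=True))  # cloud-dc
```

The last call shows the trade-off the architecture describes: even a latency-sensitive request must travel to the cloud when it depends on data that only the central data centers store.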

Figure 3: Fog-computing Architecture.

2.4.2. Applicability

Recent research has revealed some of the important aspects of fog computing. In [70], the authors considered various computing paradigms, including cloud computing, and investigated the feasibility of building a reliable and fault-tolerant fog computing platform. Do et al. [71] and Aazam and Huh [72] have inspected the different intricacies of resource allocation in a fog computing framework. Research into security for this paradigm has explored various theoretical vulnerabilities [73] [74]. However, most of the works on fog computing have primarily focused on its principles, basic notions, and doctrines. Few works have contributed to the technical aspects of the paradigm from an implementation perspective. Cirani et al. [75] explored one such implementation of Fog computing, creating a Fog node named "IoT Hub".

2.5. Virtualization at the Network Edge

Internet of Things applications may require the deployment of gateways at the network edge to enable interaction with physical sensors, pre-processing of data from these sensors, and synchronization with the cloud. The orchestration, deployment, and maintenance of the software running on the gateways in large-scale deployments is known to be challenging. Due to the limited resources available at the network edge, the evaluation of virtualization techniques is fundamental for making better use of these resources.

Recent advances in the virtualization field have improved the existing technology (hypervisor-based) and also created a new virtualization class, container-based virtualization, also referred to as lightweight virtualization. Contrary to VMs (created by hypervisor-based virtualization), containers can be seen as more flexible tools for packaging, delivering, and orchestrating both software infrastructure services and applications, i.e., tasks that are typically a PaaS focus. Containers offer a more portable approach aimed at greater interoperability [76] while still building on operating system (OS) virtualization principles. VMs, on the other hand, are about hardware allocation and management (machines turned on/off and provisioned); with them, there is an IaaS (Infrastructure-as-a-Service) focus on hardware virtualization.

2.5.1. Hypervisor-Based Virtualization

The core of hypervisor-based virtualization is a software component called the hypervisor, which allows several operating systems to run side-by-side on a given piece of hardware. Unlike conventional virtual-computing programs, a hypervisor runs directly on the target hardware's "bare metal", instead of as a program inside another operating system. This allows both the guest OS and the hypervisor to perform more efficiently. The current crop of hypervisors runs on commodity hardware — the x86/x64 processor family, as opposed to specialized server hardware.
Some of this is due to the fact that the processor architecture makes virtualization easier, but some of it is due to the way various hypervisor technologies were developed and improved in the open-source domain, making it much simpler for these technologies to be adopted broadly. During the last decade, hypervisor-based virtualization has been widely used for implementing virtualization and isolation. As hypervisors operate at the hardware level, they support standalone virtual machines that are independent and isolated from the host system, making it possible to virtualize different guest OSes, e.g., Windows-based VMs on top of Linux. However, the trade-off here is that a full operating system is installed into each virtual machine,

which means that the image will be substantially larger. In addition, the emulation of virtual hardware devices incurs more overhead. According to [77], hypervisors are classified into two different types:

• Type-1: a hypervisor that runs directly on the hardware and hosts guest operating systems (it operates on top of the host's hardware).

• Type-2: a hypervisor that runs within a host OS and hosts guest OSes inside of it (it operates on top of the host's operating system).

However, this categorization is steadily being eroded by advances in hardware and operating-system technology. For example, the Linux Kernel-based Virtual Machine (KVM) has characteristics of both types [78].

2.5.2. Container-Based Virtualization

Containers are a lightweight approach to virtualization that can be used to rapidly develop, test, deploy, and update IoT applications at scale. Many web and mobile app developers make use of hypervisors, such as VirtualBox [79], to run virtual machines that virtualize physical hardware as part of a cross-platform development, testing, and deployment workflow. Such a workflow can be greatly improved by the use of containers. Container-based virtualization (sometimes called lightweight virtualization) is much more lightweight: each container runs as an isolated user-space instance on top of a shared host operating system kernel. Although the operating system is shared, individual containers have independent virtual network interfaces, independent process spaces, and separate file systems. Containers can be allocated system resources like RAM, CPU, and network bandwidth by using control groups (cgroups), which implement resource isolation. Compared to hypervisor-based virtualization, where each VM runs its own operating system, which only increases its use of system resources, containers use far fewer disk and memory resources.
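As a back-of-the-envelope illustration of this difference, the sketch below compares the disk footprint of hosting N isolated applications as full VMs versus as containers on one host. All sizes are round numbers assumed for the example, not measurements from this work:

```python
# Back-of-the-envelope disk footprint: N isolated apps as VMs vs containers.
# All sizes are illustrative assumptions, not measurements from this work.

GUEST_OS_GB = 10.0   # full guest OS image carried inside every VM
BASE_IMAGE_GB = 0.2  # shared container base layer, stored once per host
APP_GB = 0.5         # the application and its dependencies

def vm_footprint_gb(n_apps):
    # Every VM carries its own guest OS plus the application.
    return n_apps * (GUEST_OS_GB + APP_GB)

def container_footprint_gb(n_apps):
    # Containers share the host kernel and a common base image layer.
    return BASE_IMAGE_GB + n_apps * APP_GB

for n in (1, 10, 100):
    print(n, vm_footprint_gb(n), container_footprint_gb(n))
```

Under these assumptions, ten applications cost 105 GB as VMs but about 5 GB as containers, and the gap widens linearly with the number of applications, since the guest OS term is paid once per VM but never per container.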
Container technology is not new: it has been built into Linux in the form of Linux Containers (LXC) [80] for almost ten years, and similar operating-system-level virtualization has also been offered by FreeBSD jails, AIX Workload Partitions, and Solaris Containers. However, Docker [81] exploded onto the scene a couple of years ago and has been causing excitement in IT circles ever since. The application container technology provided by Docker promises to change the way IT operations are carried out, just as virtualization technology did a few years previously.

2.5.3 Virtual Machines vs. Containers

Containers and virtual machines both allow multiple applications to run on the same physical system; they differ in the degree to which they meet different kinds of business and IT requirements. The main objective of virtual machines is to abstract the underlying hardware, and they do it very well. This enables cost reduction and automation when provisioning a complete software stack, including the operating system, the application, and its dependencies. By automating Infrastructure-as-a-Service and Platform-as-a-Service solutions, it is possible to reduce the overall data center total cost of ownership (TCO). These savings come from server consolidation and simplified system administration, as different operating systems can run on the same hardware.

However, there are cases where virtual machines do not fit very well. Virtual machines need minutes to boot up, which gives attackers time to exploit known vulnerabilities during boot and can degrade the user experience. Moreover, since every virtualized application involves at least two operating systems for operators to manage and secure (the hypervisor and the guest OS), patching and life-cycle management for virtual machines require significant effort. Even the simplest OS process needs its own virtual machine. This requirement increases flexibility, but it also makes virtual machines inefficient for micro-service architectures with hundreds or thousands of processes. When each physical server is replaced by one virtual machine, physical resource utilization tends to remain low.

Businesses, academia, and government can all benefit from a more efficient way to build, ship, deploy, and run applications, and that is where Linux containers come in. In this context, a Linux container is basically a set of processes, together with its dependencies, isolated from the rest of the machine.
For example, if a web server relies on a particular Python library, the container can encapsulate that library. In that way, containers make it possible to have multiple versions of the same library, or of any other dependency, co-existing in the same environment, without the administrative overhead of a complete software stack including the OS kernel [77]. As can be seen in Fig. 4(a), virtual machines include the application, the necessary binaries and libraries, and an entire guest operating system, all of which can amount to tens of gigabytes. Containers (Fig. 4(b)), in contrast, include the application and all of its dependencies, but share the kernel with other containers, running as isolated processes in user space on the host operating system.
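The library example above can be sketched with two containers on the same host; this is a sketch only, assuming a Docker daemon is available (the image tags and library versions are illustrative):

```shell
# Each container carries its own Python runtime and its own copy of the
# same library, co-existing on one host without conflicting:
docker run --rm python:2.7 pip install 'requests==2.9.1'
docker run --rm python:3.5 pip install 'requests==2.12.1'
```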

[Figure 4: (a) Virtual Machine and (b) Container isolation layers.]

Regarding performance, containerized applications perform about as well as applications deployed on bare metal. Containers run in isolation while sharing an operating-system instance, unlike hypervisors, which provide a logical abstraction at the hardware level.

The container approach can also improve application deployment in IoT. For example, it can lower costs, speed up application development, simplify security, and offer an easier path to new IT models such as hybrid clouds and micro-service architectures. In a containerized environment there are fewer operating systems to manage, as each virtual machine can be carved into multiple containers, all sharing the same operating-system kernel. Regarding mobility, it is easier to move workloads between private and public clouds, since with containers there is much less data to move. A virtualized application running on ten virtual machines involves eleven operating systems (the hypervisor and each guest operating system), each of which needs patching. In contrast, a containerized server running ten different applications has only one operating system to maintain. For application patching, Docker container images are composed of layers, so an image can easily be patched by adding a new layer; a new layer does not affect the others. For example, the image of a web application might consist of three layers: an Apache web server [82], a Python runtime system [83], and a MariaDB database [84].

2.5.4 Resource Allocation in Virtual Machines and Containers

A key benefit of virtualization is the ability to consolidate multiple workloads onto a single computer system.
This consolidation yields savings in power consumption, capital expense, and administration costs. The degree of savings depends on the ability to allocate hardware resources such as memory, CPU cycles, I/O, and network bandwidth [85]. There are many technologies for resource allocation, depending on the virtualization technology used. For example, KVM (virtual machines) [86] and Docker (containers) [81] use the Linux control groups (cgroups) [87] facility to apply resource management to their virtual machines and containers. Cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents. However, there are differences between the two models.

All processes on a Linux system are child processes of a common parent: the init process, which is executed by the kernel at boot time and starts other processes (which may in turn start child processes of their own). Because all processes descend from a single parent, the Linux process model is a single hierarchy, or tree. Additionally, every Linux process except init inherits the environment and certain other attributes of its parent process.

Cgroups are similar to processes in that (1) they are hierarchical and (2) child cgroups inherit certain attributes from their parent cgroup. The fundamental difference is that many different hierarchies of cgroups can exist simultaneously on a system. If the Linux process model is a single tree of processes, then the cgroup model is one or more separate, unconnected trees of tasks (i.e., processes). Multiple separate cgroup hierarchies are necessary because each hierarchy is attached to one or more controllers. A controller represents a single resource, such as CPU time or memory.

For KVM and Docker, resource allocation is applied when the virtual machine or container is started, by specifying the number of CPUs, the amount of memory, and so on.
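CPU settings of this kind are relative weights rather than hard caps: under contention, each group receives a share of CPU time proportional to its weight. A small sketch with illustrative weights (Docker's default weight is 1024):

```shell
# Under CPU contention, cgroup-style CPU weights divide CPU time
# proportionally: each group gets weight / sum_of_weights.
# The weights below are illustrative.
w1=512
w2=1024
total=$((w1 + w2))
share1=$(awk -v w="$w1" -v t="$total" 'BEGIN { printf "%.1f", 100 * w / t }')
echo "a container with weight ${w1} gets ${share1}% of the CPU under contention"
```

When the CPU is idle, a low-weight group may still use more than its proportional share; the weight only matters when groups compete.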
For example, when starting a Docker container, the "-c" parameter can be used to specify a relative weight, which is a numeric value used to allocate a relative share of CPU time. Resources are allocated dynamically, meaning that they are only consumed when needed. It is also possible to resize (add or remove resources from) an already running virtual machine or container.

2.5.5 Docker

Containerization is the process of distributing and deploying applications in a portable and predictable way. It accomplishes this by packaging components and their dependencies into standardized, isolated, lightweight process environments called containers. Many organizations are now interested in designing applications and services that can be easily deployed to distributed systems, as this allows a system to scale easily and survive machine and application failures. Docker, a containerization platform developed to simplify and standardize deployment in various environments, was largely instrumental in spurring the adoption of this style of service design and management, and a large amount of software has been created to build on this ecosystem of distributed container management.

Docker came along in March 2013, when the code, invented by Solomon Hykes, was released as open source. It is also the name of the company, founded by Hykes, that supports and develops the Docker code. Both the Docker open-source container and the company's approach have a lot of appeal, especially for cloud applications and agile development. Because many different Docker applications can run on top of a single OS instance, this can be a more efficient way to run applications. The approach also speeds up application development and testing, because software developers do not have to worry about shipping special versions of the code for different operating systems. Because of the lightweight nature of its containers, the approach can also improve the portability of applications: Docker and containers are an efficient and fast way to move pieces of software around in the cloud.

Nowadays, Docker is the most common containerization software in use. While other container systems exist, Docker makes container creation and management simple and integrates with many open-source projects. Docker's main advantages are:

• Lightweight resource utilization: instead of virtualizing an entire operating system, containers isolate at the process level and use the host's kernel.

• Portability: all of the dependencies for a containerized application are bundled inside the container, allowing it to run on any Docker host.
• Predictability: the host does not care about what is running inside the container, and the container does not care about which host it is running on; the interfaces are standardized and the interactions are predictable.

2.5.5.1 Clustering

Docker Swarm is native clustering for Docker. It turns a pool of Docker hosts into a single, virtual Docker host. Because Docker Swarm serves the standard Docker API, any tool that already communicates with a Docker daemon can use Swarm to transparently scale to multiple hosts.
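One way to form such a cluster is with the swarm mode built into recent Docker releases; a minimal sketch, assuming Docker is installed on each host (the address is hypothetical, and the join token is printed by the manager when the swarm is initialized):

```shell
# On the manager node:
docker swarm init --advertise-addr 192.168.99.100

# On each worker node, using the token issued by the manager:
docker swarm join --token <worker-join-token> 192.168.99.100:2377
```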
