TOWARDS IMPROVEMENTS IN RESOURCE MANAGEMENT FOR
CONTENT DELIVERY NETWORKS
Ph.D. Thesis
Federal University of Pernambuco [email protected] www.cin.ufpe.br/~posgraduacao
RECIFE 2016
A Ph.D. Thesis presented to the Center for Informatics of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.
Advisor: Djamel Fawzi Hadj Sadok
Cataloging at source
Librarian Monick Raquel Silvestre da S. Portes, CRB4-1217
R696t Rodrigues, Moisés Bezerra Estrela
Towards improvements in resource management for content delivery networks / Moisés Bezerra Estrela Rodrigues. – 2016.
116 leaves: ill., fig., tab.
Advisor: Djamel Fawzi Hadj Sadok.
Thesis (Doctorate) – Universidade Federal de Pernambuco. CIn, Computer Science, Recife, 2016.
Includes references and appendices.
1. Computer networks. 2. Distributed systems. 3. Cellular telephony. 4. Video - distribution. I. Sadok, Djamel Fawzi Hadj (advisor). II. Title.
004.6 CDD (23. ed.) UFPE- MEI 2016-092
Towards Improvements in Resource Management for Content Delivery Networks
Doctoral thesis presented to the Graduate Program in Computer Science of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Doctor in Computer Science
Approved on: 03/03/2016.
———————————————————————– Advisor: Prof. Djamel Fawzi Hadj Sadok
EXAMINATION COMMITTEE
———————————————————————– Prof. Eduardo James Pereira Souto
Instituto de Computação/UFAM
———————————————————————– Prof. Arthur de Castro Callado
Departamento de Ciência da Computação/UFC
———————————————————————– Dr. Artur Ziviani
Laboratório Nacional de Computação Científica/LNCC
———————————————————————– Prof. Paulo Romero Martins Maciel
Centro de Informática(CIn)/UFPE
———————————————————————– Prof. Ricardo Massa Ferreira Lima
brothers Matheus and Naum, my sister Raquel and my beloved wife, Romanan.
Acknowledgements
First and foremost, thanks to my best friend, my wife, and my love, Romanan, for her patience and support during this work.
I would like to thank Professor Djamel and Professor Judith for all the work, ideas, and support in supervising this thesis. Special thanks also to them for believing in me and allowing me to work with them in the GPRT (Networks and Telecommunications Research Group). Thanks also to André, Glauco, Ernani, Marcos, Daniel, and Demis for all the suggestions and talks that cleared up so many doubts. Special thanks to Patricia and Wesley: your comments and recommendations were essential in the process of finalizing this work. Many thanks to all the other people from GPRT, who also offered support for this work. Thanks to the support staff, Rodrigo, Andreas, Bruno, Manu, Ana, and Roxana: without your help and hard work, this thesis surely would not have been possible.
I am grateful to György Dán for the time he spent advising my research and for receiving me as a visiting researcher at the Lab of Communication Networks (LCN). Thanks also to my LCN colleagues, especially, Valentino, Sladjana, and Emil, for all the support and fantastic foosball matches.
Thanks to all my great friends from the VaiDiBolo group: 20 years of friendship and counting. To my friends Thiago Aguiar, Rafael and Gabriel Malta, Felipe, Juju, and Pedro, for all the gigs and beach volleyball matches. Also, to Thiago Araujo for all the discussions about nothing and everything.
Finally, I would like to thank my parents Nonato and Nevinha, my brothers Naum and Matheus, my sister Raquel, my grandparents João, Antonieta (in memoriam), Bezerra (in memoriam) and Francisquinha (in memoriam), my uncles, my aunts and my cousins for all their faith in me and support throughout my life. They made me who I am, and I will always be grateful for that.
let the roots reach far and wide, and let it grow tall let the rings remain intact on the inside and though the autumn brings a fall of leaves let it grow tall —PROTEST THE HERO
Resumo
Over the last decade, the World Wide Web evolved from a means of connecting a small group of nodes into the medium through which people obtain knowledge, social interaction, and entertainment. Moreover, our homes and workstations are no longer our only points of access to the network. According to Cisco, global network traffic in 2018 will be three times higher than it was in 2013. Real-time entertainment has been and will remain an important part of this growth. However, the network was not designed to handle this demand, so new technologies are needed to overcome these challenges.
Content Delivery Networks (CDN) have proven to be a good alternative for overcoming these challenges. Their basic concept is to distribute replica servers geographically, thereby keeping content close to users. Following their popularity, a growing number of CDNs, most of them local, began to be deployed. In addition, cloud computing emerged, making software and hardware resources accessible through well-defined interfaces. Cloud services, such as distributed Infrastructure as a Service (IaaS), make it possible to deploy complex CDNs. Although CDNs are the best content delivery technology in terms of scalability, some scenarios still challenge them, such as flash crowd events. Therefore, we need to study content delivery strategies that keep up efficiently with the constantly growing demand for content, while also taking advantage of new possibilities such as the growth of localized CDNs and the popularization of cloud computing.
Examining the issues raised, this thesis presents strategies towards improving Content Delivery Networks (CDN). We do so by proposing and evaluating algorithms, models, and a prototype demonstrating possible uses of such technologies to improve CDN resource management. We present P2PCDNSim, a CDN simulator designed to assist researchers in planning and evaluating new strategies. In addition, we propose a new dynamic replica placement strategy, based on the count of data flows passing through the nodes, that maintains a similar Quality of Experience (QoE) while decreasing traffic between Autonomous Systems (AS). Furthermore, we propose a solution based on Software Defined Networks (SDN) that increases the flexibility of replica server placement within the mobile backhaul. Our experimental results show that the delay introduced by our module is less than 5 ms for 99% of the transmitted packets, a minimal delay in current Long-Term Evolution (LTE) networks.
Keywords: Content Delivery Networks. Software Defined Networks. Cloud Computing. P2P. Discrete Event Simulation. GPRS Tunnelling Protocol
Abstract
Over the last decades, the World Wide Web went from a way to connect a handful of nodes to the means by which people cooperate in search of knowledge, social interaction, and entertainment. Furthermore, our homes and workstations are no longer the only places where we are connected: the mobile broadband market is changing the way we interact with the web. According to Cisco, global network traffic will be three times higher in 2018 than it was in 2013. Real-time entertainment has been and will remain an important part of this growth. However, the Internet was not designed to handle such demand, and new technologies are therefore needed to overcome these challenges.
Content Delivery Networks (CDN) have proven to be an alternative for overcoming these challenges. The basic concept is to distribute replica servers geographically, keeping content close to end users. Following the popularity of CDNs, an increasing number of them, most extremely localized, began to be deployed. Furthermore, Cloud Computing emerged, making software and hardware accessible as resources through well-defined interfaces. Using Cloud services, such as distributed Infrastructure as a Service (IaaS), one can deploy complex CDNs. Although CDNs are the best technology for scaling content distribution, they may perform poorly in some scenarios, such as flash crowd events. Therefore, we need to study content delivery techniques that efficiently keep pace with the ever-increasing demand for content while taking advantage of new possibilities, such as the growing number of smaller localized CDNs and Cloud Computing.
Examining these issues, this work presents strategies towards improvements in Content Delivery Networks (CDN). We do so by proposing and evaluating algorithms, models, and a prototype that demonstrate possible uses of such new technologies to improve CDN resource management. We present P2PCDNSim, a comprehensive CDN simulator designed to assist researchers in planning and evaluating new strategies. Furthermore, we propose a new dynamic Replica Placement Algorithm (RPA), based on the count of data flows through network nodes, that maintains similar Quality of Experience (QoE) while decreasing cross traffic during flash crowd events. We also propose a solution based on Software Defined Networks (SDN) that improves replica placement flexibility in the mobile backhaul. Our experimental results show that the delay introduced by the developed module is less than 5 ms for 99% of the packets, which is negligible in today’s LTE networks, and that the slight negative impact on streaming rate selection is easily outweighed by the increased flexibility.
Keywords: Content Delivery Networks. Software Defined Networks. Cloud Computing.
List of Figures
1.1 Peak period aggregate traffic composition.
1.2 The growth of the mobile market.
2.1 500 nodes topology used in the evaluation of the FlowCount strategy.
2.2 Bandwidth timeline during the Youtube scenario simulation.
2.3 Cross traffic timeline during Youtube scenario simulation.
2.4 Startup delay box plot for YouTube scenario considering all clients.
2.5 Startup delay box plot for YouTube scenario excluding outliers.
2.6 Bandwidth timeline during the Youtube scenario simulation.
2.7 Cross traffic timeline during Youtube scenario simulation.
3.1 Basic model components.
3.2 Comparison of results obtained by our model with results from Molina, Palau, and Esteve (2004), considering the same scenario (Table 3.2).
3.3 Comparison of response time considering scenarios with and without multiple CDN collaboration.
3.4 Relation between kl, kf, which represent local and foreign CDN capacities respectively, and response time.
3.5 Relation between kf, mcp and response time.
3.6 Relation between kl, mcp and response time.
3.7 Comparison of response time considering two redirection strategies, namely Recursive and Interactive Request Routing.
3.8 Comparison of response time for cache hit values [0,1] considering Recursive Request Routing, Interactive Request Routing and no collaboration.
4.1 LTE Mobile backhaul architecture.
4.2 Software Defined Networks (SDN) control plane illustration.
4.3 Transparent caching for LTE networks architecture.
4.4 Experimental testbed topology.
4.5 Flow diagram illustrating TC’s operation to transparently redirect and splice content between DC and CSVR.
4.6 Box plot of bitrates for 15s length video segments.
4.7 CDF of bitrate for 1s-15s segments for GTP+TC scenario.
4.8 Segment bitrate selection frequency for three scenarios and n = 4, 32, 64, 128 simultaneous DASH clients. Bitrates in the legend in kbps.
and n = 4, 32, 64, 128 simultaneous DASH clients.
A.1 Illustration of the basic CDN concept, store content close to end users.
A.2 CDN components and how they interact based on an illustration found in (BUYYA; PATHAN; VAKALI, 2008).
A.3 Request rate variation over time during a flash crowd event.
A.4 Screen shot of P2PCDNSim’s GUI. This realistic globe shaped interface enables real-time on-the-fly metric monitoring.
A.5 The modularized architecture of the P2PCDNSim simulator.
A.6 Example of a possible hybrid CDN-P2P topology where a set of nodes compose an AS.
A.7 Example of a simple topology where each node is considered a single AS.
A.8 Screenshot from the P2PCDNSim simulation scenario wizard.
A.9 TCP congestion window behavior comparing ns-3 and P2PCDNSim.
A.10 Memory usage comparison between ns-3 and P2PCDNSim considering the three nodes scenario.
A.11 Memory usage comparison between ns-3 and P2PCDNSim considering the 20 nodes scenario.
C.1 Cache hit count collected from all replica servers during the simulation.
C.2 Cache miss count collected from all replica servers during the simulation.
C.3 Cross AS traffic rate, in bits/second, during the simulation.
C.4 Inner AS traffic rate, in bits/second, during the simulation.
C.5 Total Network traffic rate, in bits/second, during the simulation.
C.6 Startup Delay mean collected from all clients that requested content during the experiment.
C.7 Startup Delay through simulation time collected from all clients that requested
List of Tables
2.1 Description of the 24 hours scenario generated using ProwGen (BUSARI; WILLIAMSON, 2002)
2.2 95th percentile in MB/s for Youtube scenario considering Total Network Traffic and Cross AS Traffic metrics.
2.3 95th percentile for ProwGen scenario considering Total Network Traffic and Cross AS Traffic metrics.
3.1 Description of variables used in our proposed model.
3.2 Second scenario describing a small network with 8 client clusters and 4 Replica Servers (RSs).
3.3 Description of the scenario used to evaluate whether it is worth collaborating.
5.1 Scientific papers produced related to this Thesis.
5.2 Other publications
A.1 Results collected from both simulators for the three nodes topology.
A.2 Results collected from both simulators for the 20 nodes topology.
A.3 Scenario configuration for the comparison between P2PCDNSim and CDNSim.
A.4 Cache hit results collected for scenarios 1 and 2 using P2PCDNSim and CDNSim.
A.5 List of all contributions related to the P2PCDNSim simulator tool.
List of Acronyms
3GPP 3rd Generation Partnership Project
AS Autonomous System
AIS Accounting Internetworking System
ATM Asynchronous Transfer Mode
CAPEX Capital Expenditure
CDN Content Delivery Networks
CP Content Providers
CPM Constraint P-Median
CDNI Content Delivery Networks Interconnection
CSN Content Service Networks
CDI Content Distribution Internetworking
CSDN Content and Service Delivery Networks
CSD Content Server Directory
CSP Content Service Provider
CLD Content Location Directory
CCN Content Centric Networks
DNS Domain Name System
DIS Distribution Internetworking System
DASH Dynamic Adaptive Streaming over HTTP
DPI Deep Packet Inspection
EPS Evolved Packet System
GTP GPRS Tunnelling Protocol
GPRT Network and Telecommunications Research Group
ISP Internet Service Provider
IaaS Infrastructure as a Service
IETF Internet Engineering Task Force
IDNS Intelligent Domain Name Server
LTE Long-Term Evolution
LRU Least Recently Used
MME Mobility Management Entity
MPD Media Presentation Description
ns-3 Network Simulator 3
NAT Network Address Translation
NRS Name Routing System
P2P Peer to Peer
PGW Packet Data Network Gateway
PDH Plesiochronous Digital Hierarchy
OS Origin Server
OPEX Operating Expense
QoE Quality of Experience
RAN Radio Access Network
RPA Replica Placement Algorithm
RS Replica Servers
RR Request Redirector
RRIS Request-routing Internetworking System
RIEP Request-Routing Information Exchange Protocol
RTT Round Trip Time
SLA Service Level Agreements
SDN Software Defined Networks
SGW Serving Gateways
TCP Transmission Control Protocol
TEID Tunnel Endpoint Identifier
UGC User Generated Content
UE User Equipment
Contents
1 Introduction
1.1 Motivations
1.2 Objectives
1.3 Thesis Outline
2 FlowCount: Dynamic Replica Placement Algorithm
2.1 Introduction
2.2 RPA State of the Art
2.3 FlowCount Algorithm Description
2.3.1 FlowCount Strategy
2.3.2 Complexity Analysis
2.4 Experiments and Results
2.4.1 Metrics
2.4.2 Simulation Results
2.5 Conclusion
3 Multiple CDNs Collaboration
3.1 Introduction
3.2 State of the Art
3.2.1 CDN Collaboration
3.2.2 CDN Models
3.3 CDN Collaboration Model
3.3.1 Model Discussion
3.4 Methodology and Initial Experimental Results
3.5 Collaboration Overhead
3.5.1 Collaboration Overhead Results
3.6 Lessons Learned
3.7 Conclusion
4 Enabling Transparent Caching in LTE Mobile Backhaul
4.1 Introduction
4.2 Background
4.2.1 Mobile Backhaul Architecture
4.2.2 Stateful L4-L7 Processing in SDN
4.2.3 MPEG DASH Streaming
4.3.1 Design and Function Placement
4.3.2 Switch-based transparent caching for LTE
4.4 Prototyping and Experimental evaluation
4.4.1 Prototype Implementation
4.4.2 Experiment Methodology
4.4.3 Throughput Performance
4.4.4 DASH Streaming Performance
4.5 Related work
4.6 Conclusion
5 Conclusion
5.1 Contributions of this Thesis
5.2 Future Works
References
Appendix
A P2PCDNSim Simulation Tool
A.1 Content Delivery Networks
A.1.1 CDN Architecture
A.1.2 The Flash Crowd Challenge
A.2 CDN Simulation State of the Art
A.3 P2PCDNSim Simulation Tool
A.3.1 Architecture
A.3.2 Network Layer Comparison and CDN Layer Validation
A.4 Lessons Learned
A.5 Observational Study
A.6 Concluding Remarks
B P2PCDNSim I/O
1
Introduction
If you would cause your view . . . to be acknowledged by scientific men; you would do a great service to science. If you would even get them to say yes or no to your conclusions it would help to clear the future progress. I believe some hesitate because they do not like their thoughts disturbed.
—MICHAEL FARADAY
The World Wide Web started as a way to connect a handful of nodes, and it is now the means through which people cooperate in search of knowledge, social interaction, and entertainment. Not only is broadband access ever more present, but “Next-Generation Access”, meaning connections above 30 Mbps, has reached almost complete urban coverage in the US. According to the US Federal Communications Commission, more than 80% of households have high-speed broadband connections (CHIEF, 2015). Furthermore, our homes and workstations are no longer the only places where we are connected: the mobile broadband market is changing the way we interact with the World Wide Web. Shipments of mobile devices are increasing worldwide each year; according to the International Data Corporation (IDC), they are expected to grow at a compound annual growth rate (CAGR) of 12.7% from 2013 to 2018 (IDC, 2014). Mobile network traffic is therefore expected to follow suit. According to Cisco, the expected CAGR for the mobile market is 54% from 2014 to 2019 (CISCO, 2015). These numbers only support something that is clear in our everyday activities: we have never been as connected as we are today.
The presence of more devices leads to more people connected, which, in turn, increases network traffic. According to Cisco’s network forecast, global network traffic will continue to increase (FORECAST, 2014), resulting in total network traffic almost three times higher in 2018 than in 2013. Mobile data traffic is also increasing; according to the same source, it will grow at a CAGR of 57% from 2014 to 2019. One of the main contributors to this growth is multimedia streaming. According to the Global Internet Phenomena report (ULC, 2014), in North America real-time entertainment is responsible for 59.09% of aggregate traffic for fixed access and 36.07% for mobile access, as shown in Figure 1.1(a) and Figure 1.1(b), respectively. Both industry and academia agree that real-time entertainment has been and will remain an important subject, as confirmed by Kurt Michel, director of product marketing at Akamai: “live video streaming has become an increasingly important part of the web content universe, as a variety of businesses and organizations attempt to capture a ’share of eyeball’ and deliver richer, more HDTV-like experiences” (MICHEL, 2013). Several studies argue that live streaming is increasingly popular (ZHUANG; GUO, 2011; FORECAST, 2014) and that HD streaming is an established standard for all viewing experiences, with Super HD technology becoming the next big thing in video content delivery. However, the Internet was not designed to handle such demand, and new technologies are therefore being proposed to overcome these challenges. We need to study content delivery techniques to plan for and efficiently keep pace with the ever-increasing need for content.
Figure 1.1: Peak period aggregate traffic composition.
(a) Traffic composition for fixed access. (b) Traffic composition for mobile access.
Source: Sandvine’s global Internet phenomena report (ULC, 2014).
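As a quick sanity check on the growth figures quoted above, a compound annual growth rate r over n years corresponds to a total growth factor of (1 + r)^n. The short sketch below converts between the two; the function names are illustrative and do not come from the cited reports. Under this formula, a 12.7% CAGR over five years corresponds to roughly 1.8 times the initial volume, and traffic tripling over five years corresponds to a CAGR of roughly 24.6%.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplicative growth after `years` at a constant annual rate."""
    return (1.0 + cagr) ** years


def implied_cagr(factor: float, years: int) -> float:
    """Constant annual rate that yields `factor` total growth over `years`."""
    return factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Device shipments: 12.7% CAGR from 2013 to 2018 (five yearly steps).
    print(f"shipments grow by {growth_factor(0.127, 5):.2f}x")
    # Global traffic: about three times higher in 2018 than in 2013.
    print(f"implied traffic CAGR: {implied_cagr(3.0, 5):.1%}")
```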
1.1
Motivations
When examining content delivery scalability, the first technology that comes to mind is the Content Delivery Network (CDN). CDNs were responsible for around 37% of global traffic in 2013 according to Cisco’s forecast, and the same forecast expects them to carry as much as 55% of all Internet traffic in 2018. Considering only video traffic, the numbers are even higher: from 53% in 2013 to 67% in 2018. Every major content provider is expected either to use a commercial CDN to deliver content or to deploy its own. Netflix reportedly started deploying its own CDN, OpenConnect, to handle the impressive amount of video content it delivers. In North America alone, Netflix is responsible for more than a third (34.2%) of all network traffic (ULC, 2014). CDN-related numbers are always impressive, as the technology has become a critical part of the current Internet infrastructure; for instance, Akamai claims to handle between 15% and 30% of all Internet traffic (AKAMAI. FACTS & FIGURES., 2013).
Content Delivery Networks (CDN) have proven to be the alternative for overcoming the challenges imposed by increasing traffic demands and the best-effort nature of the network. The basic concept behind a CDN design is to keep content close to end users. This is done by strategically placing several servers near end users. Those servers are called Surrogates, Caches, or Replica Servers (RS). They interact with a special server, called the Origin Server, to obtain the most popular content according to users’ location. Finally, a coordinator completes the basic infrastructure of a CDN by ensuring that clients are redirected to the most suitable, frequently the closest, replica server. This entity is called the Request Redirector (RR).
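The interaction between these three components can be illustrated with a minimal sketch. It assumes, purely for illustration, that "most suitable" means the smallest Euclidean distance between hypothetical (x, y) coordinates and that a replica server caches everything it fetches; the class and attribute names are invented for this example and do not correspond to any particular CDN or to the simulator presented later.

```python
import math
from dataclasses import dataclass, field


@dataclass
class OriginServer:
    catalog: dict  # content id -> payload


@dataclass
class ReplicaServer:
    name: str
    position: tuple  # hypothetical (x, y) coordinates
    origin: OriginServer
    cache: dict = field(default_factory=dict)

    def serve(self, content_id):
        """Return (payload, hit); fetch from the origin on a cache miss."""
        hit = content_id in self.cache
        if not hit:
            self.cache[content_id] = self.origin.catalog[content_id]
        return self.cache[content_id], hit


class RequestRedirector:
    def __init__(self, replicas):
        self.replicas = list(replicas)

    def redirect(self, client_position):
        """Pick the replica server closest to the client (assumed policy)."""
        return min(self.replicas,
                   key=lambda rs: math.dist(rs.position, client_position))


if __name__ == "__main__":
    origin = OriginServer({"video-1": b"payload"})
    rr = RequestRedirector([
        ReplicaServer("rs-a", (0.0, 0.0), origin),
        ReplicaServer("rs-b", (10.0, 10.0), origin),
    ])
    replica = rr.redirect((1.0, 2.0))
    print(replica.name, replica.serve("video-1")[1], replica.serve("video-1")[1])
```

A real request redirector would typically weigh server load, network conditions, and content availability in addition to proximity; distance alone is used here only to keep the example short.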
Despite being based on a simple concept, bringing content closer to end users, CDNs are complex systems that require several decisions to enable content delivery, such as replica server management. The RS management problem includes RS placement and RS scaling. The main question in RS placement is where to place replicas. The fundamental principle is to place them as close to end users as possible. However, the number of replica servers is finite, so one must find a way to evaluate the best placement for a given scenario. Replica Placement Algorithms (RPA) are designed to evaluate scenarios and propose a placement for the available replicas. The RS scaling problem, on the other hand, concerns the need for, and the outcomes of, scaling the overall caching capacity of the system. Scaling the caching capacity could mean increasing the number of replica servers, increasing their capacity, or establishing a cooperation agreement between peering CDNs.
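To make the placement question concrete, the sketch below shows a generic greedy heuristic: given k available replicas, it repeatedly adds the candidate site that most reduces the total demand-weighted distance from clients to their nearest replica. This is an illustrative textbook-style heuristic under assumed inputs (a distance table and per-node demand), not one of the RPAs proposed or evaluated in this thesis, and all names are hypothetical.

```python
def total_cost(sites, demand, dist):
    """Demand-weighted distance from every client node to its nearest site."""
    return sum(d * min(dist[c][s] for s in sites) for c, d in demand.items())


def greedy_placement(candidates, demand, dist, k):
    """Choose up to k sites, adding the one that lowers total cost most."""
    chosen = []
    for _ in range(min(k, len(candidates))):
        remaining = [c for c in candidates if c not in chosen]
        best = min(remaining,
                   key=lambda c: total_cost(chosen + [c], demand, dist))
        chosen.append(best)
    return chosen


# Hypothetical line topology A - B - C - D with hop-count distances.
NODES = ["A", "B", "C", "D"]
DIST = {a: {b: abs(i - j) for j, b in enumerate(NODES)}
        for i, a in enumerate(NODES)}
DEMAND = {"A": 10, "B": 1, "C": 2, "D": 10}  # requests per node (assumed)

if __name__ == "__main__":
    placement = greedy_placement(NODES, DEMAND, DIST, k=2)
    print(placement, total_cost(placement, DEMAND, DIST))
    print(total_cost(["A", "D"], DEMAND, DIST))
```

In this toy scenario the greedy choice is not optimal: placing replicas at the two high-demand ends (A and D) yields a lower cost than the greedy result, which illustrates why placement strategies must be evaluated against the scenario at hand.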
CDNs are accepted as the de facto primary content delivery strategy. As a result of this popularity, an increasing number of CDNs are deployed for several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying in content, coverage, and capacity. Among them, very few aim to distribute content on a worldwide scale, such as Akamai and Limelight. Most have restricted coverage and are extremely localized, for instance, within an Internet Service Provider (ISP) (BERTRAND et al., 2012; FRANK et al., 2013; SHARMA; VENKATARAMANI; SITARAMAN, 2013). Scaling each CDN, in terms of coverage and capability, would be very expensive. A possible way to deal with this RS scaling problem is collaboration between restricted CDNs. In this way, a CDN’s coverage could expand temporarily to handle a set of requests greater than the local CDN capacity (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012; JESUS; AGUIAR, 2012). CDNs could also negotiate collaborations to fulfill service level agreements established between the CDN and content providers. However, effective collaboration raises several challenges (PATHAN et al., 2007a; NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012). Furthermore, given the diversity of existing CDNs, determining their essential features in order to decide which third-party CDN is most suitable to tackle a problem is a significant challenge.
There is, therefore, a need for a way to analyze collaboration scenarios among CDNs to quantify the possible gain obtained through collaboration, as well as to gather more information about the several variables involved in the process.
One important thing to note about CDNs is that this technology has been part of the Internet since at least the beginning of this century. The concept is not new, but new possibilities are available, driven by an ever-growing number of smaller localized CDNs along with new technologies such as Cloud Computing and Software Defined Networks (SDN).
Cloud Computing emerged as a buzzword at the end of the last decade and has been a trending topic ever since. The cloud can be seen as a conceptual layer on the Internet that makes all available software and hardware resources transparent and accessible through a well-defined interface. Notions like on-demand self-service, broad network access, resource pooling (DILLON; WU; CHANG, 2010), and other trademarks of Cloud Computing services were key to its current popularity. Also, many Cloud Storage services have emerged in recent years, providing data storage on several continents backed by rigorous Service Level Agreements (SLA) (CATHERINE; EDWIN, 2013). These resources are made available through provider-specific Web Service APIs. In this way, through Cloud services, one could deploy complex CDNs using distributed Infrastructure as a Service (IaaS) infrastructure or, alternatively, a simpler CDN using Cloud Storage services.
We believe that both technologies open new possibilities for CDNs, resulting in more flexible resource management. For instance, a basic step in deploying a CDN is to place replica servers using an RPA. Usually, replica servers are statically placed, meaning that they are not moved during the operation of the CDN. However, the flexibility provided by Cloud Computing services enables on-demand deployment and reallocation of replica servers to different geographic regions. Such flexibility makes room for new strategies, for instance, new RPAs, that take advantage of those new technologies.
As the number of mobile users grows, the role of mobile carriers in content delivery increases. Figure 1.2 illustrates the growth in mobile devices and mobile access to multimedia content. This growth indicates that consumers are expanding their number of preferred platforms, imposing on mobile carriers the challenge of coping with this growth while meeting consumers’ expectations.
Figure 1.2: The growth of the mobile market.
Source: Conviva’s viewer experience report from (CONVIVA, 2015).
The current mobile network architecture is an IP packet-switched network built around 3rd Generation Partnership Project (3GPP) specifications that define the Evolved Packet System (EPS) architecture, the foundation of fourth-generation mobile networks. In the EPS, traffic is encapsulated in GPRS Tunnelling Protocol (GTP) tunnels that transport packets from edge nodes (eNodeBs) to mobile network gateways, such as the Packet Data Network Gateway (PGW), where packets are forwarded towards the global Internet. Although this architecture has been able to drive the ongoing mobile revolution, it lacks the flexibility needed to dynamically control the constantly increasing amount of mobile traffic. Indeed, GTP tunnels force packets to traverse the whole mobile infrastructure and turn the mobile backhaul into a passive network segment in which traffic cannot be dynamically managed. Constrained by the limited flexibility of the current infrastructure, mobile network operators have been constantly increasing their network capacity to meet bandwidth and latency requirements despite the growing amount of traffic in their networks. Although effective, increasing network capacity significantly raises operators’ costs, as it requires the deployment of additional infrastructure. As a result, mobile edge solutions have more recently become available, e.g., LTE caches by ARA Networks¹ and DatE by I-Direct². However, because they are limited to the network edge and do not allow GTP tunnels to be bypassed, edge solutions miss the potential benefits of in-network caching and dynamic traffic management.
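To make the tunnelling constraint concrete, the following minimal sketch shows how a user packet is wrapped in a GTP-U header. It assumes the simplest case, a GTPv1-U G-PDU with only the mandatory 8-byte header (flags, message type, length, and Tunnel Endpoint Identifier) and no optional fields; the function names are illustrative and are not taken from the prototype described in this thesis. The point is that a node inside the backhaul sees only the outer tunnel header unless it explicitly decapsulates the payload.

```python
import struct

GTPU_UDP_PORT = 2152   # registered UDP port for GTP-U
GTP_FLAGS_V1 = 0x30    # version 1, protocol type GTP, no optional fields
GTP_MSG_GPDU = 0xFF    # G-PDU: the message type that carries user packets

# flags (1 byte), message type (1 byte), length (2 bytes), TEID (4 bytes)
_HEADER = struct.Struct("!BBHI")


def encapsulate(teid: int, inner_packet: bytes) -> bytes:
    """Prefix a user IP packet with a minimal GTPv1-U header."""
    return _HEADER.pack(GTP_FLAGS_V1, GTP_MSG_GPDU,
                        len(inner_packet), teid) + inner_packet


def decapsulate(gtp_packet: bytes):
    """Return (teid, inner_packet) for a minimal GTPv1-U G-PDU."""
    flags, msg_type, length, teid = _HEADER.unpack_from(gtp_packet)
    if flags >> 5 != 1 or msg_type != GTP_MSG_GPDU:
        raise ValueError("not a GTPv1-U G-PDU")
    if flags & 0x07:
        raise ValueError("optional header fields are not handled in this sketch")
    return teid, gtp_packet[_HEADER.size:_HEADER.size + length]


if __name__ == "__main__":
    tunnelled = encapsulate(0x1234, b"inner IP packet bytes")
    print(len(tunnelled), decapsulate(tunnelled))
```

A caching function placed inside the backhaul would need a step like `decapsulate` to inspect user traffic and a step like `encapsulate` to return responses into the correct tunnel, which is the kind of flexibility the current architecture does not offer by default.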
Summarizing, this Thesis investigates the following research questions:
How can we build on new technologies, such as Cloud Computing, to propose new and improved RPAs? We believe that new technologies open new possibilities for CDNs, possibly resulting in more flexible resource management. For instance, a primary step in deploying a CDN is to place replica servers using a Replica Placement Algorithm (RPA). Usually, replica servers are statically placed, meaning that they are not moved during the operation of the CDN. However, the flexibility provided by Cloud Computing services enables on-demand deployment and reallocation of replica servers to different geographic regions. This thesis proposes and investigates a new RPA that takes advantage of those technologies, comparing its performance with well-known RPAs.
¹ http://www.aranetworks.com/solutions/mobile_edgeCDN
How can we leverage the popularity of CDNs to meet momentary RS scaling demands? As a result of the popularity of CDNs, an increasing number of them are deployed for several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying in content, coverage, and capacity. Through collaboration between restricted CDNs, coverage could expand temporarily to handle an unusual set of requests. There is, therefore, a need for a way to analyze collaboration scenarios among CDNs to quantify the possible gain obtained through collaboration. This thesis presents an analytical model for collaboration among CDNs that considers, among other variables, client dispersion through the network, different CDN capacities, and cache misses.
How can we increase mobile backhaul flexibility to enhance replica placement possibilities? The current mobile network architecture lacks the flexibility needed to control the constantly increasing amount of mobile traffic, because traffic is encapsulated in GPRS Tunnelling Protocol (GTP) tunnels that transport packets from edge nodes to the mobile network gateways that forward them towards the global Internet. Although effective, increasing network capacity significantly raises mobile network operators’ costs. Edge caching solutions have become available; however, since they do not allow GTP tunnels to be bypassed, they miss the potential benefits of in-network caching and dynamic traffic management. This thesis presents a solution, based on SDN technology and compliant with the current architecture, that enhances the mobile backhaul’s flexibility. In particular, we propose a user-space extension to OpenFlow switches inside the mobile backhaul and show the benefits of network devices’ programmability by designing and prototyping a transparent cache service.
1.2
Objectives
Considering these research challenges, the main objective of this Research Project is to propose and evaluate replica server management strategies to improve multimedia content delivery efficiency. The specific goals of this Doctoral Thesis are:
To propose a new RPA, considering newly available technologies, that reduces OPEX with little to no impact on Quality of Experience (QoE).
To construct and evaluate models that represent the collaboration between CDNs, considering among other variables, client dispersion through the network and different bandwidth capacities.
To propose a solution to enhance the mobile backhaul’s flexibility compliant with the current architecture.
1.3
Thesis Outline
This Doctoral Thesis identifies the challenges involved in replica server management for multimedia content delivery and presents solutions (algorithm, tool, model and prototype) to improve replica server management considering different application scenarios. The remainder of this document is organized as follows:
Chapter 2 presents a new RPA called FlowCount. It is a greedy algorithm based on the number of flows passing through the nodes that compose the network. We also present a comparison between the proposed strategy and other well-known RPAs.
Chapter 3 shows an analytical model that represents the collaboration between CDNs.
Chapter 4 presents a user-space extension to OpenFlow switches inside the mobile backhaul and shows the benefits of network devices’ programmability by designing and prototyping a transparent cache service.
2
FlowCount: Dynamic Replica Placement Algorithm
I never am really satisfied that I understand anything; because, understand it well as I may, my comprehension can only be an infinitesimal fraction of all I want to understand about the many connections and relations.
—ADA LOVELACE
Taking into consideration the tool and concepts presented in Appendix A, this chapter presents a new dynamic RPA strategy called FlowCount. The strategy takes advantage of Distributed Cloud computing services to efficiently manage CDN resources while maintaining QoE. The primary goal of this Chapter is to describe the proposed strategy and present its evaluation using the P2PCDNSim simulator.
This Chapter is organized as follows: Section 2.2 presents the state of the art regarding RPAs. Section 2.3 presents a description of the algorithm along with its analysis. Section 2.4 presents an evaluation of the algorithm, comparing its performance with other RPAs found in the literature. Finally, Section 2.5 presents concluding remarks.
The results obtained from this Chapter were published in Rodrigues et al. (2013a) and Rodrigues et al. (2013b).
2.1
Introduction
The Internet plays a crucial role in our modern society, and its usage increases every day, bringing new challenges. CDNs helped improve accessibility through content replication in replica servers near clients (BUYYA; PATHAN; VAKALI, 2008). The success of CDNs is illustrated by the impressive numbers of Akamai, one of the key players in the market. Akamai claims to handle between 15 and 20% of the world’s Web traffic, corresponding to over a trillion requests per day (QUARTER, 2012). The idea behind CDNs is to go beyond the classical client-server architecture and spread content towards the network edges. Therefore, one of the primary concerns is to decide where to place content. Techniques designed to solve this problem fall into two basic categories: caching algorithms and Replica Placement Algorithms (RPAs). The former category consists of distributed algorithms that perform content management within replica servers’ storage areas. They are also known as cache replacement techniques or, simply, caching techniques (JAMIN et al., 2001). The latter category relates to choosing the best location for replica servers, thus reducing the latency perceived by clients and the bandwidth consumption. Furthermore, RPAs can be divided into two categories, namely static and dynamic. Static strategies consider only one supposedly perfect placement; in other words, replicas are fixed. Dynamic strategies, on the other hand, adapt the placement of replicas according to scenario changes.
CDN providers understood the importance of Cloud Computing at a very early stage and started leveraging this new paradigm. Cloud Computing emerged proposing a conceptual layer on the Internet that makes all available software and hardware resources transparent and accessible through a well-defined interface. Notions such as on-demand self-service, broad network access, resource pooling (DILLON; WU; CHANG, 2010), and other trademarks of Cloud Computing services were essential to its current popularity. Most Cloud Computing providers, like major CDN players, rely on large and consolidated data centers. Such data centers are expensive and hard-to-manage infrastructures. Thus, small and geographically distributed data centers could be an alternative for Cloud providers, since they offer cheaper, low-power options that reduce the significant costs of centralized data centers. These small and distributed data centers can be built and connected in different geographical regions to form a Distributed Cloud (DCloud) (CHANDRA; WEISSMAN, 2009; GONÇALVES et al., 2012). Furthermore, many Cloud Storage services have emerged in recent years, providing data storage in several continents backed by rigorous Service Level Agreements (SLAs) (CATHERINE; EDWIN, 2013). Cloud-oriented CDNs use DCloud infrastructure to map CDN infrastructure components into virtual components in the Cloud. For instance, a surrogate server can be mapped into an Infrastructure as a Service (IaaS) offering or even a Cloud storage service. These capabilities can bring new opportunities, such as enabling a small business to become a CDN provider, offering content delivery service for third parties without the cost of owning or operating geographically distributed data centers. An example of this approach is MetaCDN (BROBERG; BUYYA; TARI, 2009).
We believe that those technologies open up new possibilities for CDNs, resulting in more flexible resource management. For instance, through Cloud services, one could dynamically manage CDN resources to follow specific demands, such as flash crowd events. In the next Section, we present the state of the art regarding RPAs.
2.2
RPA State of the Art
Some theoretical approaches model the RPA problem as the “center placement problem”: for the placement of a given number of centers, they minimize the maximum distance between a node and the nearest center. Some variants of this problem are the facility location problem, k-hierarchically well-separated trees, and the minimum k-center problem (BARTAL, 1996). The minimum k-center problem is too complex and computationally intensive to be used in practice.
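As a hedged illustration of the objective described above, the minimum k-center problem can be written as follows. The symbols are assumptions introduced here for illustration and are not notation taken from the cited works: V is the set of nodes, S the set of chosen centers, k the number of centers, and d(v, s) the distance between node v and center s.

```latex
\[
\min_{S \subseteq V,\; |S| = k} \;\; \max_{v \in V} \;\; \min_{s \in S} \; d(v, s)
\]
```

The inner minimum assigns each node to its nearest center, and the outer maximum selects the worst-served node, which is the quantity being minimized.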
Due to the computational cost of these algorithms, some heuristics have been proposed. They take into account existing information from a CDN, such as workload patterns and the network topology, and devise reasonable solutions with a lower computation cost. The work in Chen, Katz, and Kubiatowicz (2002) evaluated a set of heuristic based strategies characterized along three axes: metric scope (the technique used, centralized or decentralized computation), approximation method (e.g. ranking, relaxation, fixed threshold, and dynamic programming) and cost function simplification.
During the last decade, there has been a considerable number of research papers on replica placement. The first algorithms fit into the static placement group (KRISHNAN; RAZ; SHAVITT, 2000; QIU; PADMANABHAN; VOELKER, 2001; JAMIN et al., 2001; KANGASHARJU; ROBERTS; ROSS, 2002). Overall, the best representatives of static placement strategies are Greedy and HotSpot, previously discussed in Section A.1.1.
The next RPA group comprises dynamic strategies, dynamic in the sense that RSs are added or moved according to the dynamically changing user request traffic (PRESTI; BARTOLINI; PETRIOLI, 2005). They represent an improvement on static approaches, which adapt poorly to changes in user requests. Khan et al. (2009) discussed a robust replica placement for improved performance under the uncertainty of random server failures, while Xu and Bhuyan (2005) also introduced QoE awareness. Sun et al. (2011) present a model that reduces the computational cost of the heuristics in order to address problems of limited storage capacity, and compare the main heuristics found in the literature through simulation. Khalaji and Analoui (2013) evaluate several RPAs considering the hybrid CDN-Peer to Peer (P2P) scenario. This scenario proposes the active participation of CDN clients in a P2P distribution network to assist the CDN’s content delivery system. The study compares well-known strategies, namely Greedy, Hot Spot, and Constraint P-Median (CPM). CPM is an approach based on minimizing the sum of weighted distances from all vertices to the selected points. Their simulation results show that, in the hybrid CDN scenario, the strategy that results in the lowest content replication cost is Greedy.
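The "sum of weighted distances" criterion mentioned for CPM can be sketched as the basic p-median objective below. This is only the unconstrained textbook form; the additional constraints that CPM imposes are not given in the text and are therefore omitted. The symbols are assumptions introduced here: V is the set of vertices, S the set of p selected points, w_v the weight (for instance, demand) of vertex v, and d(v, s) the distance between v and s.

```latex
\[
\min_{S \subseteq V,\; |S| = p} \;\; \sum_{v \in V} w_v \, \min_{s \in S} \; d(v, s)
\]
```

Unlike a center-based objective, which looks only at the worst-served vertex, this one averages over all vertices, so heavily weighted vertices pull the selected points towards them.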
In the next Section, we present FlowCount, our proposal for a dynamic RPA based on a Greedy heuristic that uses the number of flows and the distance between servers and clients to manage replica placement.
2.3
FlowCount Algorithm Description
This section presents a detailed description of the FlowCount strategy, divided into two subsections. The first subsection presents a description of our approach, describing the functioning of the reallocation strategy and showing an example of how we count flows. The second subsection shows a complexity analysis of the proposed algorithm.
2.3.1
FlowCount Strategy
The CDN can be modeled as a number of nodes, replica servers, clients, and a routing matrix. The expected output of an RPA is a placement matrix linking the replica servers to the nodes, representing the optimal location(s) (KARLSSON et al., 2002). The replica placement problem basically consists of a cost function that has to be optimized under certain constraints (number of replicas, server capacity, or client quality of experience). The replica placement problem belongs to the NP-complete complexity class (SUN et al., 2011).
Our FlowCount placement strategy is based on a greedy algorithm with a particular selection function. This selection function uses the number of flows as the main metric to decide upon placement. We consider that nodes with a high flow count are likely to be central nodes regarding content distribution. The FlowCount placement strategy follows this idea: it counts all flows passing through all routing nodes in the topology and later uses this information to place replica servers.
We divide our strategy into two basic parts. The first one is counting flows and the second one is analyzing the flow counts to decide where to place replicas. The first part is made by an analyzer running on every routing node that counts and updates tables with information about all flows passing through that node. We identify SDN technologies as a possible tool to enable flow monitoring. In our experiments, a flow is represented by the object identifier and the destination address. This information is stored to prevent counting a recognized flow again. Recent works involving both Content-Aware Networks (CAN) (NICULESCU et al., 2011) and new network management tools (MCKEOWN et al., 2008) present new horizons that make collecting this information possible.
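The flow-counting step can be sketched as follows. This is a minimal illustration, not the analyzer used in P2PCDNSim: the class name `FlowCounter`, its methods, and the sample identifiers are all hypothetical. It only assumes what the text states, namely that a flow is identified by the pair (object identifier, destination address), that a known flow is not counted twice, and that counts are zeroed after a new placement.

```python
from collections import defaultdict


class FlowCounter:
    """Per-node flow table; a flow is the pair (object_id, destination)."""

    def __init__(self):
        # node -> set of flows already seen at that node
        self._seen = defaultdict(set)

    def observe(self, node, object_id, destination):
        """Record a flow crossing `node`; a repeated flow is ignored."""
        self._seen[node].add((object_id, destination))

    def count(self, node):
        """Number of distinct flows seen at `node` since the last reset."""
        return len(self._seen[node])

    def reset(self):
        """Zero every counter, e.g. after a new placement is applied."""
        self._seen.clear()


counter = FlowCounter()
counter.observe("r1", "video-42", "10.0.0.7")
counter.observe("r1", "video-42", "10.0.0.7")  # same flow, not counted twice
counter.observe("r1", "video-42", "10.0.0.9")  # new destination, new flow
print(counter.count("r1"))  # 2
```

Storing the flows in a set keeps the duplicate check constant-time on average, at the price of memory that grows with the number of distinct flows per node.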
The second part of the strategy relates to the decision regarding where to place servers based on two different pieces of information: topology and flow counts. The topology represents the network and the candidate locations to place servers. Nodes with higher flow counts are likely central nodes regarding traffic; in other words, they should be critical nodes for the overall content distribution. Since the number of flows passing through each node is directly related to the current replica placement, we reset the flow counts of all nodes to zero after every new placement. The pseudo-code in Algorithm 1 illustrates the placement selection process for the FlowCount placement strategy. The first for loop selects a candidate node, representing a possible node to place a replica. Line 4 retrieves the flow count of the node being examined.
The second for loop calculates the cost of serving content according to the current configuration, considering the current candidate node and the replicas previously placed. After examining all nodes, the result of selecting the current candidate node is compared with the best cost so far, and the best choice is updated accordingly. To dynamically adapt to workload changes, we repeat this process every T seconds. We consider the cost to be flowCount · distanceTo(node), as illustrated in lines 6 and 9 of Algorithm 1. By selecting the minimum between the cost to the candidate and the cost to the already placed replicas, the algorithm tries to minimize the total flow count. Overall configuration costs are ordered, and replicas are placed according to the placement that resulted in the lowest cost.
Algorithm 1 Algorithm that selects the best candidate node to place a replica server.
1: procedure BESTNODE
2:   for each candidateNode ∈ nodeList do
3:     for each node ∈ nodeList do
4:       flowCount ← flowCountList(node)
5:       for each replica ∈ placedReplicaList do
6:         replicaCost ← flowCount × distanceTo(replica)
7:         if replicaCost < costToClosestReplica then
8:           costToClosestReplica ← replicaCost
9:       costToCandidate ← flowCount × distanceTo(candidateNode)
10:      cost ← min(costToClosestReplica, costToCandidate)
11:    if cost < lowerCost then
12:      bestNode ← node
13:      lowerCost ← cost
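The selection step of Algorithm 1 can be sketched in Python as below. This is an interpretation under stated assumptions rather than the thesis implementation: the function name `best_node`, its parameters, and the toy topology are hypothetical, and the sketch assumes that the per-node costs are summed into one total per candidate and that the candidate with the lowest total is returned, which follows the prose description ("after examining all nodes, the result of selecting the current candidate node is compared with the best cost so far").

```python
def best_node(nodes, flow_count, distance, placed_replicas):
    """Return (candidate, cost) minimizing the summed flow-weighted cost.

    nodes: list of node ids (also the candidate locations)
    flow_count: dict node -> number of flows counted at that node
    distance: dict (a, b) -> distance between nodes a and b
    placed_replicas: nodes that already host a replica
    """
    best, lowest_cost = None, float("inf")
    for candidate in nodes:
        # Treat the candidate as if it already hosted a replica.
        servers = list(placed_replicas) + [candidate]
        total = 0.0  # assumption: costs are accumulated over all nodes
        for node in nodes:
            flows = flow_count.get(node, 0)
            # Cost of serving `node` from its closest server.
            total += min(flows * distance[(node, s)] for s in servers)
        if total < lowest_cost:
            best, lowest_cost = candidate, total
    return best, lowest_cost


# Toy line topology a - b - c with unit-length links (hypothetical data).
nodes = ["a", "b", "c"]
position = {"a": 0, "b": 1, "c": 2}
distance = {(u, v): abs(position[u] - position[v]) for u in nodes for v in nodes}
flow_count = {"a": 1, "b": 5, "c": 1}

print(best_node(nodes, flow_count, distance, placed_replicas=[]))
```

In this toy case the middle node carries most flows, so it is selected first; calling the function again with `placed_replicas=["b"]` picks the next location given that choice.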
2.3.2
Complexity Analysis
This Section presents a complexity analysis of the FlowCount RPA. Within this Section, we consider K to be the total number of replica servers and N to be the number of possible nodes where one can allocate replica servers.
Our algorithm is based on the classic Greedy replica placement (QIU; PADMANABHAN; VOELKER, 2001), a known solution to the replica placement problem. Considering N and K as described earlier, the greedy algorithm’s complexity is O(K · N²). The FlowCount strategy uses the same basis as the Greedy algorithm but with a different cost metric, and thus has the same complexity, as seen in Algorithm 1. There are three main loops: two of them iterate over each of the possible placement nodes, resulting in N² iterations, and a third one iterates over the replicas already placed, in other words, at most K iterations. Thus, the FlowCount replica placement strategy is also O(K · N²).
2.4
Experiments and Results
To evaluate the proposed strategy, we used two scenarios. The first and smaller one, called the YouTube scenario, uses a YouTube trace¹ made by the Laboratory for Advanced Software Systems of the University of Massachusetts Amherst. The second, called the ProwGen scenario, was generated using ProwGen (BUSARI; WILLIAMSON, 2002), a popular and widely cited tool described as "a synthetic workload generation tool for simulation evaluation of web proxy caches". Both scenarios used a 500-node topology divided across 10 Autonomous Systems (ASes), generated using the topology generator BRITE (MEDINA et al., 2001) and illustrated in Figure 2.1, with 10 replica servers initially placed one per AS.
In the YouTube scenario, links between nodes and CDN entities had 2GB/s capacity, except those between the ASes, which had 10GB/s instead, and the clients had 6MB/s access links. In the ProwGen scenario, clients had 1MB/s links while routers and other CDN entities had 1GB/s links. Client links reflected the average connection speed reported by Akamai for consumers in North and South America (QUARTER, 2012).
The YouTube scenario has 5771 requests placed in the topology in round-robin fashion. Requested object sizes followed an exponential distribution (BUSARI; WILLIAMSON, 2002) with a 2MB mean. The other scenario, generated using ProwGen, is much larger: it had almost 1 million requests and two flash crowds during the timeline. Table A.1 illustrates this scenario. It has a basic workload lasting 24 hours and two flash crowds, namely Flash Crowd 1 and Flash Crowd 2, lasting 2 and 1.5 hours, respectively. During both flash crowds, one can notice changes in peer arrival and object popularity, reflecting the expected characteristics of such events as discussed in Section A.1.2. Moreover, unlike the basic workload, whose traffic is scattered through the topology, Flash Crowds 1 and 2 are located in AS0 and AS5, respectively.
For both scenarios, we used P2PCDNSim’s CDN Overlay, relocating RSs every T seconds. Each relocation process used a different RPA, considering only information received during the last T seconds to perform replica placement. We used two different values for T, according to the total duration of each scenario: the first and smaller YouTube scenario considered T = 1000s, whilst the ProwGen scenario used T = 5000s.
2.4.1
Metrics
During our simulations we collected a series of metrics to evaluate and compare the strategies used. This section describes all the collected metrics and what they represent.
Startup Delay: represents the time between requesting the content and receiving first useful information.
Figure 2.1: 500 nodes topology used in the evaluation of the FlowCount strategy.
Source: Made by author.
Table 2.1: Description of the 24-hour scenario generated using ProwGen (BUSARI; WILLIAMSON, 2002)
                     Basic Workload   Flash Crowd 1     Flash Crowd 2
Duration             24 hours         2 hours           1.5 hours
# of objects         432              1440              810
Peer arrival         4 peers/second   40 peers/second   60 peers/second
# of requests        345512           287999            323937
Objects Popularity   Zipf (0.6)       Zipf (1)          Zipf (1)
Cacheable            40%              10%               15%
ASId                 All of them      AS0               AS5
Total Network Traffic: as presented in Section A.3, represents the total traffic passing through all nodes that compose the network.
Routers Inner and Cross Traffic: as presented in Section A.3, these metrics follow the same basic idea as Total Network Traffic in the sense that they represent the total traffic passing through nodes. However, they limit the set of nodes considered: Routers Inner Traffic only accounts for traffic within an AS, whereas Routers Cross Traffic only accounts for traffic between ASes.
Bandwidth 95th Percentile: this is a typical metric used by ISPs to charge customers (GOLDENBERG et al., 2004; VANDERHOOF, 2011; SLATTERY, 2011). Using this metric, ISPs record the traffic volume every user generates during each 5-minute interval. At the end of the agreed period, for instance, one month, the 95th percentile of all records is used as the charging volume for each client. This metric is a way to measure bandwidth usage that allows customers to burst beyond their committed base rate while providing the carrier with the ability to scale billing accordingly.
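The 95th percentile computation described in the last metric can be sketched as follows. The function name `percentile_95` and the sample values are hypothetical, and the sketch assumes the nearest-rank definition of a percentile; providers may use a slightly different rule (for example, interpolation), so this is an illustration rather than a billing specification.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of per-interval traffic samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), computed with integers only
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Twenty 5-minute samples in MB/s (made-up data): one short burst.
samples = [10] * 19 + [500]
print(percentile_95(samples))  # 10
```

The single burst to 500 MB/s falls within the top 5% of records and is ignored, which is exactly the "burst beyond the committed rate" behaviour the metric is meant to allow.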
2.4.2
Simulation Results
First, we ran simulations using the YouTube scenario. This was our first test using the new strategy and our intention was to compare the new strategy performance to other approaches found in the literature, namely Greedy and Hotspot. Greedy is known to outperform Hotspot in terms of network usage.
Figure 2.2 shows the Total Network Traffic bandwidth timeline. We can see that Hot Spot has clearly higher total traffic during peak traffic situations, between 40000s and 60000s. We can also notice a smaller difference between the FlowCount and Greedy metrics, which seems to indicate that FlowCount needs more time to adjust to sudden traffic changes. Analyzing the Cross AS Traffic bandwidth timeline in Figure 2.3, we can see that, throughout the simulation, FlowCount had equal or lower cross traffic than all the other strategies. Furthermore, once again Hot Spot is outperformed by the Greedy and FlowCount strategies. Considering that several studies argue that cross traffic is more expensive than traffic within a domain (AGGARWAL; AKONJANG; FELDMANN, 2008; CHOFFNES; BUSTAMANTE, 2008; LE BLOND; LEGOUT; DABBOUS, 2011; RUAN et al., 2009; SEEDORF; KIESEL; STIEMERLING, 2009), this result demonstrates that FlowCount could be a valuable player in reducing network costs.
Table 2.2 presents the 95th percentile values for the Total Network Traffic and Cross AS Traffic metrics. Although the difference in total network traffic bandwidth between FlowCount and Greedy is almost insignificant, Cross AS Traffic under FlowCount is more than 20% lower than under Greedy, demonstrating that the proposed strategy offers potential cost savings.
With regard to QoE, Figures 2.4 and 2.5 show the startup delay mean and standard deviation for all strategies. Figure 2.4 shows the box plot of the startup delays collected from all 5771 clients for all strategies. Because of some outliers, it is not possible to draw a clear conclusion about the QoE performance of the strategies from this figure. For a clearer comparison, we excluded all outliers: Figure 2.5 shows the box plot of all startup delays collected during the experiment, excluding outliers. In this scenario, all three strategies have nearly the same startup delay, around 9ms, with a slightly better perceived experience when using the FlowCount strategy. It is important to notice that, for all scenarios presented here, outliers represented less than 1% of the total number of clients (5771).
Figure 2.2: Bandwidth timeline during the Youtube scenario simulation.
Source: Made by author.
Figure 2.3: Cross traffic timeline during Youtube scenario simulation.
Source: Made by author.
Table 2.2: 95th percentile in MB/s for the YouTube scenario considering Total Network Traffic and Cross AS Traffic metrics.
            Total Traffic Bandwidth   Cross Traffic Bandwidth
FlowCount   4791.75                   191.8
Greedy      4793.05                   243.8
Hot Spot    6153.2                    938.25
Figure 2.4: Startup delay box plot for YouTube scenario considering all clients.
Source: Made by author.
We then ran experiments using a bigger and more representative scenario, namely the ProwGen scenario. This larger scenario simulates two different flash crowds with various content object sets in a 24-hour time window. The first flash crowd event goes from approximately 18000s to 25000s and the second flash crowd event goes from approximately 43000s to 48000s. Considering Hot Spot’s poor performance during the YouTube scenario experiments, which agrees with other results found in the literature (QIU; PADMANABHAN; VOELKER, 2001), we now focus on the Greedy and FlowCount strategies. Figure 2.6 illustrates the Total Network Traffic bandwidth timeline during the ProwGen scenario simulation. We can see that FlowCount has slightly higher Total Network Traffic only during both flash crowd events. Before carefully looking at the results, this could signal that Greedy outperforms FlowCount. However, if we look at Figure 2.7, illustrating Cross AS Traffic, we notice that during both flash crowd events using the FlowCount RPA results in a significant Cross AS Traffic reduction. As commented earlier, several studies signal cross traffic as being more expensive than inner AS traffic. This demonstrates once again that FlowCount could be a valuable player in reducing network costs.
Table 2.3 shows the 95th percentile values for the Total Network Traffic and Cross AS Traffic metrics considering the ProwGen scenario. Once again, we see no significant difference in total network bandwidth. Nonetheless, the difference in cross traffic is even bigger than in the previous scenario, as we can see in Table 2.3.
Figure 2.5: Startup delay box plot for YouTube scenario excluding outliers.
Source: Made by author.
Therefore, we expect great operational cost reductions when using the FlowCount strategy instead of the Greedy placement strategy.
Figure 2.6: Bandwidth timeline during the ProwGen scenario simulation.
Figure 2.7: Cross traffic timeline during ProwGen scenario simulation.
Source: Made by author.
Table 2.3: 95th percentile for the ProwGen scenario considering Total Network Traffic and Cross AS Traffic metrics.
            Total Traffic Bandwidth Time line (GB/s)   Cross Traffic Bandwidth Time line (MB/s)
FlowCount   224.8                                      3516.6
Greedy      217.8                                      5943.7
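The relative cross-traffic savings implied by Tables 2.2 and 2.3 can be checked with a few lines of arithmetic. The helper name `relative_reduction` is hypothetical; the input values are the 95th percentile Cross AS Traffic figures for Greedy (baseline) and FlowCount reported in the two tables.

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


# Cross AS Traffic 95th percentile, Greedy vs. FlowCount (MB/s)
youtube = relative_reduction(243.8, 191.8)    # Table 2.2
prowgen = relative_reduction(5943.7, 3516.6)  # Table 2.3

print(f"YouTube: {youtube:.1%}, ProwGen: {prowgen:.1%}")
# YouTube: 21.3%, ProwGen: 40.8%
```

These values agree with the "more than 20%" reduction quoted for the YouTube scenario and with the larger gap observed in the ProwGen scenario.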
2.5
Conclusion
This Chapter presented a novel Replica Placement Algorithm (RPA), namely FlowCount, that takes advantage of Distributed Cloud computing services to enable dynamic replica relocation and efficiently manage CDN resources while maintaining QoE. Simulation results show that the proposed strategy provides similar QoE and slightly higher Total Network Traffic. However, considering Cross AS Traffic, the proposed strategy resulted in considerable cross traffic reductions, especially during flash crowd events, reaching 40% less inter-AS traffic. These findings provide a good incentive for the actual deployment of the FlowCount placement algorithm in present CDNs.
One of the main lessons learned regards the need for dynamic adaptation to working conditions. Finding the best location for each replica server is important for the optimal functioning of a CDN, while also assuring the best QoE and network utilization metrics. This ideal location, however, varies with time, and while an adaptation is necessary, an instantaneous re-adaptation might lead to instability in the location of servers (for instance, the same server being relocated constantly). Therefore, a certain relocation stability is desired, providing a theoretically sub-optimal solution while still providing lower cross-traffic.
The next chapter presents an analytical model to evaluate possible CDN collaboration benefits. The proposed model extends a previously published analytical model to consider the collaboration between CDNs. Our results show that offloading requests to neighbor content networks could help increase Quality of Experience (QoE).
3
Multiple CDNs Collaboration
I have yet to see any problem, however complicated, which, when looked at in the right way, did not become still more complicated.
—POUL ANDERSON
Considering CDNs’ popularity and the ever-growing number of small localized CDNs deployed, this chapter presents an analytical model for collaboration among CDNs, considering client dispersion through the network, different overall capacities, and cache misses. Results demonstrate that CDN collaboration could lead to better QoE. The primary goal of this Chapter is to present the proposed model and an evaluation of collaboration benefits in terms of user QoE. This Chapter is organized as follows: We start by presenting an introduction to the problem in Section 3.1. Section 3.2 presents the state of the art regarding CDN and multiple CDN models. Section 3.3 presents the proposed model, followed by a discussion regarding the model in Section 3.3.1. We then show our first results obtained with the model in Section 3.4. Afterwards, we discuss redirection strategies and their overhead in Section 3.5 and present results considering the redirection overhead in Section 3.5.1. Section 3.6 discusses lessons learned and Section 3.7 presents concluding remarks.
The results obtained from this Chapter were published in Rodrigues et al. (2013) and Rodrigues et al. (2014).
3.1
Introduction
CDNs are de facto accepted as the primary content delivery strategy, and such popularity resulted in an increasing number of CDNs deployed for several purposes (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012), varying in content, coverage, and capacity. Among them, a very limited number aims to distribute content on a worldwide scale, such as Akamai¹ and Limelight². Most of them have a restricted coverage, being extremely localized, meaning that their coverage is restricted to a single Internet Service Provider (ISP) (BERTRAND et al., 2012; FRANK et al., 2013; SHARMA; VENKATARAMANI; SITARAMAN, 2013). Increasing each CDN’s capacity, in terms of coverage and capability, would be very expensive. Through collaboration between restricted CDNs, coverage could expand temporarily to handle a set of requests greater than the local CDN capacity (NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012; JESUS; AGUIAR, 2012). CDNs could also negotiate collaborations to fulfill service level agreements established between the CDN and content providers. However, effective collaboration raises several challenges (PATHAN et al., 2007a; NIVEN-JENKINS; LE FAUCHEUR; BITAR, 2012). Currently proposed techniques handle client redirection to servers controlled by the same CDN, a much simpler scenario than redirecting to servers belonging to third-party CDNs. Furthermore, considering the diversity of existing CDNs, determining their essential features in order to decide which third-party CDN is most suitable to tackle a problem is a significant challenge. There is, therefore, a demand for a way to evaluate collaboration scenarios among CDNs to quantify the possible gain obtained through collaboration, as well as to gather more information about the several variables involved in the process.
¹ https://www.akamai.com/
This Chapter presents an analytical model for collaboration among CDNs. The model exposes the dependency of response time on replica server capacities and on the proportion of requests redirected to the foreign CDN. It considers, among other variables, client dispersion through the network, different bandwidth capacities, and cache misses. Supporting our hypotheses and those of Pathan and Buyya (PATHAN; BUYYA, 2008), our analyses show that CDN collaboration can effectively decrease response time. The main contributions presented in this Chapter are three-fold:
An analytical model for collaboration among CDNs, where clients and servers are geographically distributed with different latencies and bandwidth capacities;
Feasibility analysis of collaboration among CDNs;
A comparison between two Collaboration Request Routing Strategies, Recursive and Interactive.
3.2
State of the Art
3.2.1
CDN Collaboration
In Vakali and Pallis (2003), the authors describe basic CDN entities, their relations, and behavior. The authors also comment on several aspects of CDNs, including peering, following the early steps of the Internet Engineering Task Force (IETF)’s Content Distribution Internetworking (CDI) workgroup (DAY et al., 2003). CDN brokering (BILIRIS et al., 2002) proposes client redirection through Domain Name Service (DNS) redirection techniques. It uses a brokering CDN server to handle DNS redirections using an Intelligent Domain Name Server (IDNS), which considers metrics like load status rather than giving a static response. The main issue is that IDNS is proprietary and might not be suitable for CDN interconnection/collaboration. In a work related to the CDI (BERTRAND et al., 2012), CDN collaboration is proposed through three systems: the Request-routing Internetworking System (RRIS), the Accounting Internetworking System (AIS), and the Distribution Internetworking System (DIS). RRIS redirects a client’s request to the CDN that best satisfies it according to performance data exchanged between CDNs through the Request-Routing Information Exchange Protocol (RIEP). The AIS’s main responsibility is to exchange accounting data, in other words, data related to resource consumption. Finally, the DIS moves content between CDNs.
IETF’s Content Delivery Networks Interconnection (CDNI) workgroup restarted the discussion about patterns and protocols to enable CDN collaboration. Their main goal is to allow the interconnection of multiple CDNs under different administrations. This interconnection should consider all actors involved, from CDNs to content providers and end users. Discussions also raise concerns about distinct subjects such as the complex accounting mechanism and the management of agreements needed for a good collaboration. They propose a taxonomy for the actors in a collaboration scenario: the upstream CDN, which is the primary CDN, and the downstream CDN, which is the CDN hired to team up with the primary CDN. In Buyya et al. (2006), the authors present concepts for CDN collaboration where the peering model is based on a different type of CDN, the Content Service Networks (CSN), which act as another infrastructure layer on top of the CDN, forming the Content and Service Delivery Networks (CSDN). Using their approach, a CDN can share and request resources according to specific needs. Resources are published and found through a Service Registry. In Pathan et al. (2007b), an extended discussion about a peering model is presented. The idea is to serve clients using local resources as long as they are enough. The formation of a Virtual Organization (VO) is initiated by a CDN called the primary CDN; all other CDNs in the same VO are called peering CDNs. User actions could result in collaboration that is transparent from the user’s point of view. In Chang et al. (2012), the authors propose a strategy to deploy CDNI using OpenFlow³. The idea is that the controller would receive information from the CDN and, if needed, manage the interconnection between CDNs. SDN is also used to enable CDN collaboration in Wichtlhuber, Reinecke, and Hausheer (2015), where the authors propose a system for ISP/CDN interaction based on a minimal deployment of SDN-capable switches inside the CDN provider’s network. A proof-of-concept deployment is presented and used to evaluate and discuss performance.
3.2.2 CDN Models
In Molina, Palau, and Esteve (2004), the authors present a model based on M/M/1 queues to model response time and CDN components. Through the proposed model, the authors show basic concepts such as the advantage of using replica servers due to their proximity to clients. They did not consider collaboration among multiple CDNs and used static entity placement; these are important aspects today and are covered by the model presented in this Chapter.
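As a point of reference for the M/M/1 assumption, the mean time a request spends in a stable M/M/1 queue is 1/(μ − λ), where λ is the arrival rate and μ the service rate. The short Python sketch below uses that standard result to illustrate why a nearby replica server can answer faster than a distant origin server. It is not the model of Molina, Palau, and Esteve (2004); the function names and all numeric values are hypothetical and chosen only for illustration.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting plus service) for a stable M/M/1 queue."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (arrival) and positive (service)")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def total_delay(rtt: float, arrival_rate: float, service_rate: float) -> float:
    """Network round trip plus queueing delay at the server (illustrative sum)."""
    return rtt + mm1_response_time(arrival_rate, service_rate)


# Hypothetical scenario: 80 requests/s served either by one origin server
# 100 ms away, or split evenly over four replicas that are 20 ms away.
origin_delay = total_delay(rtt=0.100, arrival_rate=80.0, service_rate=100.0)
replica_delay = total_delay(rtt=0.020, arrival_rate=20.0, service_rate=100.0)
```

Under these assumed values the replica path is shorter both because of the smaller round trip time and because each replica sees only a fraction of the load.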
The only model found that proposes CDN collaboration is presented in Pathan and Buyya (2008). In their model, collaboration is represented by a set of M/G/1 queues, showing how it can help a CDN meet service level agreements, even at a high rate of incoming requests. Although useful as a first step towards modeling multiple CDNs, the paper has some limitations, such as considering neither cache misses nor collaboration overhead. The work in (JESUS; AGUIAR, 2012) proposes a model to estimate the cost of delivering content to users and to evaluate the efficiency of CDN collaboration. Their results show that interconnecting CDNs might not be advantageous for both CDN providers. Furthermore, in Jeong et al. (2013), the authors model CDN traffic to evaluate traffic reduction considering CDN collaboration. They present three optimization models: no CDN, CDN only, and CDN interconnection. Their study shows that CDN caching can reach up to 24% traffic reduction. Likewise, Telco-CDN and Telco-CDNI can reduce traffic by up to 6% and 27%, respectively.
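For readers unfamiliar with M/G/1 queues, the mean response time follows from the Pollaczek-Khinchine formula: with utilization ρ = λE[S] and squared coefficient of variation c² of the service time, the mean waiting time is ρE[S](1 + c²)/(2(1 − ρ)). The Python sketch below evaluates that textbook formula only. It does not reproduce the model of Pathan and Buyya (2008), and the function name and parameter values are hypothetical.

```python
def mg1_response_time(arrival_rate: float, mean_service: float,
                      service_scv: float) -> float:
    """Mean time in system for a stable M/G/1 queue (Pollaczek-Khinchine).

    service_scv is the squared coefficient of variation of the service time:
    1.0 for exponential service (M/M/1), 0.0 for deterministic service (M/D/1).
    """
    if arrival_rate < 0 or mean_service <= 0 or service_scv < 0:
        raise ValueError("invalid parameters")
    rho = arrival_rate * mean_service  # server utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    mean_wait = rho * mean_service * (1.0 + service_scv) / (2.0 * (1.0 - rho))
    return mean_service + mean_wait


# Hypothetical load: 80 requests/s, 10 ms mean service time.
exponential_case = mg1_response_time(80.0, 0.010, service_scv=1.0)
deterministic_case = mg1_response_time(80.0, 0.010, service_scv=0.0)
```

With exponential service the result coincides with the M/M/1 value 1/(μ − λ), which gives a quick sanity check on the formula.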
None of those studies focuses on QoE, as our proposed model does. Also, to the best of our knowledge, our model is the first to deal with scenarios of collaboration among multiple CDNs while considering collaboration overhead and cache misses.
3.3 CDN Collaboration Model
Our model represents the CDN’s basic components (as detailed in Section A.1) along with other entities related to the CDN collaboration scenario, illustrated in Figure 3.1. It extends previous work by introducing CDN collaboration into the model presented in Molina, Palau, and Esteve (2004). We describe each entity as follows:
Origin Server (OS): represented by the circle in the middle of Figure 3.1. It is the entity responsible for storing the original content. In case of a cache miss, an RS requests content copies from the OS.
Clients: represented by squares distributed on the outer circle, they represent client clusters. We consider a group of clients located within the same domain, for instance the same ISP, as a client cluster.
Replica Servers (RS): There are two types of replica servers:
Local RS: represented by triangles, local replica servers are RSs that belong to the upstream CDN. They are placed between the OS and clients. P_l represents the total number of local RSs.
Foreign RS: represented by trapeziums, foreign RSs are replica servers from downstream CDNs. They result from collaborations that the primary CDN may establish, and are also placed between the OS and clients. P_f represents the total number of foreign RSs.
Request Redirector (RR): represented by the center of the inner circles in Figure 3.1. The RR is the entity responsible for receiving client requests and redirecting them to the most suitable RS. There are two types of RR, the Foreign RR and the Local RR. The latter belongs to the upstream CDN and is placed in the center of the Local RSs circle, whereas the Foreign RR is located in the center of the Foreign RSs circle.
Let M be the number of client clusters positioned in a circle around the Origin Server (OS). Inside the client cluster circle, there are two other circles of Replica Servers (RS), each with a Request Redirector (RR) in its center.
τ_o is the distance, or Round Trip Time (RTT), between client clusters and the OS. RSs have different RTTs: local and foreign replica servers are τ_l and τ_f away from the origin server, respectively, and local replica servers are τ_pl away from clients whilst foreign replica servers are τ_pf away from clients.
Figure 3.1: Basic model components.
Source: Made by author.
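The entities and distances described above can be collected in a small data structure. The Python sketch below is illustrative only: the class and field names are hypothetical, and the rule that a cache miss adds one round trip between the RS and the OS is an assumption made for this example, not a formula taken from the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollaborationScenario:
    """Hypothetical container for the model parameters (all RTTs in seconds)."""
    client_clusters: int       # M
    local_rs: int              # P_l
    foreign_rs: int            # P_f
    rtt_client_os: float       # tau_o: client clusters <-> origin server
    rtt_local_os: float        # tau_l: local RS <-> origin server
    rtt_foreign_os: float      # tau_f: foreign RS <-> origin server
    rtt_client_local: float    # tau_pl: clients <-> local RS
    rtt_client_foreign: float  # tau_pf: clients <-> foreign RS

    def network_delay(self, server: str, cache_hit: bool = True) -> float:
        """RTT seen by a client; a miss adds the RS-to-OS RTT (assumption)."""
        if server == "origin":
            return self.rtt_client_os
        if server == "local":
            penalty = 0.0 if cache_hit else self.rtt_local_os
            return self.rtt_client_local + penalty
        if server == "foreign":
            penalty = 0.0 if cache_hit else self.rtt_foreign_os
            return self.rtt_client_foreign + penalty
        raise ValueError(f"unknown server type: {server!r}")


# Hypothetical values chosen only to exercise the structure.
scenario = CollaborationScenario(
    client_clusters=10, local_rs=4, foreign_rs=2,
    rtt_client_os=0.100, rtt_local_os=0.080, rtt_foreign_os=0.090,
    rtt_client_local=0.020, rtt_client_foreign=0.030,
)
local_hit = scenario.network_delay("local")
foreign_miss = scenario.network_delay("foreign", cache_hit=False)
```

Keeping the three server types behind one method mirrors the three ways a request can be handled: by a local RS, by a foreign RS, or by the origin server.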
When the CDN receives a new request, it can be handled either by a local RS, a foreign RS from an active collaboration, or by the origin server. Therefore, we consider p_l the probability