
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

New Cloud Services for Dynamic Advertising

Vasco Manuel Pérola Filipe

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Maria Teresa Magalhães da Silva Pinto de Andrade

Co-Supervisor: Alexandre Ulisses F. Almeida e Silva


New Cloud Services for Dynamic Advertising

Vasco Manuel Pérola Filipe

Mestrado Integrado em Engenharia Informática e Computação

Approved in oral examination by the committee:

Chair: Luis F. Teixeira

External Examiner: José Manuel Torres

Supervisor: Maria Teresa Andrade


Abstract

With the constant technological evolution that has occurred in recent years, and with information becoming increasingly globalized and consumed at an ever higher rate, advertising has played an increasing role in the revenue of television stations and other suppliers of audiovisual material. However, this globalization and ease of access to information makes targeted advertising difficult, which has diminished its effectiveness. In addition, there are also television contracts that oblige stations to limit the broadcast of content to a particular region. This need to present a personalized product almost instantaneously results in a greater emphasis on automation, namely in the detection and replacement of advertisements in real time. In this dissertation, a distributed application capable of identifying and replacing advertising segments in an audiovisual stream is therefore proposed. The prototype will be able to receive audio and video in real time and detect advertising segments, replacing them with other content, which may come from a stream or a file. The end result is a stream, with the least possible delay, where the initial advertising segments are replaced by more relevant ones. Taking into account the modularity of this project, it will be possible to add or remove components depending on the use case, enabling solutions not only in advertising but also in the transmission and editing of video and audio in the cloud.


Resumo

Com a constante evolução tecnológica que se tem verificado nos últimos anos, e com a informação a ser cada vez mais globalizada e com um nível de consumo mais elevado, a publicidade tem tido um papel crescente na receita das estações televisivas e outros fornecedores de material audiovisual. No entanto, esta globalização e facilidade de acesso a informação torna difícil a publicidade direcionada, o que diminuiu a eficácia da mesma. Para além disso, existem ainda contratos televisivos que obrigam as estações a limitar a emissão de conteúdo a uma determinada região. Esta necessidade de apresentar um produto personalizado e de forma quase instantânea resulta numa maior aposta na automatização, nomeadamente na deteção e substituição de anúncios publicitários em tempo real. Nesta dissertação é então proposta uma aplicação distribuída capaz de identificar e substituir segmentos de publicidade numa stream audiovisual. O protótipo será capaz de receber áudio e vídeo em tempo real e detetar segmentos publicitários, substituindo-os por outro conteúdo, podendo este ser proveniente de uma stream ou de um ficheiro. O resultado final é uma stream, com o menor atraso possível, onde os segmentos publicitários iniciais são substituídos por outros mais relevantes. Tendo em conta a modularidade deste projeto, será possível adicionar ou remover componentes consoante o caso de uso, possibilitando soluções não só na área da publicidade, mas também na transmissão e edição de vídeo e áudio na cloud.


Acknowledgements

First, I would like to thank my family for providing me with all the conditions to evolve as a professional and as a person. To Maria Teresa Andrade for accepting this challenge and guiding me through it. To MOG Technologies, for providing me with the conditions necessary to develop this work, especially to Alexandre Ulisses, for the challenge proposed, and to Pedro Santos, Miguel Poeira and Vasco Gonçalves for all the help provided during the semester.


“Whether you think that you can, or that you can’t, you are usually right.”


Contents

1 Introduction
1.1 Context
1.2 Motivation
1.3 Goals
1.4 Document structure

2 Cloud Computing
2.1 Cloud Models
2.2 Private Vs Public Vs Hybrid Clouds
2.3 Virtualization
2.3.1 Hypervisors
2.3.2 Containers
2.3.3 Docker
2.4 Advantages of Cloud Computing

3 Digital TV Production over IP
3.1 SDI
3.2 Internet Protocol
3.3 Transport Layer
3.4 MPEG-TS
3.5 Real-Time Transport Protocol
3.5.1 RTP Payload Format for Uncompressed Video
3.6 SMPTE 2022-6
3.7 JT-NM Architecture
3.8 NMOS

4 Automation of Advertising Replacement
4.1 Context
4.2 Market Solutions
4.2.1 Anvato
4.2.2 Audible Magic
4.2.3 ACRCLOUD
4.2.4 Ivitec
4.2.5 Adobe Primetime
4.2.6 Comparative Analysis
4.3 Requirements
4.4 Proposal
4.4.1 Architecture
4.4.2 Modularity and Scalability
4.5 Prototype
4.5.1 Prototype limitations
4.5.2 MOG MPL
4.5.3 Implementation

5 Results and analysis
5.1 Test Methodologies
5.2 Results
5.2.1 Frame Rate
5.2.2 Bandwidth
5.2.3 RAM Usage
5.2.4 CPU Usage
5.2.5 Cascade Scenario
5.2.6 Results

6 Conclusions and Future Work
6.1 Fulfillment of Goals
6.2 Future Work

References


List of Figures

2.1 Cloud Model Layer System
2.2 Comparison between traditional and cloud models
2.3 Comparison between hypervisors and container engines
3.1 PAT and PMT [Cia09]
3.2 JT-NM simplified Architecture
3.3 Node proposed by NMOS [Ass16a]
4.1 Live dynamic server side ad insertion [Anv]
4.2 VOD dynamic server side ad insertion [Anv]
4.3 Technology of Audible Magic [Mag]
4.4 Fingerprint density versus type of search [Tec]
4.5 Cloud Application Architecture
4.6 Input Distributor media flows
4.7 VOD Replacement media flows
4.8 Video Switcher media flows
4.9 Business Logic media flows
4.10 Advertising Detector media flows
4.11 Output media flows
4.12 VoD Storage use case
4.13 Additional Input Feeds Architecture
4.14 Prototype Developed
5.1 Application deploy scenario and data flow



List of Tables

4.1 Comparative analysis by features
5.1 Technical Specification of the test machines
5.2 Video Specifications
5.3 Frame Rate in each module
5.4 Average Inbound and Outbound traffic in each module
5.5 Average Module RAM Usage



Abbreviations

API Application Programming Interface
CPU Central Processing Unit
EDL Edit Decision List
FPS Frames Per Second
HTTP HyperText Transfer Protocol
HTTPS HyperText Transfer Protocol Secure
IGMP Internet Group Management Protocol
IAB Interactive Advertising Bureau
IP Internet Protocol
JT-NM Joint Task Force on Networked Media
MPEG-TS MPEG Transport Stream
NMOS Networked Media Open Specifications
OS Operating System
RFC Request for Comments
RTP Real-time Transport Protocol
SLA Service-Level Agreement
SMPTE Society of Motion Picture and Television Engineers
TCP Transmission Control Protocol
UDP User Datagram Protocol
VAST Video Ad Serving Template
VM Virtual Machine
VOD Video on Demand


Chapter 1

Introduction

In 2016, more than 500 billion dollars were spent on advertising, a 7.1% increase over the previous year. Of this total, 36% was spent on the television market and 29.9% on online advertising [Mar].

Although it is an expanding market, advertising has been experiencing some problems, namely with targeted publicity. This problem is especially relevant in television, where a TV segment can be broadcast to multiple countries. This reduces the efficiency of the advertising, as its content needs to be relevant to a large group of viewers.

So, the personalization of advertising has become increasingly important for end-users, broadcasters and entities that have contracted advertising space. This assumes a special importance in the case of television content.

1.1 Context

Currently, the delimitation of the advertising segments in television broadcasts – i.e., the identification of the beginning and end moments of a contiguous set of advertising content – is typically done by a human operator. As a result, the process is expensive and potentially prone to errors [CA08]. Also, it contributes to the introduction of delays in the availability of content over other types of medium, since the operator has to wait for the end of the broadcast to edit the resulting video.

There is also a trend and a set of standards and architectures that the industry is following in order to achieve the virtualization of television production. SMPTE 2022-6 [S.M12] is one such standard, and one such architecture is being developed by the Joint Task Force on Networked Media (JT-NM), a group formed by the European Broadcasting Union (EBU), the Society of Motion Picture and Television Engineers (SMPTE) and the Video Services Forum. The purpose of this team and these standards is the transition from transmissions based on physical equipment to IP networks.


This dissertation was developed in a corporate environment at MOG Technologies. MOG has been involved in the broadcasting market since 2007 and provides solutions for post-production environments.

This solution is part of a national project called MOG Cloud Setup, where MOG Technologies partners with INESC TEC with the goal of developing audiovisual content preparation platforms for ingest in the cloud.

1.2 Motivation

By automating the process of identifying and replacing advertising segments in television content it is possible to reduce the delay in content availability in other mediums while also reducing costs and the possibility of errors introduced by human operators.

A television broadcast capable of delivering personalized advertising segments provides an upgraded product to the advertising space buyers, as more specialized segments can have significantly more impact on the end-user.

The transition from video processing hardware to a cloud-based application can result in lower costs in hardware (processing power, storage) while providing more flexibility, allowing the application to scale as needed, without major changes to the architecture.

1.3 Goals

The main goal of this dissertation is to develop a cloud-based application prototype capable of receiving and re-sending an audiovisual stream, replacing predetermined segments of such stream with content originated from other sources.

This application should aim to reduce the delay inserted by such operations while maintaining the quality of the content and must be fully automated.

1.4 Document structure

Besides Chapter 1, this document contains 5 other chapters.

Chapter 2 analyzes cloud computing characteristics and models, the role of virtualization in those same models, hypervisors and containers, and the advantages of a cloud-based application.

Chapter 3 studies the IP protocol, as well as its uses in television production. It also analyzes multiple protocols and standards used for IP broadcasts.

Chapter 4 analyzes the automation of advertising substitution, describing and comparing solutions already developed by television companies. It then proposes an architecture for a cloud-based application for real-time advertising substitution and presents the developed prototype.

Chapter 5 presents the set of validation and acceptance tests that were performed in order to validate the developed prototype, and analyzes the results of these tests.


The dissertation ends with Chapter 6, where a general appreciation of the proposed objectives and of the future work is made.


Chapter 2

Cloud Computing

With technologies in the processing, storage and networking areas progressing rapidly, computing resources saw a rise in power while decreasing in price. To take advantage of this phenomenon, research was carried out on sharing physical resources such as CPU and storage among multiple applications, giving rise to the Cloud Computing model.

Cloud Computing is a model that provides on-demand network access to computing resources. It allows for quick provisioning and low management effort, adapting to the user's needs and system changes. This model has five main characteristics [MG11]:

• On-demand Self-service:

A cloud model needs to be able to automatically provide computing capabilities to the consumer, such as network storage or memory.

• Broad Network Access

It should be available over the network and accessible by any platform, be it computers or mobile devices.

• Resource Pooling

This model can provide services to multiple consumers with different needs, assigning resources according to their demands.

• Rapid Elasticity:

The model should allow for rapid scaling, increasing or decreasing its capabilities as demanded, giving the consumer a sense of unlimited resources.

• Measured Service

Lastly, the model should monitor resource usage, providing a report to the user containing all the metrics associated with the system.


Figure 2.1: Cloud Model Layer System

The architecture of the cloud computing model is structured in layers based on their level of abstraction: a cloud layer is classified as higher level if its services can be composed from services of the layers below [YBDS08], as represented in Figure 2.1.

• The first layer and the least virtualized is the Server layer, composed of the actual hardware.

• The Infrastructure layer provides tools to manage virtual machines.

• The Platform layer allows configuration of the resources available, such as CPU, storage and bandwidth, using the services provided by the Infrastructure.

• The Application layer allows the implementation and deployment of cloud applications. It uses services from the Platform layer to improve scalability, increasing or decreasing the resources available as needed.

• The last layer, and the one at the highest level of abstraction, is the Client layer. This layer provides the user access to the application, usually through a mobile app or a website.

2.1 Cloud Models

There are multiple models of cloud services, differing from each other in their level of abstraction [RCL09]:

• Infrastructure-as-a-Service (IaaS)

In this model, the abstraction is located at the infrastructure level. The users can rent cloud resources depending on their needs, paying only for the actual used resources.


Figure 2.2: Comparison between traditional and cloud models

• Platform-as-a-Service (PaaS)

The abstraction is located at the platform level, with the service provider being responsible for the servers and the user focused only on the application.

• Software-as-a-Service (SaaS)

In this model, a full application is deployed and updated remotely, removing the user’s need to install software on his machine.

As seen in Figure 2.2, the user's control over the lower layers decreases as the level of abstraction increases (components managed by the user are represented in blue).

2.2 Private Vs Public Vs Hybrid Clouds

A private cloud infrastructure may be deployed on or off premises and is usually only available to a single organization. The cloud can either be managed by local human resources, placing control and responsibility on the organization or, on the other hand, cloud infrastructure management can be completely outsourced or shared between the organization and a third party. Private clouds provide more infrastructure control and data security but result in higher technical and economical costs.

A public cloud infrastructure is open to the public and managed by a third party institution providing a service for the users. In this model, control, cost and infrastructure are all responsi-bilities of the third party provider. On the other hand, the provider can have access to the user’s data.

A hybrid cloud infrastructure leverages a private cloud and public cloud services by implementing a common communication interface able to create interoperability between them. The integration is valuable for companies with sensitive data that should not be exported to third parties, but that at the same time intend to integrate business processes with external services.


2.3 Virtualization

The introduction of multicore processors and the integration of virtual technology led to single machines being capable of executing multiple parallel tasks. Using such powerful hardware to run a single application is not sustainable, as the resources would not be utilized while the application was not running. On the other hand, using the same operating system to run multiple applications can generate security problems, with no isolation available between applications. There can also be situations where multiple applications try to access the same storage locations, ports or sockets.

Virtualization provides a hybrid approach between centralized and decentralized applications, while at the same time promoting horizontal scalability and resource elasticity. Virtualization abstracts the hardware and allows multiple guest Operating Systems to run on a single host, where each guest is completely isolated and runs a full OS. This approach brings the following benefits:

• The idle time of a system decreases because of the number of virtual machines running simultaneously.

• Resources can be managed and allocated to Virtual Machines individually.

• The system can be managed in a decentralized fashion.

• Heterogeneous systems can run within a single host.

• Failover times can be decreased by promoting simplified migration strategies or even live migrations.

• Isolation mitigates guest security issues from a compromised or out-of-date host.

• Green computing is fostered by reducing the number of necessary physical machines.

2.3.1 Hypervisors

The evolution of virtualization revolves around the work on one piece of software, the hypervisor, also known as VMM (Virtual Machine Manager). It allows physical devices to share their resources with virtual machines running as guests [Sri09]. In this sense, a physical computer can be used to run multiple virtualized instances, each with its own OS and virtual hardware, CPU, memory, network and I/O, all provided by the hypervisor.

The hypervisor also provides the possibility of running guest’s applications without modifying them or their OS. This way, the guests are unaware of whether the environment is virtualized, due to the hypervisor providing the same communication interface as a physical system.

There are several hypervisor categories, namely Type 1 and Type 2. A Type 1 hypervisor is implemented on bare metal, directly over the hardware, while a Type 2 is installed on top of the OS, like any other piece of software. A Type 1 hypervisor is superior to a Type 2 in terms of performance, since the latter has to go through an additional layer, the host OS.


Figure 2.3: Comparison between hypervisors and container engines

There are four main competitors in the hypervisor market, responsible for 93% of its share [PBSL13]. Xen [Bar03] and KVM [Kiv07] are two open-source hypervisors based on Linux, while VMware's ESXi [Cha08] and Microsoft's Hyper-V [VV09] are closed-source solutions.

2.3.2 Containers

Application containers are an alternative to hypervisors, when the overhead introduced by the latter is undesirable. Simply put, containers are a lightweight virtualization technology that enables an application to be packaged along with its virtual environment, configurations and dependencies, isolating it from the deployment environment. As represented in Figure 2.3, the main feature that differentiates a container from a hypervisor is the fact that multiple containers can share the same OS.

2.3.3 Docker

Docker is an open-source technology that allows applications to run in containers [Doc]. Using a Dockerfile, a configuration document, Docker can execute all sorts of instructions, ranging from installing dependencies to configuring environment variables. Compared to a virtual machine, Docker containers provide a similar level of isolation while requiring less storage space, resulting in a more efficient solution.


2.4 Advantages of Cloud Computing

On-demand resources enable users to provision or decommission resources such as virtual processors, memory or storage capacity manually or automatically. This highly adaptable and personalized feature provides the necessary resource elasticity to fulfill a given SLA (Service-Level Agreement) while at the same time promoting cost savings by eliminating unnecessary resources. Resources may be elastically provisioned and released either manually or automatically according to a given resource demand. Also, more is achieved with less hardware, as cloud computing resources serve multiple clients in a multi-tenant model where resources are assigned, released and reassigned on demand.

Cloud computing also reduces on-premises hardware and the expert human resources needed to manage it, leading to easier implementation. Users without technical knowledge are able to run complex applications and grow their businesses.


Chapter 3

Digital TV Production over IP

The virtualization of mobile production, transitioning from SDI to IP, is an object of discussion and study in the broadcasting industry. By using a single location to broadcast all signals and replacing hardware with software, television companies can significantly reduce broadcasting costs.

3.1 SDI

Serial Digital Interface (SDI) is a family of digital video, audio and metadata interfaces standardized by SMPTE (Society of Motion Picture and Television Engineers). Since the nineties, this interface has been used as the standard for television production [Kov13].

SDI is only available in professional equipment and allows the transmission of uncompressed video and audio.

Each time a new format emerges, SDI needs to be adapted to accommodate it. So, each time a new technological breakthrough is made regarding television transmission (4K, 8K, 3D, etc.), all SDI infrastructures need to be adapted [Gol].

3.2 Internet Protocol

TCP/IP is a set of communication protocols used on the Internet and similar networks. This model is composed of five layers, each one responsible for providing services to the upper-layer protocol. These layers are, from top to bottom, Application, Transport, Network, Data Link and Physical.

Taking into account the scope of this dissertation, it is worth analyzing the two upper layers, Application and Transport. The Transport layer provides means of communication between different machines, while the Application layer allows different processes to communicate with each other, using the Transport layer's services.


3.3 Transport Layer

UDP (User Datagram Protocol) is a transport layer protocol for network applications based on IP (Internet Protocol). It is a simple protocol, where each packet is sent only once, regardless of it being corrupted or lost. This protocol is mainly used in real-time applications such as video games and video conference applications, prioritizing efficiency above integrity. It is a connectionless protocol.

On the other hand, TCP (Transmission Control Protocol) is a connection-based protocol. Before starting the data transfer, a connection is established between the emitter and the receiver. Each time a packet is received, the receiver sends a message to the emitter confirming the packet reception. This protocol ensures that the data is fully received, in the right order.

Although TCP is a more complete protocol, UDP can be used in cases where the overhead introduced by the connection-oriented protocol endangers the viability of the application. Also, in controlled environments such as private networks, the downsides of UDP are significantly reduced.

There are three methods of communicating via IP. These methods differ in the number of receivers for each transmission:

• Unicast

In a unicast transmission, for each message sent, there can only be one receiver. It is a one-to-one method.

• Broadcast

In a broadcast transmission, the message sent can be received by every node of the network, without exceptions.

• Multicast

Multicast is, similarly to broadcast, a one-to-many transmission. With this method, the message is only received by interested nodes. To declare interest, a node can send messages asking to join or leave a multicast group. The emitter, instead of sending a message for each receiver, sends one single message to the multicast address. This allows the emitter to send messages to multiple nodes without prior knowledge about their interest.
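As an illustration of the join mechanism described above, the following sketch (in Python, with a hypothetical group address and port, not values used by the prototype) subscribes a UDP socket to a multicast group via an IGMP membership and waits for a datagram.

```python
import socket
import struct

MCAST_GROUP = "239.1.1.1"   # hypothetical multicast group
MCAST_PORT = 5004           # hypothetical port

# Create a UDP socket and allow several receivers on the same host/port.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))

# Ask the network (via IGMP) to deliver traffic for this group to this host.
membership = struct.pack("4s4s", socket.inet_aton(MCAST_GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

data, sender = sock.recvfrom(2048)  # blocks until a datagram for the group arrives
print(f"received {len(data)} bytes from {sender}")

# Leaving the group triggers an IGMP leave message.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, membership)
sock.close()
```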

3.4 MPEG-TS

MPEG-TS [Mpe00] is a media container for content storage and transmission. MPEG-TS streams are composed of one or more programs, described in a Program Association Table (PAT). If the stream contains only one program, it is designated a Single Program Transport Stream (SPTS), while if it is composed of multiple programs it is defined as a Multiple Program Transport Stream (MPTS) [ETS07].


Figure 3.1: PAT and PMT [Cia09]

A program is a combination of one or more streams of PES (Packetized Elementary Stream) packets. For example, a program can contain an audio PES, a video PES and a subtitle PES. For each program, a PMT (Program Map Table) stores the information about its elementary streams, as seen in Figure 3.1.

An MPEG-TS stream is a group of TS packets, each with a size of 188 bytes, 4 of which are used for the header. This header contains a Packet ID (PID) that allows the identification of its content. Each PES packet contains a PTS (Presentation Timestamp) and a DTS (Decoding Timestamp), allowing the synchronization of multiple elementary streams.

An MPEG-TS stream must be encapsulated in RTP packets to be transported over IP. Each IP packet contains an IP header, a UDP header, an RTP header and a number of TS packets.
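As a minimal illustration of the packet layout just described, the Python sketch below checks the TS sync byte and extracts the 13-bit PID from the 4-byte header of a 188-byte TS packet; the sample bytes are fabricated for the example.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def parse_ts_header(packet: bytes) -> dict:
    """Extract basic fields from the 4-byte MPEG-TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    # The PID occupies 13 bits spread over bytes 1 and 2.
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    payload_unit_start = bool(packet[1] & 0x40)  # set on the first packet of a PES/PSI section
    continuity_counter = packet[3] & 0x0F
    return {"pid": pid, "payload_unit_start": payload_unit_start,
            "continuity_counter": continuity_counter}

# Fabricated header: sync byte, PID 0x0100, continuity counter 7.
sample = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(sample))  # {'pid': 256, 'payload_unit_start': True, 'continuity_counter': 7}
```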

3.5 Real-Time Transport Protocol

Real-time Transport Protocol is a protocol used to send audio and video over IP networks. Each RTP packet contains a header with information regarding timestamps, sequence numbers, among others, while leaving a customizable field, allowing the extension of the protocol.

By marking each packet with a sequence number, it is possible to re-order packets regardless of the order in which they were received.

By using RTP allied with UDP, it is possible to maximize the speed of data transfer while keeping the packets ordered.


Real-Time Control Protocol is a protocol used simultaneously with RTP to monitor and provide statistics about the transmission.

3.5.1 RTP Payload Format for Uncompressed Video

RFC 4175 [GP05] specifies the norm for the transport of uncompressed video using RTP. While an RTP packet header only allows the use of 16 bits to identify the packet sequence number, RFC 4175 contains a 16-bit extension called the extended sequence number. In a 1 Gbps video stream sending 1000-byte packets using only RTP, all possible sequence numbers would roll over in about 0.5 seconds. This could create problems identifying lost and out-of-order packets. With a 32-bit sequence number, it would take approximately 9 hours for the number to roll over.
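The rollover figures quoted above can be reproduced with a short calculation; the sketch below (Python, assuming the same 1 Gbps stream and 1000-byte packets) compares the 16-bit and 32-bit sequence number spaces.

```python
bitrate = 1_000_000_000          # 1 Gbps video stream
packet_size_bits = 1000 * 8      # 1000-byte packets
packets_per_second = bitrate / packet_size_bits   # 125,000 packets per second

rollover_16 = 2**16 / packets_per_second   # ~0.52 s with the plain 16-bit RTP field
rollover_32 = 2**32 / packets_per_second   # ~34,360 s with the RFC 4175 extension

print(f"16-bit sequence space rolls over in {rollover_16:.2f} s")
print(f"32-bit sequence space rolls over in {rollover_32 / 3600:.1f} h")  # roughly 9.5 h
```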

3.6 SMPTE 2022-6

SMPTE ST 2022-6 [S.M12] is a standard published by SMPTE, which belongs to the SMPTE ST 2022 family of standards that allows the use of IP technology in the broadcasting industry. ST 2022-6 defines the transport of SDI over IP using the RTP protocol, so it is dubbed "SDI over IP". By encapsulating SDI payloads into IP packets, ST 2022-6 allows the SDI signal to be packaged in multiple 1376-byte packets, and it is possible to transmit the content over an Ethernet network, receive the packets, and rebuild the SDI signal. Although this allows interoperability with other devices that use SDI, it also means that there is no separation between the various streams within the SDI signal [Laa12]. Imagine, for example, that you want to modify the audio that is part of an SDI stream that is transported with the corresponding video. In that case, you will have to deal with the video and all the overhead it will bring to the system when you only want to modify the audio. That is, since there is no separation between the various contents present in the SDI, there is no flexibility to transport only part of the content.
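To give an idea of the packet rates involved, the sketch below estimates how many ST 2022-6 datagrams per second are needed to carry an HD-SDI signal; it assumes the nominal 1.485 Gb/s HD-SDI rate and the 1376-byte media payload mentioned above, and ignores header overhead.

```python
hd_sdi_bitrate = 1.485e9        # nominal HD-SDI rate in bits per second (assumed)
payload_bytes = 1376            # media payload carried by each ST 2022-6 packet

packets_per_second = hd_sdi_bitrate / (payload_bytes * 8)
print(f"~{packets_per_second:,.0f} packets per second")   # roughly 135,000 packets/s
```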

3.7 JT-NM Architecture

Joint Task Force on Networked Media (JT-NM) is a consortium formed by the European Broadcasting Union (EBU), SMPTE (Society of Motion Picture and Television Engineers) and the Video Services Forum. This task force designed an architecture called JT-NM Reference Architecture v1.0 [oNM15] with the purpose of bringing together good practices, recommendations and frameworks so that there can be interoperability between devices from different manufacturers in the transition from SDI to IP. The JT-NM defines a conceptual model that allows the mapping of the workflows in order to ensure the desired interoperability. Figure 3.2 represents a simplified architecture, focusing on the scope of this dissertation.

The Network is the heart of the media operations, usually an Ethernet network.

The Nodes are connected to the Network and provide infrastructure such as storage, processing power and interfaces.


Devices can be software-only services or physical Devices, such as cameras, and are deployed onto the Nodes to provide the Capabilities necessary to complete Tasks.

In the case of cameras or other equipment such as microphones, a Device can be a Source for Essences. These Essences are moved as Grain payloads that are transported over the Network divided into network packets.

A Grain is composed of its media content (video, audio, metadata) and a timestamp. This timestamp represents the instant at which the Grain was created and is generated using the Clock present on the Node. All Nodes should be synchronized using PTP (Precision Time Protocol) to achieve nanosecond precision [Com08].

Essences are sent over the Network by Senders to the Receivers. This communication is made in the form of Flows. A Flow is the result of a Source, and there can be multiple Flows for each Source. For example, a Source can generate a Flow for uncompressed video and another one for H.264 compressed video.

The Registry allows connections between Receivers and Senders by providing the means for Nodes, Devices and Flows to register themselves and discover others, allowing connections to be created between Devices (Receivers and Senders).

In order for this information to be properly articulated and to be used to define workflows, three fundamental blocks are defined on which this data model is based: Timing, Identity and Discovery & Registration.

• Timing

Each Grain contains a timestamp, ensuring the consistency of the performed operations and, consequently, that the Flows are correctly aligned.

• Identity

Each element present in the infrastructure must be easily and uniquely identifiable, so that it can be referenced and used. All relationships between resources must make use of Identity.

• Registration and Discovery

Each Node in the Network must register itself and the Devices, Sources, Flows, Senders and Receivers that it makes available, so that other nodes can discover them and obtain the appropriate information about each one.

3.8 NMOS

NMOS (Networked Media Open Specifications) is a series of specifications that intends to create frameworks to allow the interoperability desired by the JT-NM architecture. It was created by the Advanced Media Workflow Association (AMWA), a group composed of several broadcasting corporations and other companies working in the TV market.

NMOS is based on the conceptual data model proposed by the JT-NM, in order to add identity and relationships between content and equipment.


Figure 3.3: Node proposed by NMOS [Ass16a]

Regardless of the specific task that each Node performs, the logical view of a Node according to NMOS (Figure 3.3) allows the creation of a level of abstraction sufficient to ensure the expected modularity and expandability, which can be adapted to different needs.

NMOS does not limit how each module should work; it only specifies which interfaces it should expose. Each Node must expose HTTP transactions performed through a REST API. These transactions are described in the "AMWA NMOS Discovery and Registration Specification (IS-04)" proposed by NMOS [Ass16b].
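To make the IS-04 interaction more concrete, the sketch below (Python, using the requests library) registers a hypothetical Node resource with a registry; the registry URL, API version and resource fields are placeholders and should be checked against the actual IS-04 specification and the registry in use.

```python
import requests

REGISTRY = "http://registry.example.local"  # hypothetical registry address

# Minimal Node resource; a real Node carries many more fields (caps, api, clocks, ...).
node = {
    "id": "3b8c9f1e-0000-4000-8000-000000000001",
    "version": "1441704616:890020555",
    "label": "input-distributor-node",
    "href": "http://10.0.0.10:12345/",
}

# IS-04 Registration API: resources are POSTed as {"type": ..., "data": ...}.
resp = requests.post(
    f"{REGISTRY}/x-nmos/registration/v1.2/resource",
    json={"type": "node", "data": node},
    timeout=5,
)
print(resp.status_code)  # typically 201 on first registration, 200 on update
```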


Chapter 4

Automation of Advertising Replacement

4.1 Context

Most TV networks have several contractual obligations with advertisement buyers. These obligations range from regional distribution, to time slots and even broadcast services. For example, some ad segments can be played in the television broadcast but not on the web player; others are only relevant to a certain location, losing their efficiency when broadcast to a larger audience. Also, in order to accommodate Video-on-Demand services, the broadcast needs to be divided into segments, separating advertising blocks from program blocks. In order to fulfill these requirements, the broadcasters need to allocate physical resources and manpower.

4.2 Market Solutions

In order to address the problem described in Section 4.1, several companies have started research in the area of advertising content detection.

4.2.1 Anvato

Anvato Media Content Platform (MCP) is a solution from Anvato [Anv] which offers live streaming, video encoding, cloud editing, syndication and dynamic ad insertion, among other services.

They claim that their stitching technology replaces broadcast ad units with dynamically placed digital ads which are frame-accurate in-stream and, using IAB (Interactive Advertising Bureau) standardized VAST (Video Ad Serving Template) tags, they can help to prepare and deliver ads that are nationally, regionally and locally relevant to clients' viewers.

It is designed to monetize both users' live streams (Figure 4.1) and video-on-demand (Figure 4.2) on any screen and any device. It allows the insertion of ads on the server side dynamically on all platforms: desktop, iOS and Android apps, AppleTV, Chromecast, Roku, Amazon Fire TV and others.

This is a solution that delivers the ads in a contextualized way but does not detect ads when there are no ad triggers.


Figure 4.1: Live dynamic server side ad insertion [Anv]

Figure 4.2: VOD dynamic server side ad insertion [Anv]


Figure 4.3: Technology of Audible Magic [Mag]

4.2.2 Audible Magic

Audible Magic’s solutions [Mag] use what is called audio fingerprints to match unknown media content against known, registered media content. Analogous to the idea that human fingerprints can be measured to compactly and uniquely identify every person, there are processes that allow very small clips of audio to be measured for distinctive characteristics. These compact audio fingerprint measurements can be uniquely distinguished when compared to measurements taken from any other audio clip.

Audible Magic uses this kind of technology, called automatic content recognition (ACR) tech-nology, to identify unknown media content when fingerprints of that content are matched against known fingerprints registered in an Audible Magic database.

One of the use cases of this technology provided by Audible Magic is their service for TV ad detection and marking, shown in Figure 4.3. It allows the detection of ads in unmarked broadcast streams and, in real time, provides frame-accurate timing to trigger the injection of ad markers. With the injection of those ad markers, dynamic ad insertion (DAI) technology can be fully utilized.

Despite the fact that Audible Magic maintains content identification databases for multiple content types, including live TV programming and advertising running on national channels in the USA, it does not cover international regions and it does not allow the identification of ads outside that database.

4.2.3 ACRCLOUD

ACRCLOUD provides a set of cloud based solutions [Ser] for automatic content recognition, using audio fingerprinting technologies, which are more directed to second screen applications.


For example, the live channel detection service allows the collection of live streams from TV or radio stations in real-time and enables the channel to be detected at the exact point of broadcast from any user’s mobile devices. With live content-generated audio fingerprints, supplementary information about the content or interactive campaigns can be triggered to appear on the viewer’s second screen devices.

The usage of this live channel detection service can be summarized as follows:

• Pre-designed contents and interactions are organized by marketing editors in the server end.

• Users' apps identify the TV channel and specific times with audio recognition while they are watching TV.

• Detailed contents are triggered and retrieved from the server to the users’ app.

It is important to point out that this solution does not allow content replacement or elimination.

4.2.4 Ivitec

Ivitec offers a set of solutions to analyze video clips which are based on adaptive video fingerprinting technology [Tec]. This technology is able to adjust the density and granularity of fingerprints to match specific use cases (Figure 4.4), instead of using a single algorithm in the hope that this "average" solution will fit the majority of use cases (as conventional video fingerprinting approaches do).

The adaptive video fingerprinting is fully implemented in the MediaSeeker Core Platform, which is the base of all Ivitec products, and it allows content to be identified by comparison against a database of known video information inside the platform. Their solutions are capable of recognizing video clips as they are broadcast, uploaded or downloaded, or within preexisting repositories of video content.

AdMon is a software automated solution for advertisement workflows which promises to streamline and simplify the process for recognition and tracking.

This system analyzes a set of TV channels for advertisements to be automatically recognized and provides valuable detection information such as channel name and detection time within minutes of airing. It can be integrated with third party capture sources. However, it does not allow content replacement or elimination.

4.2.5 Adobe Primetime

Adobe Primetime is a multiscreen TV platform for live, linear and VOD programming [Pri]. Its modular distribution and monetization capabilities include TVSDK for multiscreen playback, DRM, authentication, dynamic ad insertion (DAI) and audience-centric ad decisioning.

The Adobe Primetime ad insertion solution is available in both client- and server-side configurations, and the commercial breaks are identified using traditional broadcast ad break cues, real-time markers, or ad timelines from the publisher's CMS. The user can even skip or replace burned-in advertisements.


Figure 4.4: Fingerprint density versus type of search [Tec]

Adobe Primetime features turnkey integration with Adobe's video ad-decisioning solution to deliver true DAI into live, linear, and video-on-demand content across desktops, mobile devices, gaming consoles, and IP-enabled set-top boxes. Adobe Primetime ad insertion can also be integrated with third-party ad servers and sell-side platforms.

4.2.6 Comparative Analysis

Table 4.1 summarizes each solution, comparing them by available features. These features include the capacity to output a cloud-ready format. Another important functionality is the ability to ingest live programs or offline media, in which case it must be possible to interact with several file systems, memory card readers, temporary storage systems, or virtual directories. In terms of advertising, the different solutions are compared by their capacity to detect advertising segments, be it on or off the cloud, and also by their capability of replacing said segments.

As seen in Table4.1, there isn’t one solution containing all the features analyzed. The solutions that are closest to that goal are Anvato and Audible Magic products. In Anvato’s case, there is no solution for cloud based advertising detection. In the case of Audible Magic product, the ingest of offline content is not possible.


Table 4.1: Comparative analysis by features

| Solution        | Cloud ready output formats | Ingest: Live | Ingest: File based | Ad Detection: Cloud based | Ad Detection: Off cloud | Content replacement |
|-----------------|----------------------------|--------------|--------------------|---------------------------|-------------------------|---------------------|
| Anvato          | X                          | X            | X                  |                           | X                       | X                   |
| Audible Magic   | X                          | X            |                    | X                         | X                       | X                   |
| ACRCLOUD        |                            |              |                    | X                         |                         |                     |
| Ivitec          |                            |              |                    |                           | X                       |                     |
| Adobe Primetime | X                          |              |                    |                           |                         | X                   |

4.3 Requirements

To ensure a solution to the automation of advertising substitution, some requirements need to be met:

• The application must be able to receive one or more input feeds from outside the cloud. The provided bandwidth must also be large enough to ensure the transmission of uncompressed video between modules.

• Internet Group Management Protocol (IGMP) must be active, to allow multicast communication in the network.

• The application is to be used in real-time scenarios. To ensure that there are no delays during video processing, the cloud components must be able to scale vertically, so that the time to process a frame is lower than the frame duration (the inverse of the frame rate).

• All components must be deployed in the same private cloud, in order to reduce the delay in the communication between each component and ensure network reliability.

4.4 Proposal

This dissertation proposes a cloud-based application capable of analyzing a live stream, identifying the ad segments it contains in real time and replacing them with more relevant content, using IP for video transport.

4.4.1 Architecture

This application is composed of several modules, each containing a unique function and designed in a way that allows changes in the architecture to accommodate different use cases. Some of these modules may also be instantiated multiple times, depending on the number of consumers and detection algorithms utilized. These modules interact with each other as seen in Figure 4.5.


Figure 4.5: Cloud Application Architecture

Since this is a time-critical application, it is recommended that all these modules be deployed in the same private cloud network.

4.4.1.1 Input Distributor

The Input Distributor is the module responsible for the reception and processing of a live stream originated from outside the cloud, making it available to the other modules of the application.

It receives an audiovisual stream, decodes it and sends it via multicast. This will be the primary feed, and the one that is played out by default by the application. This feed will then be used by both the Video Switcher and Advertisement Detector modules.

Figure 4.6 shows the media flows of this module.

4.4.1.2 VOD Replacement

Similarly to the Input Distributor, the VOD Replacement module processes and sends video and audio via multicast but, instead of receiving content from a live stream, it reads from a file.


Figure 4.7: VOD Replacement media flows

This node also contains a REST server, allowing it to receive requests to restart the video being played. This will be the alternate feed, only played out when an advertisement segment is detected in the primary one. The output of this module will be accessed by the Video Switcher module.

Figure 4.7 shows the media flows of this module.

4.4.1.3 Video Switcher

Video Switcher is the module responsible for selecting the feed that will be played out. It subscribes to two multicast addresses, the first one where the Input Distributor resulting stream is available and the second one from the VOD Replacement module, and selects one of them, redirecting it to another multicast address, which will be accessed by the Output module.

The selection is made according to information received from Business Logic. In order to receive this information, a REST Server is present in this module, capable of receiving requests containing information about switching instances.

To ensure that the VOD contents are played from the very start when an advertising segment is detected, the Video Switcher can use a REST API to send a replay-from-start request to the VOD Replacement. Moments before switching to the alternate feed, the module sends the request to the VOD Replacement and starts scanning the contents of the stream in search of its frame 0, then stores its data in memory. This way, when the switch actually happens, the first frame sent by this module will be the first frame of the original video. This ensures no data is lost during the feed change.

Figure 4.8 shows the media flows of this module.

Figure 4.8: Video Switcher media flows


Figure 4.9: Business Logic media flows

4.4.1.4 Business Logic

This module receives EDL (Edit Decision List) files from the Advertisement Detectors. These files list the frame numbers where the advertisement segments start and finish.

This information is sent to the Video Switcher with the help of a REST API. Figure 4.9 shows the media flows of this module.

4.4.1.5 Advertisement Detector

This module subscribes to the Input Distributor multicast address and scans its contents, detecting advertisement segments and saving the frames corresponding to their start and finish. It then sends the result to the Business Logic in the form of an EDL.

There can be multiple instances of this module, one for each detection algorithm used. For example, one module can analyze the video stream, while the other focuses on the audio.

Figure 4.10 shows the media flows of this module.

4.4.1.6 Output

This module subscribes to the multicast address containing the output from the Video Switcher. It then encodes the stream and sends it using RTP to the end-user.

The output of this module is sent by unicast so, for each end-user, a different instantiation of the module is needed. Also, one module is needed for each encoding format. So, if the stream needs to be sent to two different users with two different encoding formats each, four instantiations of this module are needed.

Figure 4.11 shows the media flows of this module.


Figure 4.11: Output media flows

4.4.2 Modularity and Scalability

Each module is independent from all the others. With this level of modularity, it is possible to develop and integrate other modules in this application, providing a larger range of use cases. For example, a module can be developed that stores the resulting stream of the application, or even an ad-free stream, as a video file, as seen in Figure 4.12. It is also possible to replace the VOD Replacement with another Input Distributor, or vice-versa.

The application is also capable of scaling up, allowing multiple instances of each module according to the user's needs. In this case, the prototype developed is optimized to receive two video sources, the main one coming from a live stream and the alternative coming from a video file. However, it is possible to increase this number by deploying several Video Switcher modules, as seen in Figure 4.13, to accommodate the use case. So, each additional feed requires an additional Input Distributor and Video Switcher.

The end result of the application is a MPEG-TS stream equivalent to the one received by the Input Distributor. This allows multiple instances of the application in cascade, where the output feed of the first is used as an input for the following one. The consumer can then apply video transformations, such as overlays, if needed.

4.5 Prototype

A prototype was implemented in order to validate the architecture described in Section 4.4.1 and analyze the value of future products based on it.

The prototype is able to receive two sources of audiovisual content, one from a live stream and the other from a video file, decode and send the video between modules using RTP. It is also capable of parsing an Edit Decision List to find instances of advertising segments. In the end, the video is encoded and sent to the end-user.

The end result is a MPEG-TS stream in which the content abides by the switching instances defined in the EDL.

In order to implement the prototype, the modules Input Distributor, VOD Replacement, Video Switcher, Output and Business Logic were developed. To simulate the Ad Detector module, an EDL was created describing the time instances in which the feeds should be switched (representing advertising segments), as seen in Listing 4.1.


Figure 4.12: VoD Storage use case


<PubPlugin>
  <Settings>
    <Type video="true" audio="false"/>
    <Algorithms>
      <Video name="visual rhythm"/>
    </Algorithms>
  </Settings>
  <Report>
    <Global nProcessedFrames="2000" nEvents="15" nPubFrames="708"/>
    <Events>
      <PubSequence initialFrame="481" finalFrame="578" startPts="481481/30000" endPts="289289/15000"/>
      <PubSequence initialFrame="687" finalFrame="797" startPts="229229/10000" endPts="797797/30000"/>
      <PubSequence initialFrame="886" finalFrame="1042" startPts="443443/15000" endPts="521521/15000"/>
      <PubSequence initialFrame="1061" finalFrame="1064" startPts="1062061/30000" endPts="133133/3750"/>
      <PubSequence initialFrame="1090" finalFrame="1105" startPts="109109/3000" endPts="221221/6000"/>
      <PubSequence initialFrame="1107" finalFrame="1115" startPts="369369/10000" endPts="223223/6000"/>
      <PubSequence initialFrame="1120" finalFrame="1122" startPts="14014/375" endPts="187187/5000"/>
      <PubSequence initialFrame="1124" finalFrame="1128" startPts="281281/7500" endPts="47047/1250"/>
      <PubSequence initialFrame="1130" finalFrame="1134" startPts="113113/3000" endPts="189189/5000"/>
      <PubSequence initialFrame="1142" finalFrame="1176" startPts="571571/15000" endPts="49049/1250"/>
      <PubSequence initialFrame="1180" finalFrame="1249" startPts="59059/1500" endPts="1250249/30000"/>
      <PubSequence initialFrame="1256" finalFrame="1288" startPts="157157/3750" endPts="161161/3750"/>
      <PubSequence initialFrame="1290" finalFrame="1320" startPts="43043/1000" endPts="11011/250"/>
      <PubSequence initialFrame="1328" finalFrame="1355" startPts="83083/1875" endPts="271271/6000"/>
      <PubSequence initialFrame="1365" finalFrame="1467" startPts="91091/2000" endPts="489489/10000"/>
    </Events>
  </Report>
</PubPlugin>

Listing 4.1: Ad Detector Report

4.5.1 Prototype limitations

Since the prototype is just a proof of concept, some restrictions were defined for the development:

• Input: The prototype is only capable of receiving one MPEG-TS stream and one video file as input.

• Output: The output feed must be an MPEG-TS stream, similar to the input stream.

• Environment: The prototype only works in a Windows environment.

• The audio component of the input feeds is discarded after the demux operation. The output feed has no audio component.

4.5.2 MOG MPL

MOG Technologies' Media Processing Library is used in the development of its products and is private property of MOG Technologies. It is mainly used for broadcast solutions and provides optimized functions and methods for video processing and transmission. This allows the development of real-time applications with minimal delay. By using this library, compatibility of this project with other MOG products is also assured.

4.5.3 Implementation

From the architecture defined in Section 4.4.1, all modules described were implemented, with the exception of the Ad Detector module (Figure 4.14).


Figure 4.14: Prototype Developed

Instead, a file was generated, simulating a possible output of such a component, with information regarding the instants where an advertisement segment starts and ends, as seen in Listing 4.1. This file can be accessed by the Business Logic.

By developing this prototype it is possible to:

• Test sending and receiving high definition uncompressed video in a cloud environment. • Test the capabilities of demux, decode, mux and encode operations in a cloud environment. • Analyze the switching capabilities of the application, in terms of frame accuracy and delay. • Analyze metrics such as bandwidth, RAM usage and processing capacity.

In order to keep the modularity of the application, each component of the prototype is developed as a Docker container based on a Windows Server Core image.
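As an illustration of how the module containers could be launched programmatically, the sketch below uses the Docker SDK for Python to start two hypothetical module images on a shared network; the image and network names are placeholders, not the actual MOG images.

```python
import docker

client = docker.from_env()

# Hypothetical network shared by all prototype modules ("nat" is the default Windows driver).
client.networks.create("ad-replacement-net", driver="nat")

# Hypothetical images for two of the modules; each one runs as its own container.
for image in ("mog/input-distributor:dev", "mog/video-switcher:dev"):
    client.containers.run(image, detach=True, network="ad-replacement-net")

print([c.name for c in client.containers.list()])
```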

4.5.3.1 Input Distributor

In the Input Distributor module, a stream compliant with the RFC 2250 norm is received. Then, each RTP packet is demuxed, separating the video from the audio. The video packets are then decoded and stored in frames. It is possible to define a number of frames to be stored in memory as a buffer, in order to ensure the quality of the stream, although this results in a large amount of RAM usage, as the video is stored uncompressed.

Each frame is then assigned a decoding timestamp, starting at 0 and incrementing by 1/framerate, in this case 1/50. To send the video to the other modules, the frames must be packetized into RTP packets. Each packet header contains the new decoding timestamp created by the Input Distributor.
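The sketch below illustrates the timestamp assignment and RTP packetization step in Python; the payload type, SSRC and 90 kHz clock are simplified assumptions for the example, not the values used by the MPL-based implementation.

```python
import struct

FRAME_RATE = 50          # frames per second, as in the prototype
SSRC = 0x12345678        # arbitrary stream identifier for the example
PAYLOAD_TYPE = 96        # dynamic payload type (placeholder)

def rtp_packet(payload: bytes, seq: int, timestamp: int, marker: bool = False) -> bytes:
    """Build a minimal 12-byte RTP header (version 2, no CSRCs) followed by the payload."""
    first_byte = 2 << 6                       # version = 2, no padding, no extension
    second_byte = (int(marker) << 7) | PAYLOAD_TYPE
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)
    return header + payload

# Each frame gets a timestamp that advances by 1/FRAME_RATE, expressed here in a
# 90 kHz RTP clock: 90000 / 50 = 1800 ticks per frame.
ticks_per_frame = 90000 // FRAME_RATE
for frame_number, frame_data in enumerate([b"frame0", b"frame1", b"frame2"]):
    packet = rtp_packet(frame_data, seq=frame_number, timestamp=frame_number * ticks_per_frame)
    print(frame_number, len(packet))
```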


4.5.3.2 VOD Replacement

The VoD Replacement module is fed by a video file. For each frame of the video, the module demuxes, separating audio from video, decodes the video and stores it in new frames. These frames can also be stored in local memory to reduce jitter in the broadcast.

Then, it re-packetizes the uncompressed video into RTP packets and sends them to a multicast address, maintaining the original video frame rate. The multicast address is the same as the Input Distributor's, but with a different port.

This component also contains a REST server. With this server, the Video Switcher module can request the VoD Replacement module to restart the video streaming from the initial frame, usually before the beginning of an advertisement segment present in the main input feed, originating from the Input Distributor.
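A minimal sketch of such a restart endpoint is shown below, using Flask; the route name and the restart mechanism (an event flag checked by the sending loop) are assumptions for illustration, not the prototype's actual API.

```python
from threading import Event
from flask import Flask

app = Flask(__name__)
restart_requested = Event()   # the RTP sending loop checks and clears this flag

@app.route("/restart", methods=["POST"])
def restart():
    """Ask the VoD sender loop to go back to frame 0 of the file."""
    restart_requested.set()
    return "", 204

if __name__ == "__main__":
    # The real module would run the RTP sending loop in another thread.
    app.run(host="0.0.0.0", port=8080)
```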

4.5.3.3 Business Logic

First, this module reads and parses an Edit Decision List file provided by the user (Listing 4.1). Then it sends the initial and final frame of each ad segment to the Video Switcher using a REST service.
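The sketch below shows, in Python, how the report from Listing 4.1 could be parsed and its segments forwarded to the Video Switcher; the element names follow the listing, while the Video Switcher URL and endpoint are placeholders.

```python
import xml.etree.ElementTree as ET
import requests

SWITCHER_URL = "http://video-switcher:8080/segments"   # hypothetical endpoint

def send_segments(edl_path: str) -> None:
    """Parse the Ad Detector report and push each ad segment to the Video Switcher."""
    root = ET.parse(edl_path).getroot()                 # <PubPlugin> element
    for seq in root.findall("./Report/Events/PubSequence"):
        segment = {
            "initialFrame": int(seq.get("initialFrame")),
            "finalFrame": int(seq.get("finalFrame")),
        }
        requests.post(SWITCHER_URL, json=segment, timeout=5)

send_segments("ad_detector_report.xml")
```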

4.5.3.4 Video Switcher

Video Switcher contains a REST server. This server is used to receive the frame numbers that correspond to an ad segment start or ad segment end from the Business Logic.

This module also subscribes to the multicast address of both Input Distributor and VoD Replacement feeds. By default, only the primary feed, originated from the Input Distributor, is received. The module then depacketizes the packets, grouping them in frames, keeping the original timestamps.

Using the information gathered from the Business Logic, this component analyzes the current frame timestamp and acts accordingly (a simplified sketch of these rules follows the list):

1. If the difference between the next advertisement segment start timestamp and the current frame timestamp is 2 seconds, the Video Switcher starts preparing the commutation by sending a request to the VoD Replacement module to restart the video streaming. Then, in the following frames, in addition to receiving and handling the packets received from the Input Distributor, it also analyzes the ones from VoD Replacement, searching for a frame with a timestamp of 0.

When this frame is found, its contents are stored in local memory, so that when the feed switching occurs, the first frame sent is the first frame of the video.

2. If the timestamp of the frame received is the same as the next advertising segment start frame, the Video Switcher stops processing the Input Distributor packets and instead handles the VoD Replacement ones.


The received packets are stored as a frame in the memory buffer mentioned above, and the module then packetizes and sends the first frame from the buffer to another multicast address. This address may be subscribed to by the Output module.

3. If the timestamp of the frame received is the same as the next advertising segment end frame, the module stops processing the secondary feed, the VoD Replacement one, and instead processes the main feed.

It also clears the buffer containing the video frames, in order to prevent a memory leak.

4. When none of the above conditions are met, the Video Switcher continues processing the active feed, repacketizing the received frames and sending them via multicast.
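The sketch below condenses these rules; timestamps are expressed in seconds, the 2-second preparation window follows the description above, and the class and method names are illustrative rather than the prototype's API.

    # Simplified sketch of the Video Switcher decision rules.
    SWITCH_PREPARE_WINDOW = 2.0    # seconds before the ad start at which preparation begins

    class SwitcherState:
        def __init__(self):
            self.active_feed = "main"              # "main" = Input Distributor, "vod" = VoD Replacement
            self.watch_secondary_feed = False

        def request_vod_restart(self):
            pass                                   # REST call asking VoD Replacement to restart

        def on_frame(self, frame_ts, next_ad_start, next_ad_end):
            if next_ad_start - frame_ts == SWITCH_PREPARE_WINDOW:
                self.request_vod_restart()
                self.watch_secondary_feed = True   # start looking for the frame with timestamp 0
            if frame_ts == next_ad_start:
                self.active_feed = "vod"           # commute to the replacement feed
            elif frame_ts == next_ad_end:
                self.active_feed = "main"          # return to the primary feed
                self.watch_secondary_feed = False  # the frame buffer is cleared here in the prototype
            # in every case the frame from the active feed is repacketized and sent via multicast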

4.5.3.5 Output

The Output module performs the inverse operations of the Input Distributor. It subscribes to the multicast address to which the Video Switcher packets were sent, grouping them into frames. It then encodes the frames to H.264 format, sending the result as an MPEG-TS stream via RTP.

Module Similarities

All modules containing video processing operations (Input Distributor, VoD Replacement and Output) share multiple functions, such as receiving RTP packets and sending RTP packets.

These modules can also contain a customizable buffer. With this buffer, the module can store a number of frames in local memory before beginning its sending functions. This operation can minimize the problems introduced by network failures.
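A minimal sketch of such a buffer is shown below, using a queue that only starts releasing frames once the configured depth is reached; the class name and default depth are illustrative.

    # Sketch of a per-module frame buffer used to smooth out short network hiccups.
    from collections import deque

    class FrameBuffer:
        def __init__(self, depth=50):    # 50 frames correspond to 1 second at 50 fps
            self.depth = depth
            self.frames = deque()

        def push(self, frame):
            self.frames.append(frame)    # store the newly received frame

        def ready(self):
            return len(self.frames) >= self.depth    # sending may begin once the buffer is full

        def pop(self):
            return self.frames.popleft() if self.frames else None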


Chapter 5

Results and analysis

5.1 Test Methodologies

During the testing phase, a scenario was established composed of a client PC and a Host Cloud. The PC is used to generate an MPEG-TS stream and send it to the Cloud using RTP. This is accomplished with the help of the command line interface of FFMPEG [FFM]. The Host Cloud is composed of two identical physical servers.
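A command along the following lines can be used for this purpose; the input file name, destination address and port are placeholders, and the exact options used during the tests may have differed.

    ffmpeg -re -i sample.ts -c copy -f rtp_mpegts "rtp://<host-cloud-ip>:5004"

The -re flag makes FFMPEG read the input at its native frame rate, -c copy forwards the existing streams without re-encoding, and the rtp_mpegts muxer wraps the MPEG-TS output in RTP packets.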

By using Docker Engine on the Host, it is possible to approximate the scenario to a virtualized environment, where each container has no information about the physical location of any other. The characteristics of the test machines are described in Table 5.1.

Table 5.1: Technical Specification of the test machines

                 | Client                      | Host Server 1                | Host Server 2
CPU              | Intel Core I7-4770 @ 3.4GHz | Intel Core I7-4790S @ 3.2GHz | Intel Core I7-4790S @ 3.2GHz
RAM              | 16 GB                       | 16 GB                        | 16 GB
Network          | Intel Ethernet I217-LM      | Intel X550 10 Gbit           | Intel X550 10 Gbit
Operating System | Windows 10 Enterprise       | Windows Server               | Windows Server

In terms of network specifications, IGMP and multicast are active and the Maximum Transmission Unit is 1500 bytes.

The characteristics of the videos utilized during the tests are described in Table 5.2. Video #1 was used as the Input Distributor feed while Video #2 was streamed by VOD Replacement. Both videos were played on loop.

To balance the load of the two available servers, the distribution of the containers was made according to Figure 5.1. Figure 5.1 also shows the flow of data between each component of the application, including the client.


Table 5.2: Video Specifications

             | Video #1    | Video #2
File Size    | 550 MB      | 2.19 GB
File Format  | MPEG-TS     | MXF
Duration     | 39s         | 5m 56s
Color Space  | 4:2:2       | 4:2:2
Color Depth  | 8 bits      | 8 bits
Scan method  | Progressive | Progressive
Resolution   | 1280x720    | 1280x720
Frame rate   | 50 fps      | 50 fps

5.2 Results

In order to analyze the performance of the developed prototype, some metrics were monitored and analyzed, such as:

• Frame Rate
• Bandwidth
• RAM usage
• CPU usage

To test the flexibility of the application and the quality and format of the resulting video, a final test was performed in which the input stream of one application instance originated from the output stream of another application instance.

5.2.1 Frame Rate

Table 5.3 shows the average frame rate in each module. Since the original video has a frame rate of 50 frames per second, the application should maintain this frame rate in all its video processing modules (Input Distributor, VOD Replacement, Video Switcher and Output). A lower-than-expected frame rate can indicate a lack of processing power, meaning that the inbound traffic is higher than the outbound, leading to video errors and memory leaks.

Table 5.3: Frame Rate in each module

Module            | Frame Rate
Input Distributor | 50 FPS
VOD Replacement   | 50 FPS
Video Switcher    | 50 FPS
Output            | 50 FPS


Figure 5.1: Application deploy scenario and data flow

5.2.2 Bandwidth

Table 5.4 details the bandwidth used by each module, divided into inbound and outbound traffic. Since the video feed sent from the Client to the Input Distributor is in the form of compressed video, the inbound traffic in the module is not constant, varying based on the detail of each frame and the consequent rate of compression.

Table 5.4: Average Inbound and Outbound traffic in each module

Module            | Inbound  | Outbound
Input Distributor | 14 Mbps  | 771 Mbps
VOD Replacement   | -        | 771 Mbps
Video Switcher    | 1.5 Gbps | 771 Mbps
Output            | 771 Mbps | 14 Mbps
Business Logic    | -        | -

5.2.3 RAM Usage

The RAM consumed by each module can be observed in Table 5.5. For this metric, an additional test was performed. The scenario of this second test is similar to the first one, with the only difference being that each module has a frame buffer of 50 frames (1 second). This means that before sending data, the module stores a full second's worth of video.

It’s important to mention that, in situations where the processing power available is insufficient to fulfill the video processing requirements, the outbound traffic will be lower than the inbound, causing the memory buffer to increase, leading to a possible memory leak.


Table 5.5: Average Module RAM Usage

Module            | Test #1  | Test #2
Input Distributor | 40.3 MB  | 127.2 MB
VOD Replacement   | 37.7 MB  | 125.8 MB
Video Switcher    | 73.6 MB  | 161.1 MB
Output            | 249.1 MB | 332.3 MB
Business Logic    | 2.3 MB   | 2.3 MB

5.2.4 CPU Usage

Table 5.6 shows the CPU usage of each module. As expected, the most demanding containers are the Input Distributor, VOD Replacement and Output, followed by the Video Switcher, since video processing operations require a large amount of computing power.

Table 5.6: Average Module CPU Usage

Module            | Test #1
Input Distributor | 7.1%
VOD Replacement   | 4.6%
Video Switcher    | 6.2%
Output            | 18.2%
Business Logic    | 0.2%

5.2.5 Cascade Scenario

The goal of this test is to prove that the output of the application (MPEG-TS stream generated by the Output module) can be used as an input to another instance of the same application (to be read by the Input Distributor module).

Figure 5.2 shows the scenario instantiated for the final test. As the main goal of the test was to verify the input and output streams of the application, only an Input Distributor and an Output module were deployed for each application instance.

The result was a success, with the second instance of the application being capable of reading and outputting the original stream originated from the first application.

The network throughput was analyzed in order to find differences between inbound and outbound traffic, but the results were similar to Figure 5.4, as expected.

5.2.6 Results

By analyzing the frame rate and bandwidth, it is possible to conclude that there are no losses of information in any of the modules. All modules are capable of running at 50 frames per second, the rate of the original video. In the modules where no change to the video format was made, the bandwidth was constant and as expected for an uncompressed video.


Figure 5.2: Cascade Test Scenario

In terms of memory, all modules were capable of maintaining a constant, low RAM usage. The Output module experienced a higher usage rate compared to the other modules, which suggests that it can be optimized.

When comparing the processing power needed, the results were very similar to the RAM usage test. The Output module performed the worst, consuming almost 20% of the available CPU. All other modules had a low CPU usage.

The final result of the application (MPEG-TS stream generated by the Output module) was compatible with the Input Distributor, proving that multiple instances of the same application can be deployed in sequence, with the output stream of the first serving as an input stream for the second.


Chapter 6

Conclusions and Future Work

By automating the process of content substitution, it is possible to provide the end-user with real-time, personalized content without the need to reduce the reach of the broadcast.

With the use of cloud services, it is possible to reduce costs in broadcast operations, using software for video processing, such as transcoding and video switching.

This dissertation proposed an architecture and developed a prototype for a distributed application capable of identifying and replacing advertising segments in a livestream, providing the end-user with personalized, more effective advertising, without changing the actual content of the broadcast. It also suggested other use cases in which this application can be used, due to its high modularity.

6.1 Fulfillment of Goals

The main goals proposed for this dissertation were achieved. The proposed architecture allows the development of other applications for TV production by adding new modules with new functions.

The developed prototype was successful in testing the proposed architecture, proving that it is possible to use the cloud for television operations, such as advertisement substitution, maintaining the quality of the original broadcast, while providing a better advertising experience and delivering the content in real time.

6.2 Future Work

The proposed architecture provides a starting point for future experimentation based on IP TV, while being flexible enough to allow other applications to be developed using an extended version of the architecture. This can lead to the transition of television production operations from hardware to software.


The Output module can be optimized to reduce the computing load of the encoding operation, bringing it down to values similar to those of the other modules.

The time spent on frame processing can also be reduced, providing the means necessary to process higher resolution videos.

A new module can be developed to allow each node to register to the system and discover other nodes, creating an environment where all modules can communicate with each other without prior information regarding their locations.

Since all modules are independent, it is possible to develop new applications with various use cases, using the developed modules as a starting point. It is also possible to expand on the prototype, creating new modules, adapting to the complexity of the system.

References
