Article submitted to Elsevier, Visual Communication and

QoE Assessment in Scalable Video Distribution Over Peer-to-Peer Networks

Valmiro José Rangel Galvis, Paulo Roberto de Lira Gondim

Faculdade de Tecnologia - Universidade de Brasília – UnB

Brasília – D.F. - Brasil.

{valmiro, pgondim}@unb.br

Abstract. Scalable video has been considered a promising option for adapting to the highly variable bandwidth conditions of video distribution systems, and peer-to-peer (P2P) networks have received growing attention because they can remove the bottlenecks found in client/server architectures. This paper presents a scheme to estimate the quality of experience (QoE) perceived by a user receiving a scalable video distributed over a P2P network. The scheme is based on a quantitative relationship between the quality of service (QoS), measured through network parameters, and the quality perceived by the end user. According to the experiments and measurements, the relationship predicts the Mean Opinion Score (MOS) accurately.

Keywords: QoE, QoS, H.264/SVC, MOS, P2P

1. Introduction

Video distribution is one of the most relevant applications on the Internet. It attracts great interest from users while imposing requirements that are difficult to meet, notably high bandwidth and real-time traffic handling under variable network conditions and resources.

The use of peer-to-peer (P2P) networks for video streaming on the Internet has received considerable attention from researchers (e.g., [6, 18, 20]). Such networks allow personal computers to function in a coordinated manner as a distributed storage medium by contributing, searching for, and obtaining digital content [15]. P2P architectures thus allow video chunks to be shared through direct exchange between users of the network, without the intermediation or support of a server or centralized entity, which provides scalability of bandwidth resources, path redundancy, and a capacity for network self-organization.


Although P2P architectures solve many problems of client-server architectures, video streaming over P2P networks still presents challenges: growing demand, the limited upload capacity of peers, the heterogeneity of receivers and, in particular, large variations in available bandwidth.

To address the bandwidth variability inherent to P2P systems, transmission and coding technologies must support adaptation of transmission rates to the available bandwidth. Moreover, if traditional video coding technologies (such as MPEG-2 or, more recently, H.264/AVC) are used, it becomes difficult to handle different types and capacities of receivers, ranging from mobile phones to high-definition displays (LCD televisions, for example).

In this sense, the handling of heterogeneous nodes and different bitrates within a peer-to-peer network can benefit from scalable video, for which the H.264/SVC (Scalable Video Coding) standard, an extension of the H.264/AVC (Advanced Video Coding) standard, has been considered.

This extension has emerged as a promising option because it allows a user to receive part of the stream according to the processing capacity and resolution of the receiver and the capabilities of the underlying network infrastructure. Since scalable video is encoded in layers, the user can receive the base layer and start playing the video; if the network and the user have resources available to handle further layers, called enhancement layers, the perceived quality can be improved.

In this context, one problem that has been the subject of recent research [1, 4, 7, 30] is the mapping between the quality of service (QoS) offered by the network and the quality of experience (QoE) perceived by the user. Such a mapping is needed for several reasons, among them the difficulty of estimating, automatically and in real time, the experience of the end user. Beyond that, it is necessary to assess the impact of QoS parameters on video quality (QoV), and therefore on QoE, and to establish the necessary adjustments to the sending rate of the video sequence (or of its layers), consistent with the capabilities of the receivers and the availability of network resources.

We are therefore interested in assessing how perceived quality changes as network and application parameters change. That is, we seek a relationship between network and application parameters and the quality delivered to the end user that allows the perceived quality to be predicted and quantified and, from this relationship, a technique for adaptive streaming of the content or adaptation of the transmission rates to be developed. Moreover, the lack of adequate scalable video encoders for live streaming led us to study a system based on video on demand (VoD). Thus, this paper evaluates the impact of network and application parameters on the quality perceived by the end user in video-on-demand distribution over a P2P network, with the video encoded in the H.264/SVC standard.

The paper is organized as follows: Section 2 presents aspects related to scalable video and P2P networks; Section 3 discusses related work; Section 4 discusses the experiments performed and the relationship between the parameters used for QoS and QoE, based on nonlinear regression; finally, Section 5 outlines the conclusions and future work.

2. Scalable Video Coding and P2P Networks

This section aims to present basic concepts about scalable video and P2P networks.

2.1. Scalable Video Coding

Video scalability is based on a layered coding structure in which a base layer can be augmented by one or more refinement layers (enhancement layers). Among the scalable video standards is H.264/SVC (Scalable Video Coding), an extension of H.264/AVC (Advanced Video Coding). The most common types of video scalability are temporal scalability, spatial scalability, and SNR (Signal-to-Noise Ratio) or quality scalability, as described below [10]:

• Temporal scalability: the enhancement layers have a higher frame rate than the lower layers, including the base layer.

• Spatial scalability: the enhancement layers have a resolution equal to or greater than that of the base layer and of the other lower layers, which makes it possible to transmit images at different resolutions.

• Quality (SNR) scalability: the spatial and temporal resolutions remain the same and only the quality is enhanced from layer to layer. The higher the SNR, the higher the quality of the image produced.

The videos used in this article were encoded with medium-grain scalability (MGS) combined with temporal and spatial scalability.


Some advantages of scalable video coding over non-scalable video are listed below:

• With simulcast, multiple versions of the same video are made available to serve users with different processing, bandwidth and storage capabilities. These versions are highly redundant: even with different bitrates, all streams represent the same content. Scalable video coding strives to reduce this redundancy and may therefore produce a video stream that requires significantly less storage space than the sum of all versions of a simulcast video stream.

• Scalable video streams can be encoded with different bitrates, which widens the set of bitrates to choose from.

• Managing different versions of the same video at different bitrates is avoided. With scalable video, one bitstream can serve multiple users with different needs, which simplifies bitrate adjustment: there is no need to switch between two separate bitstreams, because the adjustment is made within the same video stream. This improves the flexibility of the video stream and increases system robustness against transmission link failures.

• Encoding the video with a layered structure allows a different priority to be associated with each layer. The base layer of a scalable video stream contains the information that is essential for playback and is thus the most important part of the stream; the higher the order of an enhancement layer, the lower its priority. The advantage of this layered structure is that specific parts of the video can be prioritized. In a network with limited bandwidth, this prioritization favors the data packets that are essential for video playback, so that when there is not enough bandwidth to receive the complete video, at least a low-quality version can be offered.

• For P2P networks in particular, scalable video coding offers another advantage related to those mentioned above. The peers in a network generally do not form a homogeneous group: they differ in available bandwidth and computing capabilities. With simulcast, the video would therefore have to be offered at different bitrates, and thus as different streams, which would lead to the problem of splitting the overlay into subgroups, each sharing only one version of the video stream [12, 13].
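The layer prioritization described in the list above can be illustrated with a minimal sketch. It assumes that each enhancement layer depends on all lower layers, so a receiver takes layers in order, base layer first, until its bandwidth budget is exhausted. The `Layer` type, the `select_layers` function and the bitrates are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    bitrate_kbps: float  # hypothetical per-layer bitrate, not taken from the paper


def select_layers(layers, available_kbps):
    """Return the longest prefix of layers (base layer first) that fits the budget."""
    chosen = []
    used = 0.0
    for layer in layers:
        if used + layer.bitrate_kbps > available_kbps:
            break  # higher layers depend on this one, so stop at the first misfit
        chosen.append(layer)
        used += layer.bitrate_kbps
    return chosen


# Illustrative stream: one base layer and two enhancement layers.
stream = [Layer("base", 128.0), Layer("enh1", 256.0), Layer("enh2", 512.0)]
print([layer.name for layer in select_layers(stream, 500.0)])
```

With a budget of 500 kbps, the sketch keeps the base layer and the first enhancement layer; a peer with less than 128 kbps would receive nothing, which is why the base layer is the one worth protecting.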

2.2. VoD over Peer-to-Peer Networks

P2P networks have evolved as a promising paradigm for large-scale video distribution, more efficient than traditional architectures [1]. In systems based on the unicast client/server architecture, the server becomes overloaded and bottlenecks appear as the number of clients grows, with losses in bandwidth and service speed; P2P networks arise as an alternative that relieves the load on the server in content delivery systems.

As pointed out in [33], most video streaming systems over P2P networks (e.g., PPLive, SopCast, TV Ants, UUSee, CoolStreaming) do not yet use video scalability: they serve a single version of the video stream to all peers and therefore have limited support for heterogeneous peers.

Thus, SVC coding, if used in P2P VoD (peer-to-peer video-on-demand) systems, is expected to provide efficient adaptation to heterogeneous resources and to the dynamic behavior of the network, and to allow peers to participate in large-scale content distribution [1].

3. Related Work

Two categories are considered here: the first concerns the use of scalable video over P2P networks, and the second concerns the mapping of QoS onto QoE. Across these two categories, we observe a lack of work on mapping QoS onto QoE in P2P networks using scalable video, which is the original aspect of the proposal presented here.

Within the first category, the authors of [18] use SVC to ensure smooth delivery of video content between peers, choosing peers that contribute different parts of the video according to the QoS they offer. The video is encoded with temporal scalability only (7.5, 15 and 30 fps), QCIF resolution (176x144) and a single SNR scalability option. The most important layer of the video (the base layer) is obtained from the peer offering the best QoS, which is calculated from the bandwidth that the peer can contribute during the video transmission. The bandwidth available between transmitter and receiver is estimated from the RTT (Round Trip Time). The performance of the system was tested through simulations that measured the quality of a scalable video stream sent over a peer-to-peer network with a topology similar to that of a Gnutella network [21]. The results showed higher video throughput at the peers and better received video quality, assessed using the PSNR (Peak Signal-to-Noise Ratio) metric.

In [19], the authors show the advantages of using scalable video coding in a P2P network and propose adaptive streaming for heterogeneous networks over a P2P network based on the BitTorrent topology [22]. Quality is measured using the PSNR, and simulations show the bitstream adapting to networks of different bandwidths subject to fluctuations.

Nunes et al. [25] implement a prototype for the distribution of scalable video using a prioritized sliding-window algorithm that chooses the appropriate pieces already stored at each peer and reserves a download window, one chunk in size, that prioritizes the base layer of the scalable video. The effectiveness of the algorithm is evaluated against the standard BitTorrent piece selection algorithm in terms of download and upload time and delay in the delivery of live video. Finally, another work in this first category is presented in [20].

In the second category, the relationship between QoS and QoE has been studied in [1-6], which show first of all that this relationship is not linear [3, 4, 5]. In [1], Fiedler et al. proposed an exponential relationship between QoE and QoS and, using MOS (Mean Opinion Score) test results, found a curve that fits those results better than the logarithmic relationship proposed in [3].
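The text above does not reproduce the exponential relationship itself. As an illustration only, the sketch below uses the commonly cited form QoE = α·exp(−β·QoS) + γ, where QoS is an impairment measure such as loss; both this form and the coefficient values are assumptions rather than content taken from [1].

```python
import math


def exponential_qoe(impairment, alpha, beta, gamma):
    """Exponential QoS-to-QoE mapping (assumed form): alpha * exp(-beta * x) + gamma."""
    return alpha * math.exp(-beta * impairment) + gamma


# Hypothetical coefficients: QoE starts at alpha + gamma and decays towards gamma.
for loss in (0.0, 0.05, 0.2):
    print(loss, round(exponential_qoe(loss, alpha=3.5, beta=10.0, gamma=1.0), 3))
```

The point of the exponential shape is that the same increase in impairment hurts more when the current quality is high than when it is already low.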

Kim and Choi [4] study a correlation model between QoS and QoE to measure the QoE of an IPTV service. QoS is defined as a function of the loss (PER), burst level (U), jitter (J), packet delay (A) and bandwidth (B), as shown in Equation 1.

$$QoS(X) = K\,(\beta_1 PER + \beta_2 U + \beta_3 J + \beta_4 A + \beta_5 B) \tag{1}$$

Where the coefficient $\beta_i$ is a weight defined by the relative importance of each QoS parameter for the IPTV service, as recommended by the organizations responsible for regulating quality standards (e.g., ITU-T, IETF, etc.), and the constant K is a factor that determines the QoS depending on the access network technology used for the IPTV service.

With the QoS defined by equation 1, the proposed correlation between QoE and QoS is given by equation 2.


$$MOS = Q_r\,(1 - QoS(X))^{\frac{C \cdot QoS(X)}{R}} \tag{2}$$

Where $Q_r$ is a factor that limits the MOS value according to the resolution of the terminal, C is a value that depends on the type of service subscribed to by the user (for example, premium), and R is determined by the structure of the frame according to the GOP (Group of Pictures) size.
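Equations 1 and 2 can be evaluated with a short sketch. It assumes that the five QoS parameters have already been normalized so that QoS(X) falls in [0, 1), and that the exponent in Equation 2 is C·QoS(X)/R. The function names, weights and parameter values are hypothetical.

```python
def qos_index(per, burst, jitter, delay, bandwidth, weights, k):
    """Equation 1: weighted sum of (pre-normalized) QoS parameters, scaled by K."""
    b1, b2, b3, b4, b5 = weights
    return k * (b1 * per + b2 * burst + b3 * jitter + b4 * delay + b5 * bandwidth)


def mos_from_qos(qos, q_r, c, r):
    """Equation 2 (assumed exponent form): MOS = Qr * (1 - QoS)^(C * QoS / R)."""
    if not 0.0 <= qos < 1.0:
        raise ValueError("QoS(X) is assumed to lie in [0, 1)")
    return q_r * (1.0 - qos) ** (c * qos / r)


# Hypothetical, equally weighted parameters for illustration only.
qos = qos_index(0.02, 0.10, 0.05, 0.20, 0.10, weights=(0.2,) * 5, k=1.0)
print(round(qos, 3), round(mos_from_qos(qos, q_r=5.0, c=1.0, r=1.0), 3))
```

When QoS(X) is 0 the sketch returns Qr, which matches the role of Qr as the upper limit of the MOS for a given terminal resolution.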

In [6], the authors propose a model for adaptive multimedia streaming over IP based on customer-oriented metrics. Using experimental design techniques, they obtain a relationship by empirical modeling of experimental data, considering controllable parameters (e.g., the bit rate of the video) and uncontrollable ones (e.g., loss, delay and jitter). The influence of each factor on the response is determined by a Pareto analysis, and the contribution of these factors to the variability of the system is then determined by an ANOVA (Analysis of Variance). According to the results, the factors with the greatest influence on quality are packet loss, delay, jitter and video coding rate. From the experimental data, a nonlinear regression yields equations to estimate the quality of experience of the video ($QoE_{Video}$) and of the audio ($QoE_{Audio}$) (Equations 3 and 4).

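$$QoE_{Video} = \delta_1 PER - \delta_2 A + \delta_3 J + \delta_4 CV + \delta_5 D^2 + \delta_6 D \cdot J + \delta_7 D \cdot CV \tag{3}$$

$$QoE_{Audio} = \varphi_1 PER - \varphi_2 A + \varphi_3 J - \varphi_4 CV + \varphi_5 D^2 + \varphi_6 D \cdot J + \varphi_7 D \cdot CV \tag{4}$$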

Where PER, J and A are the same variables as previously defined, and CV is the encoding rate of the video.

Through a multi-objective optimization, a desirability function $d_i(QoE_i)$ assigns to each response $QoE_i$ (audio and video) a value between 0 and 1, where $d_i(QoE_i) = 0$ represents a completely undesirable value and $d_i(QoE_i) = 1$ an optimal value of the response. The overall desirability (D) is the geometric mean of all the individual desirability values, as shown in Equation 5.

$$D = \{d_1(QoE_1) \cdot d_2(QoE_2) \cdots d_n(QoE_n)\}^{1/n} \tag{5}$$

The quantitative value D can be used in service management procedures based on network metrics to control the multimedia quality provided, by modifying the encoding rate of the video, which is a controllable parameter.


$$QoE = \sqrt{d_1(QoE_{Video}) \cdot d_2(QoE_{Audio})} \tag{6}$$

Recall that the QoE is expressed here through a desirability function, so its value ranges from 0 to 1.
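The geometric mean of Equations 5 and 6 can be sketched as follows. The text does not give the shape of the individual desirability functions, so the linear ramp used here, together with the function names and the sample responses, is an assumption made only for illustration.

```python
import math


def linear_desirability(value, worst, best):
    """Map a response to [0, 1]; the linear ramp is an assumption, not taken from [6]."""
    if best == worst:
        raise ValueError("worst and best must differ")
    d = (value - worst) / (best - worst)
    return min(1.0, max(0.0, d))


def overall_desirability(d_values):
    """Equation 5: geometric mean of the individual desirabilities."""
    if not d_values:
        raise ValueError("at least one desirability value is required")
    return math.prod(d_values) ** (1.0 / len(d_values))


# Hypothetical video and audio responses on a 1-to-5 scale.
d_video = linear_desirability(4.0, worst=1.0, best=5.0)
d_audio = linear_desirability(3.0, worst=1.0, best=5.0)
print(overall_desirability([d_video, d_audio]))  # Equation 6 is the n = 2 case
```

A useful property of the geometric mean is that a single completely undesirable response (d = 0) drives the overall value to 0, whatever the other responses are.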

In [7], the authors propose a MOS prediction model for MPEG-4 sequences over a wireless network. After the video sequences are classified and grouped, a nonlinear regression fits the simulation data to an equation that describes the behavior of the MOS. The authors combined different coding frame rates with different send bitrates (application level) and different loss rates (network level), and thus succeeded in estimating the MOS from content, network and application parameters. The equation used to predict the MOS is shown below.

$$MOS = \frac{\gamma_1 + \gamma_2 FR + \gamma_3 \ln(SBR)}{1 + \gamma_4 PER + \gamma_5 PER^2} \tag{7}$$

Thus, for each video sequence they calculated the PSNR, mapped it to MOS values (obtained MOS), and through a nonlinear regression arrived at an equation to estimate the MOS (predicted MOS) from the Frame Rate (FR), Sender Bitrate (SBR) and Packet Error Rate (PER).
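Equation 7 can be evaluated directly once the coefficients are known. The sketch below uses hypothetical coefficients chosen only to show the shape of the model; they are not the fitted values reported in [7], and the function name is illustrative.

```python
import math


def predict_mos(fr, sbr, per, gammas):
    """Equation 7: rational model in frame rate, sender bitrate and packet error rate."""
    g1, g2, g3, g4, g5 = gammas
    numerator = g1 + g2 * fr + g3 * math.log(sbr)
    denominator = 1.0 + g4 * per + g5 * per ** 2
    return numerator / denominator


# Hypothetical coefficients, for illustration only.
gammas = (1.0, 0.01, 0.5, 10.0, 100.0)
for per in (0.0, 0.05, 0.2):
    print(per, round(predict_mos(fr=30.0, sbr=256.0, per=per, gammas=gammas), 3))
```

With no packet errors the denominator is 1 and the prediction depends only on the application-level parameters FR and SBR; positive values of the last two coefficients make the prediction fall as PER grows.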

In [8], the authors quantify the effects of jitter on the perceptual quality of video over a 3G network, proposing a modeling of the correlation between QoS and QoE. Using the algorithm recommended in [9] to calculate the jitter, the authors implement a tool for estimating the QoE from network parameters, also considering user ratings in real time on a device with an Android operating system. This prediction of QoE is used to make the selection process of the network that best serves the end user.

Equation 8 shows the correlation between QoS and QoE as defined by the authors.

$$MOS = -\rho_1 J^{\rho_2} + \rho_3 \tag{8}$$

Besides jitter, the effect of burst on the MOS was also studied, leading to an equation similar to the previous one:

$$MOS = \mu_1 J^{-\mu_2} - \mu_3 \tag{9}$$

4. Mapping QoS into QoE

4.1 Initial considerations

The quality of a video can be assessed objectively or subjectively. Subjective testing methods require human viewers to evaluate quality or quality differences between two images or videos. Objective testing methods are mathematical models that estimate the video quality perceived by an average user.

One of the goals when building an application on a communication network is user satisfaction, which is reflected in a satisfactory quality of experience. For a multimedia application, the main component of the QoE is the perceived quality, that is, the quality of the multimedia stream as viewed by the end user, which is clearly a subjective concept.

The difficulties of subjectively evaluating P2P systems arise from the fact that such systems usually cover very large geographical areas and involve a large number of peers. Their performance is therefore significantly affected by the unpredictable behavior of users and by the status of the sub-networks that make up the whole network, which makes them difficult to model. In addition, subjective evaluation can be a costly and time-consuming process, and it cannot be used to monitor quality in real time [28].

4.2 Architecture for testing

The architecture considered is based on discrete event simulation. The block diagram in Fig. 1 describes how quality is assessed in the VoD distribution system over P2P networks used in this work.

Fig. 1. Scenario for testing.

The first step is to encode the video sequence, which is in YUV (4:2:0) format and available in CIF (352x288) and QCIF (176x144), with the JSVM software. The encoding uses three types of scalability (spatial, temporal and quality), with medium-grain scalability (MGS) for the quality layers. The encoded sequence contains two spatial layers (CIF and QCIF resolutions), five temporal layers (frame rates of 1.875, 3.75, 7.5, 15 and 30 fps) and three SNR quality layers (Q0, Q1 and Q2), with a GOP of 16. Combining the different scalabilities (2 x 5 x 3), the video encoded in the H.264/SVC standard can in this particular case have up to thirty layers: one base layer and twenty-nine enhancement layers. With the videos coded and classified, the next step is to simulate the video distribution over a P2P network using the P2PTVSim simulator. This simulator needs a configuration file that specifies the network parameters (for example, the number of peers, the number and size of the chunks, the probability of loss, jitter, the download and upload bandwidth of the peers, the upload bandwidth of the source, the bitrate of the
