
High Dynamic Range Video for Mobile Devices

PhD Thesis in Computer Science by

Miguel Ângelo Correia de Melo

Supervisors:

Doctor Maximino Esteves Correia Bessa

Professor Alan Chalmers

By

Miguel Ângelo Correia de Melo

Supervisors:

Doctor Maximino Esteves Correia Bessa

Professor Alan Chalmers

Thesis submitted to the

UNIVERSIDADE DE TRÁS-OS-MONTES E ALTO DOURO, Portugal in accordance with the requirements for the degree of

DOCTOR OF PHILOSOPHY in the Escola de Ciências e Tecnologia

This work was supported by the National Foundation for Science and Technology - FCT (Fundação para a Ciência e a Tecnologia) through the grant SFRH/BD/76384/2011


Doctor Maximino Esteves Correia Bessa

Assistant Professor of

Departamento de Engenharias da Escola de Ciências e Tecnologia, Universidade de Trás-os-Montes e Alto Douro

Professor Alan Chalmers

Full Professor of WMG

University of Warwick


This work is dedicated to my family and friends.


The members of the jury recommend to the Universidade de Trás-os-Montes e Alto Douro the acceptance of the thesis entitled "High Dynamic Range Video for Mobile Devices", conducted by Miguel Ângelo Correia de Melo, in accordance with the requirements for the degree of Doctor of Philosophy.

October, 2015

Jury President: Doctor José Boaventura Ribeiro da Cunha,

President of Escola de Ciências e Tecnologia of the Universidade de Trás-os-Montes e Alto Douro

Jury Members: Doctor José Afonso Moreno Bulas Cruz,

Full Professor of Escola de Ciências e Tecnologia of the Universidade de Trás-os-Montes e Alto Douro

Doctor Pedro José de Melo Teixeira Pinto,

Full Professor of Escola de Ciências e Tecnologia of the Universidade de Trás-os-Montes e Alto Douro

Doctor Adérito Fernandes Marcos,

Full Professor of the Universidade Aberta

Doctor António Augusto de Sousa,

Associate Professor of Faculdade de Engenharia of the Universidade do Porto

Doctor Maximino Esteves Correia Bessa,

Assistant Professor of Escola de Ciências e Tecnologia of the Universidade de Trás-os-Montes e Alto Douro

Professor Alan Chalmers,

Full Professor of WMG of the University of Warwick

Abstract

High Dynamic Range (HDR) technology allows the capture, storage and delivery of information that can match and indeed exceed the dynamic range of the human visual system.

Mobile devices are widespread nowadays and are widely used to consume multimedia. So far, there is no known HDR display for these small screen devices (SSDs), leading to a need to reduce the dynamic range of HDR contents to match the displays' dynamic range using algorithms known as tone-mapping operators (TMOs). As mobile devices can be used in a variety of environments that can have an impact on the viewing experience and, consequently, on the TMOs' accuracy, it is important to evaluate and identify the best methods to deliver HDR content on SSDs.

This thesis presents two psychophysical studies that evaluate the delivery of HDR video on mobile devices taking into account two different viewing conditions: different ambient lighting levels and screen reflections. In the first case, results have shown that the TMO accuracy changes from dark and medium scenarios to bright scenarios and that the accuracy of the perceptual match of TMOs remains the same across SSDs and conventional sized displays. In the latter, results demonstrate that the greater the area exposed to reflections the larger the negative impact on the TMOs’ perceptual accuracy. The results also show that when reflections are present hybrid TMOs do not perform better than the standard TMOs.

A novel context-aware HDR video delivery system for mobile devices is also proposed. This system uses the knowledge gained from the psychophysical studies conducted and allows the delivery of HDR content on mobile devices taking into account specifics pertaining to the device and the context in which the content is to be used. The system has support for mobile devices that do not have enough computational power to decode HDR content locally, through a novel mechanism that allows the mapping process to occur on the server side and delivers an LDR tone-mapped stream. The proposed solution allows live HDR content broadcasting, as well as providing a Content Management System (CMS) to manage an online HDR video repository. Lastly, a prototype based on the proposed system is presented and evaluated through a set of performance tests that demonstrate the solution's reliability.

Keywords: Computer graphics, high dynamic range, tone mapping, mobile devices.

Resumo

High Dynamic Range (HDR) technology allows the representation of information that can match or even exceed the dynamic range of light perceivable by the human visual system.

Nowadays, mobile devices are widely used to consume multimedia content. So far, HDR technology has not yet reached these small screen devices (SSDs), making it necessary to reduce the dynamic range of HDR content to match the dynamic range of SSDs. This dynamic range reduction is achieved by applying algorithms known as tone-mapping operators (TMOs). Since mobile devices can be used in a variety of environments that can affect the viewing experience and, consequently, the performance of TMOs, it is important to evaluate and identify the best methods to deliver HDR content on SSDs.

This thesis presents two psychophysical studies that evaluate the delivery of HDR video on mobile devices taking two viewing conditions into consideration: different ambient lighting levels and screen reflections. In the first study, the results showed that the performance of TMOs changes between scenarios with low and medium ambient lighting levels and scenarios with high ambient lighting levels, and that the perceptual accuracy of TMOs is constant across SSDs and conventional sized displays. In the second study, the results demonstrated that the greater the screen area exposed to reflections, the larger the negative impact on the perceptual accuracy of TMOs. The results also demonstrated that, when reflections are present, an approach using hybrid TMOs does not perform better than the TMOs applied in isolation.

Based on the knowledge gained from the psychophysical studies, a novel context-aware HDR video delivery architecture for mobile devices is proposed. This architecture allows the delivery of HDR content on mobile devices taking into account the specifics of these devices and the context in which they are being used. The proposed system supports mobile devices that do not have the computational power to decode HDR content locally, through a novel mechanism that allows the tone-mapping process to be performed on the server, with the result subsequently made available as an LDR video stream. Besides allowing real-time broadcasting of HDR content, the proposed architecture includes a content management system that allows the management of an online repository of HDR videos. Lastly, a prototype based on the proposed system is presented and evaluated through a series of performance tests that demonstrate its feasibility.

Keywords: Computer graphics, High Dynamic Range, tone mapping, mobile devices.

Acknowledgements

Foremost, I would like to express my sincere gratitude to all those who gave me the possibility to complete this thesis, especially to my supervisors Professor Maximino Bessa from UTAD and Professor Alan Chalmers from the University of Warwick, whose expertise, immense knowledge, and encouragement added considerably to my graduate experience. Their guidance helped me at all times and I could not have imagined having better advisors and mentors for my PhD study. I would also like to thank the Magnificent Rector of the University of Trás-os-Montes e Alto Douro, Professor António Augusto Fontainhas Fernandes, and his predecessor, Professor Carlos Alberto Sequeira; the President of the Engineering Department, José Boaventura Cunha, and his predecessor, Professor José Afonso Bulas Cruz; the Manager of the Centre for Information Systems and Computer Graphics of INESC TEC, António Gaspar; and the President of INESC TEC, José Mendonça, for the facilities and the means provided to me for the realisation of this work.

I would like to acknowledge the National Foundation for Science and Technology - FCT (Fundação para a Ciência e a Tecnologia) for supporting this PhD through the grant SFRH/BD/76384/2011. I would also like to acknowledge the European Union (COMPETE, QREN and FEDER) for the partial support through the project REC I/EEI-SII/0360/2012 entitled "MASSIVE – Multimodal Acknowledgeable multiSenSorial Immersive Virtual Enviroments".

I have furthermore to thank the friends I have made at UTAD, namely Emanuel


To the members of the Computer Graphics group at Warwick University; in particular to Carlo Harvey for his availability in helping me with the review of this thesis for English style and grammar as well as for his insightful feedback; to Thomas Bashford-Rogers for the great talks and for his support and valuable hints; to Kurt Debattista for efforts made to analyse the experimental data and for assistance in the writing of the papers; to John Hatchet and Josh McNamee for all their patience and support in getting the HDR encoder to work.

Another manifestation of gratitude must be addressed to Telmo Adão, a loyal friend whom I can always count on, who has accompanied me throughout this challenge with his sharp suggestions and great encouragement; as well as to Dércia Santos for all the great moments that we all shared.

A very special thanks must go to Indalécia Melim for all her support and encouragement during this demanding quest. She was always there cheering me up and stood by me through the good times and the bad.

I would like to express my gratitude to my family, especially to my mother, who has always supported my decisions and helped me with everything that I needed. This gratitude extends to my grandfather, my father, my grandmother, and my brother.

Lastly, I would like to give my special thanks to all my friends, especially to Dino Ortolá for his support and encouragement, and to Martinho Gonçalves and Cátia Dias for their support. This gratitude extends to all the people who enabled me to complete this work. Without them this PhD would not have been possible.

UTAD, Vila Real
Miguel de Melo
30 October 2015


Contents

Abstract
Resumo
Acknowledgements
List of Tables
List of Figures

1 Introduction
  1.1 Motivation and objectives
  1.2 Thesis contribution
  1.3 Research methodology
  1.4 Thesis Outline

2 Background and Related Work
  2.1 HDR Pipeline
    2.1.1 Capture
    2.1.2 Storage
    2.1.3 Delivery
  2.2 HDR tone-mapping
    2.2.1 Time-independent TMOs
    2.2.2 Time-dependent TMOs

3 Evaluation of Tone-Mapping Under Different Luminance Levels
  3.1 Case study
  3.2 Participants
  3.3 Experimental design
  3.4 Experimental configuration
  3.5 Procedure
  3.6 Results
    3.6.1 Overall results across all displays across all luminance levels
    3.6.2 Results for dark luminance
    3.6.3 Results for medium luminance
    3.6.4 Results for bright luminance
    3.6.5 Overall results for SSDs
    3.6.6 Overall results for each scene
  3.7 Discussion

4 Evaluation of Tone-Mapping with Reflection on the Screen
  4.1 Case study
  4.2 Participants
  4.3 Experimental design
  4.4 Experimental configuration
  4.5 Procedure
  4.6 Results
    4.6.1 Overall results
    4.6.2 Results obtained for Scenario 1 - reflections across the whole screen
    4.6.3 Results obtained for Scenario 2 - no reflections on the screen
    4.6.4 Results obtained for Scenario 3 - reflections on half of the screen
  4.7 Discussion

5 A Novel Context-Aware HDR Video Delivery Architecture for Mobile Devices
  5.1 System description
  5.2 Requirements analysis
  5.3 Architecture
    5.3.1 HDR video player for mobile devices
    5.3.2 HDR video streaming server
  5.4 Prototype
  5.5 Prototype evaluation
    5.5.1 Mobile devices considered
    5.5.2 Experimental design
    5.5.3 Experimental setup
    5.5.4 Procedure
    5.5.5 Associated variables
    5.5.6 Results
    5.5.7 Discussion

6 Conclusions and Future Work
  6.1 Future work
  6.2 Final remarks

References

A State of the art smartphones
B State of the art tablets


List of Tables

3.1 Technical specifications of the displays used in the experiments.
3.2 Features of the HDR videos used.
3.3 Overall results obtained for each scenario.
3.4 Overall similarity results obtained for each scene under dark luminance environment.
3.5 Overall similarity results obtained for each scene under medium luminance environment.
3.6 Overall similarity results obtained for each scene under high luminance environment.
3.7 Overall similarity results obtained from the experiments for each display.
4.1 Technical specifications of the displays used in the experiments.
4.2 Features of the HDR videos used.
4.3 Overall results obtained for each scenario.
5.1 Mobile devices considered for the evaluation of the proposed architecture.
5.2 Features of the HDR videos used.


List of Figures

1.1 Range of luminance and associated visual parameters
1.2 Orders of magnitude provided by different technologies
1.3 Mobile device video trends
2.1 HDR pipeline.
2.2 SpheronCam HDRV (adapted from (SpheronVR, 2015)).
2.3 Native HDR video cameras.
2.4 Multiple exposures of one scene (shown on the left) combined into one HDR image (shown on the right) (adapted from (Reinhard et al., 2010)).
2.5 Mantiuk et al. (2004a) proposed extensions to MPEG-4 pipeline (in blue). Adapted from (Mantiuk et al., 2004a).
2.6 Scheme of the HDR video data compression and methods (adapted from (Banterle et al., 2011b)).
2.7 HDR Transparency Viewer.
2.8 HDR display systems (images adapted from (Seetzen et al., 2004)).
2.9 Brightside DR37-P HDR display (adapted from (Banterle et al., 2011a)).
2.10 Display adaptive TMO (adapted from (Mantiuk et al., 2008)).
2.11 Spatio-temporal TMO based on retina model workflow (adapted from (Benoit et al., 2009)).
2.12 Temporal Coherency for videos workflow (adapted from (Boitard et al., 2012)).
2.15 HDR-VDP2 workflow (Mantiuk et al., 2011).
2.16 Dynamic range independent image quality assessment dataflow (adapted from (Aydin et al., 2008)).
2.17 Dynamic range independent video quality assessment (adapted from (Aydin et al., 2010)).
2.18 Evaluation of TMOs using a HDR display (adapted from (Ledda et al., 2005)).
3.1 Experimental setup.
3.2 Thumbnails of the HDR videos used on the experiments.
3.3 Screenshot of the evaluation software.
4.1 Experimental setup scheme.
4.2 Mobile device with half of the screen under reflections.
4.3 Devised software for the experiments.
5.1 HDR video delivery system architecture.
5.2 Flowchart of the HDR video player for mobile devices.
5.3 HDR video streaming server architecture.
5.4 HDR video streaming server flowchart.
5.5 General scheme of the HDR video player for mobile devices with HDR support.
5.6 HDR system encoding and delivering HDR contents in real time.
5.7 Average FPS for iOS devices.
5.8 Average FPS for Android devices.
5.9 Average battery drain for iOS devices.
5.10 Average battery drain for Android devices.
5.11 Average CPU load for iOS devices.
5.12 Average CPU load for Android devices.


1 Introduction

Real-world lighting presents us with a wide range of colours and intensities, to which the Human Visual System (HVS) can adapt in order to perceive detail in scenes that vary significantly within that range. On average, it is estimated that the HVS can perceive detail over a range of nearly 12 log units (or orders of magnitude) (Hood and Finkelstein, 1986), ranging from approximately 10⁻⁴ cd/m² to 10⁸ cd/m² (Figure 1.1).

Figure 1.1 – Range of luminance and associated visual parameters (adapted from Hood and Finkelstein (1986)).

Although the HVS operates over 12 orders of magnitude of dynamic range, it cannot perceive the whole range simultaneously. In fact, the HVS is only able to process about 4 orders of magnitude at a time and, depending on the range of illumination levels perceived, the adaptation of the visual system differs across scenarios. This happens because the retina of the eye is composed, essentially, of two types of cells: rods (responsible for vision at low light levels) and cones (responsible for colour vision and detail). As cones are not functional at low luminance levels, the HVS is then more sensitive to small differences in luminance but has more difficulty perceiving colours (scotopic vision), while in a bright scenario the HVS has a more accurate perception of colours, since the cones are active and therefore provide colour mediation (photopic vision).

Even though the HVS can perceive about 4 orders of magnitude of luminance simultaneously, conventional imaging technology can only achieve about 2 orders of magnitude, making it ineffective at reproducing scenes that contain areas with significant differences in luminance (Reinhard et al., 2010). Due to these limitations, conventional imaging technology is often referred to as LDR (Low Dynamic Range) or SDR (Standard Dynamic Range). The term dynamic range refers to the ratio between the brightest and the darkest value that can be displayed simultaneously (Trentacoste et al., 2007). As conventional imaging technology cannot yet match the HVS dynamic range, there is a realism reproduction issue, since it is not possible to deliver the luminance levels of the real world, especially in scenes that require a higher dynamic range to be reproduced. High Dynamic Range (HDR) imagery is designed to overcome the constraints associated with conventional imaging technology by capturing, storing, and delivering real-world luminance. Figure 1.2 shows the number of orders of magnitude achieved by conventional imaging technology, film and paper, and compares them with the HVS dynamic range, contextualizing them through various luminance scenarios.
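As a worked example of the figures above, the following sketch (a hypothetical helper, not part of the thesis) converts a luminance interval into orders of magnitude and the equivalent photographic f-stops:

```python
import math

def dynamic_range(l_max, l_min):
    """Dynamic range of a luminance interval, expressed both in orders of
    magnitude (log10 units) and in photographic f-stops (log2 units)."""
    ratio = l_max / l_min
    return math.log10(ratio), math.log2(ratio)

# The HVS range quoted above, ~10^-4 cd/m^2 up to ~10^8 cd/m^2:
orders, stops = dynamic_range(1e8, 1e-4)
print(orders, stops)  # 12 orders of magnitude, ~39.9 f-stops
```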

1.1 Motivation and objectives

Although there has been substantial previous research and development in the field of HDR, ranging from capture to delivery, there is still work to be done to help HDR technology become truly widespread. One important step is to undertake coordinated research across different research groups and define common interface standards. To achieve this, EU COST Action IC1005 "HDRI: The digital capture, storage, transmission and display of real-world lighting" (ICT COST Action IC1005, 2011) was established in May 2011 to coordinate academic and industrial researchers across Europe and to propose a set of standards for the complete HDR pipeline. This thesis was inspired by COST Action IC1005 and focuses on the delivery of HDR video on mobile devices, an emerging field that has so far attracted little research.

HDR contents can be delivered in a straightforward manner if the dynamic range of the display matches the lighting magnitude of the data (Seetzen et al., 2004).


Figure 1.2 – Orders of magnitude afforded by different technologies (adapted from (Krawczyk, 2007))

However, HDR displays are still expensive and thus not yet widely available. Furthermore, no HDR display has been designed specifically for mobile devices so far, so there is a need to perform dynamic range reduction so that HDR content can be properly shown on LDR displays. The dynamic range reduction is achieved by using specific algorithms known as tone-mapping operators (TMOs). The main goal of a TMO is to reduce the dynamic range of an image/video to match the dynamic range of a specific display, preserving as much as possible of the original perceived contrast in order to reproduce its visual appearance (Banterle et al., 2011a). Many TMOs have been proposed following different approaches, such as modelling HVS features or taking into account the features of the targeted displays, but only a small number of TMOs address HDR video specifically and there are no TMOs developed specifically for delivering HDR video on small screen devices (SSDs), such as the displays of mobile devices.

A mobile device can be defined as "a portable, wireless computing device that is small enough to be used while held in the hand" (Dictionary.com, 2015). As there is a wide spectrum of mobile devices, for the purposes of this work the term mobile devices refers specifically to smartphones and tablets that support the reproduction of multimedia content (the same applies to the term SSDs).


Mobile devices' popularity has been increasing, and figures show that in 2014 there were almost as many mobile-cellular subscriptions in the world as people, with an estimated global penetration rate of 96% (Union, 2014). Along with the growth of the penetration rate, mobile-broadband subscriptions have also grown significantly, increasing by 600% over the past 6 years (Union, 2014).

Video consumption on mobile devices is also growing significantly. According to OOYALA’s Global Video Index report (Ooyala, 2014) that analyzed the viewing habits of video on mobile devices of approximately 200 million unique viewers in over 130 countries from 2012 to 2014, there was a year-over-year increase of about 114%. The report shows that although in 2012 only 6% of all online video was requested by mobile devices, there was a growth of approximately 400% over the following two years. In 2014 nearly 30% of all the video views were made by mobile devices. Figure 1.3 illustrates the mobile devices video trends according to Ooyala (2014).

Figure 1.3 – Mobile device video trends (adapted from (Ooyala, 2014))

Considering these numbers and the market tendency, Ooyala (2014) predicts that by 2015 mobile devices will represent half of the total online video views. Mobile devices are rapidly becoming a major platform for multimedia consumption and, with that, a need arises to ensure an optimal experience when viewing HDR content on typical mobile device screens.

HDR video on mobile devices can pose challenges such as the diversity of scenarios in which the devices are used, local storage usage, power consumption, or the volume of associated data. Such challenges demand efficient methods to deliver HDR video on mobile devices. Overcoming them will be a big step forward in the way users can access this type of media, since HDR video is capable of delivering real-world luminance information. Another important factor is that the trends associated with mobile devices naturally demand the development of mechanisms that allow distributing HDR video remotely.

Mobile devices can be used under a variety of scenarios, and the ability to use the right mobile tone mapper under diverse lighting conditions is important for applications that require the captured HDR content to be immediately displayed on a mobile device under widely varying luminance scenarios, from harsh outdoor sunlight to dark indoor settings. Therefore, it is important to study the impact of such variables on the viewing experience and use the gained knowledge to optimise the delivery of HDR video on mobile devices, namely by developing a new context-aware system that is capable of continuously reading context information and selecting the most appropriate TMO for each usage scenario. Such a system was developed rather than proposing a new TMO. This was because there are currently no solutions that allow reliable visualisation of HDR video on mobile devices, whereas many TMOs have already been proposed (and many more are likely to be proposed in the future). The key advantage of the novel context-aware system is that it is "future proof" and able to incorporate any new developments in HDR video, including any new TMOs. This context-aware system should ensure compatibility with devices that do not have the computational power to locally decode HDR contents, in order to reach a wider audience and promote HDR adoption by users.

The main goal of this thesis is to study the delivery of HDR video on mobile devices taking into consideration specifics of the device. To achieve this main goal, the following objectives have been defined:

• Investigate whether the TMOs that are successful for traditional displays also perform well on SSDs;

• Study the impact of different lighting levels on TMOs as mobile devices can be used under a variety of lighting conditions;

• Study the impact of screen reflections on TMOs, as mobile device screens can easily be exposed to them;

• From the insights of the research above, propose a novel context-aware HDR video delivery system for mobile devices.


1.2 Thesis contribution

The work here presented is focused on the delivery of HDR video on mobile devices and the following contributions were made:

• An HDR video player for mobile devices was developed, which included the implementation of a set of TMOs;

• An evaluation was performed of the impact of different ambient luminance levels on HDR video tone-mapping on mobile devices. It was discovered that under dark and dim environments the TMO accuracy ranking obtained was different from that under bright lighting levels. Display size was also considered as a factor and it was possible to verify that there are no significant differences between conventional sized displays and SSDs;

• The impact of HDR video tone-mapping for mobile devices was investigated, taking into account screen reflections. It was concluded that the greater the area exposed to reflections, the larger the negative impact on a TMO's accuracy, and that hybrid TMOs do not outperform standard TMOs; and,

• A novel context-aware HDR video delivery architecture was proposed, which enables the delivery of HDR video on mobile devices both locally and remotely, as well as supporting real-time broadcasting. The solution presented is also capable of supporting less powerful devices through an innovative remote tone-mapping approach.

1.3 Research methodology

A good research methodology is important for the success of a thesis. In this case, two main research methodologies were adopted as they were the best fit for achieving the proposed goals: a case study and action research.

The case study methodology allows studying the relationship between variables and understanding the causal processes of one over another (Soy, 2007). This research methodology was adopted for studying the impact of different viewing conditions (lighting levels and screen reflections) on HDR video tone-mapping. Action research methodology, on the other hand, consists of an iterative approach that has four "moments": "plan", "act", "observe" and "reflect" (Kemmis et al., 2004). The first moment of this research methodology is to plan an action, or to improve the product if a new iteration of the research cycle is starting; the "act" moment consists of implementing the plan; the "observe" step refers to the observation of the results of the "act" moment; and the "reflect" step consists of a reflection about the observed effects and consequent plan refinement if needed, thereby beginning a new iteration cycle. This research methodology was adopted for the development of the HDR video player, as well as for the proposal, development and evaluation of the proposed delivery system for HDR video on mobile devices.

1.4 Thesis Outline

Chapter 1: The initial chapter of this thesis introduces the research topic, motivation and objectives. The thesis contributions, the research methodologies adopted and the thesis outline are also presented.

Chapter 2: This chapter provides fundamental information on the research field of this thesis, beginning with a brief overview of all stages of the HDR pipeline (capture, storage, and delivery), followed by an overview of the state of the art regarding HDR on mobile devices. As this thesis is directed towards mobile devices and aims to understand the impact of certain viewing conditions on a TMO's performance, there is a need for both using and evaluating state of the art TMOs. Therefore, this chapter discusses a set of state of the art TMOs as well as a number of TMO evaluation studies. This allowed understanding and identifying the common practices in the field and was important in defining an optimal methodology to adopt for the evaluation studies.

Chapter 3: Due to the portability of mobile devices, they can be used under widely varying luminance conditions that can have an impact on the visualization experience. The third chapter presents an evaluation of HDR video tone-mapping for mobile devices under different luminance levels, in order to understand the impact of such scenarios on HDR video visualization. From the study, it was concluded that there are differences between the performance of the TMOs under different ambient lighting levels, and that the TMOs that perform well on traditional large screen displays also perform well on SSDs at the same given luminance level.

Chapter 4: Mobile devices are often used in conditions in which the screen is exposed to reflections that can compromise the viewing experience. This chapter presents an evaluation study addressing screen reflections, in order to investigate the impact of reflections on the screen and whether a hybrid tone-mapping approach benefits the viewing experience. The study concluded that the greater the area exposed to reflections, the larger the negative impact on a TMO's perceptual accuracy. Results also show that, at least under the observed conditions, when reflections are present the hybrid TMOs do not perform better than the standard TMOs.

Chapter 5: This chapter presents a novel context-aware HDR video delivery architecture for mobile devices that allows distribution of HDR video even to less powerful devices, both locally and remotely, with the possibility of real-time broadcasting. The context-aware feature enables context variables, such as the current lighting levels of the usage scenario, to be continuously read and the best TMO for the given scenario to be selected, based on the knowledge gained from the psychophysical experiments presented in Chapter 3 and Chapter 4. This architecture is subsequently evaluated and shown to be reliable. Additionally, this chapter presents a mini content management system to manage the content that is available to users.

Chapter 6: Finally, this chapter presents the main conclusions and contributions of the thesis. Final remarks and suggestions for future work are also given.


2 Background and Related Work

HDR imaging was developed to overcome the limitations associated with LDR imaging and it can bring benefits to many areas such as security, entertainment, art, scientific research or health. This technology is capable of delivering an optimal viewing experience to its users, since it can reproduce scenes that match the dynamic range of the HVS. In addition, this technology is directly compatible with High-Definition and 3D technologies; it is therefore not a substitute but rather a complement that can enhance these technologies, as its main focus is the ratio of the maximum and minimum luminance that it is possible to emit, rather than the number of pixels per frame or three-dimensionality. This chapter provides an overview of the fundamental concepts required for this study, namely of the HDR pipeline, of HDR tone-mapping and of previous work targeted towards HDR on mobile devices.

2.1 HDR Pipeline

The HDR pipeline (Figure 2.1) can be divided into three main stages: capture, storage and delivery. In the first stage, the contents are captured, whether using native HDR hardware or using LDR hardware combined with methods like multiple exposure capture. Stage two is where the content is stored using proper encoding to handle the HDR data. The second stage can also include an optional step called manipulation, where post production, image based lighting and scene analysis can take place. The third stage comprises the delivery of HDR content, which can be shown directly on an HDR display device or, if the display's dynamic range is lower than the content's dynamic range, on an LDR display after performing a dynamic range reduction of the content using, for example, tone-mapping operators.

Figure 2.1 – HDR pipeline.

2.1.1 Capture

The capture stage, as the name suggests, captures all the information of the real-world scene. There are three main ways to obtain HDR content: through a native HDR camera, through a conventional camera using multiple exposures, or through computer generation using rendering techniques. Native HDR cameras include the SpheronCam HDRi (SpheronVR, 2015), the SpheronCam HDRv (Chalmers et al., 2009), the Civetta 360 by AG (2010), and the AMP HDR camera (Tocci et al., 2011).

SpheronVR (2015) presented the world's first HDR imaging camera in 2001, capable of obtaining 360° images with a dynamic range up to 26 f-stops and a resolution up to 50 megapixels (10624×5312). In 2009 SpheronVR launched the SpheronCam HDRv (Figure 2.2) in collaboration with the University of Warwick (Chalmers et al., 2009). This second camera is capable of capturing HDR video content with a resolution of 1920×1080 at 30 FPS and a dynamic range of 20 f-stops. In order to obtain HDR video, the frames are first stored in an HDD array and then post-processed into a sequence of HDR files (Banterle et al., 2011a). The post-processing is conducted using solutions such as the one proposed by goHDR (2014b) in order to encode the HDR content and obtain the video file.

Figure 2.2 – SpheronCam HDRV (adapted from (SpheronVR, 2015)).

The Civetta 360 camera (Figure 2.3(a)) was also released in 2009. Built on Canon technology, it also supports the capture of 360° HDR images. This camera is able to capture a full spherical HDR image in 40 seconds with a resolution of 100 megapixels (14142×7071) and a dynamic range of 28 f-stops (AG, 2010). In 2011 AMP released the AMP HDR camera (Figure 2.3(b)), which captures up to 17.5 f-stops and supports a resolution of 1920×1080 at a frame rate that can vary between 24 and 30 FPS (Tocci et al., 2011).

Since HDR cameras are not widespread, an alternative is to capture the real-world scene using a regular camera with multiple exposures (Debevec and Malik, 1997). The multiple exposure method captures multiple shots of a scene with different exposures (underexposed, normally exposed, and overexposed) and combines them into a single frame, as Figure 2.4 illustrates. More recent devices have an auto exposure bracketing feature that allows the camera to automatically take the same shot with different exposure settings. Nowadays it is also common to find an HDR mode in digital cameras; this uses the multiple exposure technique but is directed only at still images, as it is very expensive resource-wise. This static approach may fail when the purpose is to capture video, as it requires some processing time and the motion in the scene is not properly captured, causing ghosting effects. One possible solution to capture video is to use multiple cameras to capture the same scene and merge the frames obtained by each camera in order to create an HDR frame sequence (a minimal merging sketch follows Figure 2.4).


(a) Civetta 360 HDR video camera (adapted from (AG, 2010)).

(b) AMP HDR video camera (adapted from (Tocci et al., 2011)).

Figure 2.3 – Native HDR video cameras.

Figure 2.4 – Multiple exposures of one scene (shown on the left) combined into one HDR image (shown on the right) (adapted from (Reinhard et al., 2010)).
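The sketch below (not from the thesis) illustrates the multiple exposure merge in its simplest form. It assumes a linear camera response and uses a "hat" weighting that trusts well-exposed pixels; a full pipeline would first recover the camera response curve (Debevec and Malik, 1997) and would need to handle the ghosting mentioned above.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge bracketed 8-bit LDR shots of a static scene into one HDR
    radiance map (weighted average of per-shot radiance estimates)."""
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        z = img.astype(np.float64) / 255.0      # assume linear response
        w = 1.0 - np.abs(2.0 * z - 1.0)         # hat weight: trust mid-tones
        num += w * (z / t)                      # radiance estimate per shot
        den += w
    return num / np.maximum(den, 1e-6)          # HDR radiance map
```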

2.1.2 Storage

HDR content is naturally larger than LDR content as it has more associated information, which has a major impact not only on storing and transmitting HDR data but also in terms of processing. As a consequence, efficient representations of floating-point numbers have been developed, and many classic compression algorithms such as JPEG and MPEG have been extended to handle HDR images and videos. Below is a brief overview of some of the most popular HDR image and video file formats.

HDR image file formats

In the early 1990s the Radiance RGBE encoding method was proposed (Ward, 1991). It was designed originally to compute photometric quantities and it is identified by many as the first HDR file format. This 32-bit file format adds one channel (E) to the conventional RGB channels: an extra 8-bit channel holding a common exponent shared by the other three channels, which store the red, green and blue mantissas. Although this format is capable of encoding 76 orders of magnitude, it is limited to a positive range of values since it is based on the RGB representation, which makes it unable to reproduce the entire visible gamut. To overcome this limitation, a second variant of the format designated XYZE was created; instead of the RGB colour space it operates in the XYZ colour space, which admits negative values, extending the encoding to the entire visible colour gamut (Ward, 2005).
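A shared-exponent codec in the spirit of the Radiance scheme can be sketched as follows (simplified; the real format also defines run-length encoding and header conventions):

```python
import math

def rgb_to_rgbe(r, g, b):
    """Encode a linear RGB triple into four bytes: three 8-bit mantissas
    plus one shared 8-bit exponent (biased by 128)."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)          # v = mantissa * 2**exponent
    scale = mantissa * 256.0 / v
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

def rgbe_to_rgb(r8, g8, b8, e8):
    """Decode the four bytes back to linear RGB."""
    if e8 == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e8 - (128 + 8))         # undo bias and mantissa scale
    return (r8 * f, g8 * f, b8 * f)
```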

In 2003, OpenEXR was launched as an open-source C++ library with the purpose of answering the demands of the visual effects industry (Kainz et al., 2004). This format is used for HDR imaging as it can use up to 48 bits per pixel, covering a dynamic range of nearly 10.7 orders of magnitude. Each 16-bit channel is divided into 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. This file format is able to cover the visible gamut and offers the possibility of using extra layers such as an alpha channel, a depth channel or a spectral channel.

Dolby's JPEG-HDR image file format (Ward and Simmons, 2006) is backwards-compatible with the traditional JPEG file format (in fact, it can be considered an extension of it). To encode an HDR image using this file format, the first step is to tone-map the image with a pre-determined TMO; the authors suggest using Reinhard's photographic operator (Reinhard et al., 2002) or the bilateral filter operator (Durand and Dorsey, 2002). From the tone-mapped image, a Ratio Image (RI) is generated: a greyscale image that contains the information needed to recover the original data. The RI is stored as a sub-band marker in the generated file and is used for reconstruction when the decoding software is designed to handle HDR; otherwise the sub-band is ignored.

The 32-bit LogLuv TIFF image file format is based on logarithmic encoding and is part of the TIFF library (Larson, 1998). In this file format the data is converted into two channels: luminance and CIE chromaticity. The logarithmic value of luminance is calculated and quantised into a specific range. 15 bits are reserved for luminance, 16 bits for chromaticity and 1 bit for the sign of luminance, which allows negative values to be encoded and covers a dynamic range of about 38 orders of magnitude.
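The luminance half of that encoding can be sketched as below (chromaticity and the sign bit omitted); the constants follow the 32-bit LogLuv layout described above, giving a code that spans 2⁻⁶⁴ to 2⁶⁴, i.e. roughly 38 orders of magnitude:

```python
import math

def logluv_encode_luminance(L):
    """Quantise a positive luminance value into the 15-bit log code."""
    Le = int(256.0 * (math.log2(L) + 64.0))
    return max(0, min(Le, 2 ** 15 - 1))

def logluv_decode_luminance(Le):
    """Invert the 15-bit code (decode to the centre of the quantisation bin)."""
    return 2.0 ** ((Le + 0.5) / 256.0 - 64.0)
```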

HDR video file formats

One popular HDR video file format was proposed by Mantiuk et al. (2004a) as an extension to the MPEG-4 ISO/IEC 14496-2 compression standard (ISO-IEC-14496-2, 2004). This encoding proposes a new approach for inter-frame encoding, taking advantage of luminance quantisation optimised for the contrast threshold perception of the HVS. The method suggests modest changes to the MPEG-4 pipeline, as shown in Figure 2.5, where the proposed extensions to the original pipeline are represented by the blocks with dashed borders. The proposed encoding uses the HDR XYZ colour space instead of 8-bit RGB, since it can represent the full colour gamut and the complete range of luminance that the eye can adapt to. Another change to the original pipeline is the storing of colour information using a perceptually linear u'v' colour space instead of the YCbCr colour space, since it offers similar compression performance but is capable of representing the entire colour gamut.

Figure 2.5 – Mantiuk et al. (2004a) proposed extensions to MPEG-4 pipeline (in blue). Adapted from (Mantiuk et al., 2004a).

HDR-MPEG was proposed by Mantiuk et al. (2006) and is a backward-compatible MPEG compression method that divides the HDR stream into two streams. One is an LDR stream compatible with conventional MPEG encoders, while the second contains residual information that enables the restoration of the original HDR stream, exploiting the strong correlation between the LDR stream and the residual stream. In order to minimise information redundancy, the two streams are decorrelated through the use of a perceptually meaningful comparison of the LDR and HDR pixels. Due to the independent streaming, it is possible to independently tune the content for LDR or HDR displays.

Lee and Kim proposed an algorithm that addresses HDR video known as "rate-distortion optimized compression for HDR videos" (Lee and Kim, 2008). This method compresses the video by dividing it into two sequences: a tone-mapped LDR sequence and a ratio sequence. Taking advantage of the correlation between the streams (Mantiuk et al., 2006), it is possible to derive the HDR values from the ratio sequence and the LDR tone-mapped sequence. To avoid noise in the ratio frame, a cross bilateral filter is applied. Further work by Lee and Kim (2012) optimised the rate-distortion method for HDR video sequences. In 2010, Motra and Thoma proposed a method that allows using an existing video codec for encoding HDR video sequences with an adaptive LogLuv transformation (Motra and Thoma, 2010). The method uses the LogLuv transform proposed by Ward (Larson, 1998), where the RGB values are converted to the XYZ colour space and then to a LogLuv colour space. This method uses H.264/AVC, which can handle a maximum of 14 bits per channel.

Although several methods have been proposed, one gains special importance here as it is the method used alongside the work developed and presented in this thesis. It has been patented under the name "HDR video data compression and methods" by Banterle et al. (2011b), but can also be referred to as the goHDR method. This method was adopted since it was the one completely available and open for the work presented in this thesis. It allows encoding HDR video by compressing a stream of HDR content into a final data stream composed of a compressed LDR frame, a compressed detail frame and the properties of the tone-mapping operation applied to the base frames. The first step applies a bilateral filter to the HDR frame to extract the base frame. After extracting the base frame, the detail frame is calculated by subtracting the base frame from the HDR frame, or by dividing the HDR frame by the base frame, in order to obtain the fine details of the HDR frame. The base frame is also used for generating the LDR base frame, by applying a TMO (from a predefined list of TMOs) to it. Both the LDR base frame and the detail frame are subject to temporal compression. This step yields the compressed LDR base frame, the compressed detail frame and the TMO settings used, which are combined into the final frame data that is used in the video data stream. The general scheme of this encoding is shown in Figure 2.6, and a simplified sketch of the decomposition follows the figure.


Figure 2.6 – Scheme of the HDR video data compression and methods (adapted from (Banterle et al., 2011b)).
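The following is a highly simplified sketch of that base/detail split, not the patented implementation: SciPy's Gaussian blur stands in for the edge-preserving bilateral filter, and `tmo`/`inverse_tmo` are placeholder functions for whichever operator the encoder selects.

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # stand-in for a bilateral filter

def encode_frame(hdr, tmo):
    """Split an HDR luminance frame into an LDR-compatible base stream and
    a detail stream (ratio form), as in the scheme of Figure 2.6."""
    base = gaussian_filter(hdr, sigma=4.0)       # low-frequency base frame
    detail = hdr / np.maximum(base, 1e-6)        # fine details of the frame
    return tmo(base), detail                     # both temporally compressed

def decode_frame(ldr_base, detail, inverse_tmo):
    """Approximate the original HDR frame from the two decoded streams."""
    return inverse_tmo(ldr_base) * detail
```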

2.1.3 Delivery

The delivery process displays the HDR data. If the dynamic range of the display matches the lighting magnitude of the data, the contents are displayed in a straightforward manner (Seetzen et al., 2004). Some examples of native HDR visualization devices are the HDR Transparency Viewer (Ward, 2002; Ledda et al., 2003), the projector-based and LED-based HDR displays (Seetzen et al., 2004), the Brightside Display (Oh, 2006), the SIM2 Solar Series HDR display (SIM2, 2014), and the Dolby Research HDR RGB backlight dual modulation display (also known as Pulsar) (Hanhart et al., 2014).


(a) A photograph of the HDR transparency viewer (adapted from (Ledda et al., 2003)).

(b) Scheme of the HDR Transparency Viewer (adapted from (Ledda et al., 2003)).

Figure 2.7 – HDR Transparency Viewer.

The HDR Transparency Viewer (Figure 2.7(a)) was the first native HDR image viewer and was inspired by the classic stereoscopic device used for displaying 3D images. The viewer was composed of three main components: a set of lamps to produce a uniform backlight, a pair of transparencies, and a pair of stereo optics (Figure 2.7(b)). This device was developed with the intent of evaluating TMOs and it yields a range of 1:10⁴ as well as a 120° hemispherical fish-eye perspective. The maximum luminance value measured was 5,000 cd/m², while the minimum luminance value measured was 0.5 cd/m².

Seetzen et al. (2004) developed two systems that were the first HDR displays ever created: the projector-based display (Figure 2.8(a)) and the LED-based display (Figure 2.8(b)). The projector-based display is a combination of a digital light processing (DLP) projector, used to modulate the light, and an LCD panel. The projection is made onto the back of the LCD panel, serving as its backlight and consequently increasing the brightness of the display (Figure 2.8(c)). The projector-based display achieved a minimum luminance of 0.05 cd/m² and a maximum luminance of 2,700 cd/m². The main difference between the projector-based display and the LED-based display is that instead of using a projector to modulate the light, the LED-based display uses an LED panel. The minimum and maximum luminance values measured with the LED-based display were 0.015 cd/m² and 3,000 cd/m² respectively. This solution was a step forward in the commercialization of HDR displays, since it solved some problems that were present in the first approach such as power consumption, temperature and associated costs.

(a) Projector-based HDR display. (b) LED-based HDR display. (c) Inner side of the projector-based HDR display.

Figure 2.8 – HDR display systems (images adapted from (Seetzen et al., 2004)).

Brightside Technologies Inc. developed the Brightside DR37-P HDR display (Figure 2.9) in 2005, based on the Seetzen et al. (2004) approach of dual modulation of the light using an LED panel as backlight and a front LCD panel. This 37" HDR display supports a resolution of 1920×1080, and its measured minimum and maximum luminance are 0 cd/m² and 4,000 cd/m² respectively.

Figure 2.9 – Brightside DR37-P HDR display (adapted from (Banterle et al., 2011a)).

Brightside Technologies Inc. was later acquired by Dolby in 2007. Dolby then licensed the technology to the Italian company SIM2, which in 2009 announced the first commercial HDR display. This 47" display is based on dual modulation of the light and has very similar characteristics to the Brightside DR37-P, presenting a measured minimum luminance of 0.0015 cd/m² and a measured maximum luminance level of 4,000 cd/m². After some time, Dolby released the Research HDR RGB backlight dual modulation display. This HDR display also uses dual modulation of light, with one LED backlight and one front LCD panel. The 42" display supports a resolution of 1920×1080. The minimum luminance level is about 0.005 cd/m² and the maximum luminance level is 4,000 cd/m².

LDR content can be boosted to appear as HDR content on an HDR display. In order to achieve this, it is necessary to map the content through a process known as "inverse tone reproduction". Although HDR displays do exist, they are expensive and thus not widespread, so there is a need to perform a dynamic range adjustment so that the HDR content's dynamic range matches the LDR display's dynamic range. This dynamic range transformation is made through a process known as "tone mapping". Note that there is an alternate approach to tone mapping known as exposure fusion, which makes use of EFAs (exposure fusion algorithms). EFAs differ from TMOs in that they take as input a set of LDR images with different exposures instead of first generating HDR content. These algorithms merge the best exposed pixels from the LDR exposures into a single LDR image that is rich in detail. As this technique does not generate any HDR content in the process, it cannot be considered a truly HDR method.
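For illustration, a minimal exposure-fusion weighting in the spirit of Mertens et al.'s technique can be sketched as below; this is an assumption for illustration only, as the thesis does not prescribe a specific EFA:

```python
import numpy as np

def fuse_exposures(ldr_stack, sigma=0.2):
    """Blend differently exposed LDR images (float arrays in [0, 1]) using a
    'well-exposedness' weight: pixels near mid-grey contribute the most.
    Contrast and saturation terms of the full method are omitted."""
    weights = [np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))
               for img in ldr_stack]
    total = np.maximum(sum(weights), 1e-6)       # normalise per pixel
    return sum((w / total) * img for w, img in zip(weights, ldr_stack))
```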

2.2 HDR tone-mapping

HDR tone-mapping techniques were developed to map the visual information of real scenes to what can be displayed on commonly available LDR displays. Many TMOs have been proposed, addressing different features that aim to preserve characteristics of the real scene, such as overall brightness, detail, and local or global contrast. The tone-mapping concept was first introduced into the computer graphics field by Tumblin and Rushmeier, first through a technical report in 1991 (Tumblin and Rushmeier, 1991) which was published in 1993 (Tumblin and Rushmeier, 1993). Their inspiration was the fact that there was no way of identifying whether it was night or day in conventional image synthesis algorithms, which is important since the HVS behaves differently in different luminance environments.

TMOs can be divided into two categories: global operators and local operators. Global operators are spatially invariant; they consider the image as a whole, mapping all the pixels of an image evenly. These operators use image statistics, such as logarithmic or arithmetic averages, to optimise the dynamic range reduction. Because of this, global operators tend to be simple and fast and they preserve the global contrast of the image, but they are unable to maintain the local contrast, which can result in a loss of detail in some regions. Local operators, on the other hand, are applied differently to each pixel, taking into account a set of surrounding pixels to perform the calculation for that pixel. One of the advantages of local operators is that they attempt to preserve both global and local contrast, which can lead to better image quality since the HVS is sensitive to local contrast. The major drawbacks of local TMOs are that they are typically complex, requiring additional computational effort, and they can introduce artefacts such as halos (Banterle et al., 2011a).

Unfortunately, most TMOs do not take temporal coherency into account, an important feature when dealing with video, since considering adjacent frames or even the entire video can prevent, amongst other problems, noticeable flickering when there are significant luminance changes from frame to frame. Since this thesis is focused on HDR video, the studied TMOs are divided into time-dependent and time-independent TMOs rather than global and local TMOs. So many TMOs have been suggested that it is not possible to refer to all of them in this document; only the most relevant TMOs to this work are included. Also, as some TMOs had to be implemented for integration into the HDR video player for mobile devices, some equations are presented along with the TMOs' descriptions.

2.2.1 Time-independent TMOs

Among the proposed TMOs there are simple mapping methods as well as more elaborate methods such as, for example, false colour (Akyüz, 2013), filmic (Habble, 2010), the brightness reproduction operator (Tumblin and Rushmeier, 1993), Ward's contrast based scale-factor (Ward, 1994), photographic tone reproduction (Reinhard et al., 2002), and the calibrated image appearance reproduction model (Reinhard et al., 2012).

Simple mapping methods

There are TMOs that are based on simple mathematical functions such as logarithmic, exponential or linear scaling. These TMOs are often referred to as simple mapping methods; they are straightforward to use and do not require much processing. Although these methods can produce good results on medium dynamic range content, this kind of dynamic range adjustment ignores features of the image such as regions with different kinds of illumination and wide dynamic range variation. This can cause deficient global contrast or illumination preservation that can yield poor results. One of the simplest mapping methods is the linear exposure method, where the contents are mapped through a linear scale using a factor.

Another example of a simple mapping method is exponential scaling (Equation 2.1), which uses two configurable parameters (q and k) that can range from 1 to ∞. This method calculates the luminance levels by dividing each value by the arithmetic average L_{w,H}:

\[ L_d(x) = 1 - \exp\left(-\frac{q\,L_w(x)}{k\,L_{w,H}}\right) \tag{2.1} \]
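A direct implementation of Equation 2.1 (a sketch, assuming a luminance image held as a NumPy array):

```python
import numpy as np

def exponential_tmo(lum, q=1.0, k=1.0):
    """Equation 2.1: exponential scaling of world luminance into [0, 1),
    driven by the arithmetic average luminance L_{w,H} of the frame."""
    l_avg = lum.mean()
    return 1.0 - np.exp(-(q * lum) / (k * l_avg))
```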

False colour

False colour is a well-known colour rendering method that illustrates the colour temperature across the whole frame. Typically, blue is used to represent low intensity areas, green intermediate intensity and red high intensity areas. False colour is extremely useful for analysing and understanding the scene composition and it can take advantage of HDR scenes as they contain much more information than typical LDR scenes. An example of false colour tone mapping is the sigmoidal false colour mapping presented by Akyüz (2013), which was inspired by Reinhard et al. (2002). The first step of this technique is to calculate the luminance and then use the HSV colour space with S = V = 100, where FC corresponds to the false colouring function. The different colours are given by a hue angle, which is computed as shown in Equation 2.2:

\[ H = 240^{\circ}\,(1 - FC(Y)) \tag{2.2} \]

The false colouring function is shown in Equation 2.3, where a refers to a configuration parameter that is equal to 0.18 (suggested by the authors) and \bar{Y} refers to the logarithmic average luminance:

\[ FC_s(Y) = \frac{Y_s}{1 + Y_s} \quad\text{where}\quad Y_s = \frac{a}{\bar{Y}}\,Y \tag{2.3} \]
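A sketch of this mapping (a hypothetical helper; Python's standard colorsys module is used for the HSV-to-RGB step, with S = V = 1 on its [0, 1] scale):

```python
import colorsys
import numpy as np

def false_colour(lum, a=0.18):
    """Equations 2.2-2.3: map luminance to a hue running from blue
    (240 degrees, low intensity) down to red (0 degrees, high intensity)."""
    log_avg = np.exp(np.mean(np.log(lum + 1e-6)))   # logarithmic average
    ys = (a / log_avg) * lum
    fc = ys / (1.0 + ys)                            # Equation 2.3
    hues = 240.0 * (1.0 - fc) / 360.0               # Equation 2.2, in [0, 1]
    rgb = [colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues.ravel()]
    return np.array(rgb).reshape(lum.shape + (3,))
```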

Filmic TMO

The Filmic TMO appeared as a custom-made solution for handling higher dynamic range and has been used in cinematic productions and games, for example "Uncharted 2: Among Thieves". The TMO was proposed by John Hable (Habble, 2010); it essentially measures the response curve and creates a function that approximately matches the film curve. The first step is shown in Equation 2.4, where T_f is the tone-mapping function, E_b the exposure bias and w denotes the original colour:

\[
\begin{pmatrix} R_d(x) \\ G_d(x) \\ B_d(x) \end{pmatrix}
= \left( T_f\!\left( E_b \begin{pmatrix} R_w(x) \\ G_w(x) \\ B_w(x) \end{pmatrix} \right) \times \frac{1}{T_f(11.2)} \right)^{\frac{1}{2.2}} \tag{2.4}
\]

The tone-mapping function T_f(x) is calculated as shown in Equation 2.5:

\[ T_f(x) = \frac{x\,(0.15x + 0.05) + 0.004}{x\,(0.15x + 0.5) + 0.06} - \frac{0.02}{0.3} \tag{2.5} \]
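Equations 2.4 and 2.5 translate almost directly into code (a sketch; the default exposure bias is an assumed illustrative value):

```python
import numpy as np

def hable_curve(x):
    """Equation 2.5: the filmic response curve with its fixed constants."""
    return ((x * (0.15 * x + 0.05) + 0.004)
            / (x * (0.15 * x + 0.5) + 0.06)) - 0.02 / 0.3

def filmic_tmo(rgb, exposure_bias=2.0):
    """Equation 2.4: apply the curve, normalise by the linear white point
    11.2, then gamma-encode with exponent 1/2.2 for display."""
    mapped = hable_curve(exposure_bias * rgb) / hable_curve(11.2)
    return np.clip(mapped, 0.0, None) ** (1.0 / 2.2)
```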

At the time this TMO was proposed, most of the image synthesis algorithms did not consider the real scene luminance conditions where the differences are obvious to the human eye. In order to overcome this limitation, Tumblin and Rushmeier (1993) proposed a method that aimed to display the original lighting atmosphere on the final image. Since the HVS can behave differently on different scenarios, a simple image scaling can be enough for some parameters but as a whole it would be an inaccurate solution since it is not sufficient to reproduce the real visual appearance of a scene. This operator adopts Stevens and Stevens (1963)’s work on brightness that can be expressed by Equation 2.6.

γ(x) =    1.855 + 0.4log10(x + 2.3 × 10-5) for x ≤ 100 cd/m2 2.655 otherwise (2.6)

The TMO is defined as shown in Equation 2.7 where Lda refers to the adaptation

luminance of the display; γ(x) is the Stevens and Stevens’ contrast sensitivity func-tion for a human adapted to a luminance value x; and m refers to the adaptafunc-tion- adaptation-dependent scaling term which prevents anomalous grey night images. With this, the operator is able to compress HDR images preserving brightness, producing plausible results when calibrated luminance values are available.

$$L_d(x) = m\,L_{da} \left( \frac{L_w(x)}{L_{w,H}} \right)^{\alpha} \quad \text{where} \quad \alpha = \frac{\gamma(L_{w,H})}{\gamma(L_{da})} \tag{2.7}$$
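A compact sketch of Equations 2.6-2.7 follows. The display adaptation luminance and the scaling term m are illustrative choices, and the world adaptation luminance is taken here as the log-average of the scene, which is an assumption of this example rather than a prescription of the operator:

```python
import numpy as np

def stevens_gamma(x):
    """Stevens and Stevens' brightness function (Equation 2.6)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 100.0, 1.855 + 0.4 * np.log10(x + 2.3e-5), 2.655)

def tumblin_rushmeier(L_w, L_da=30.0, m=1.0):
    """Brightness-preserving mapping (Equation 2.7)."""
    L_wH = np.exp(np.mean(np.log(L_w + 1e-6)))  # world adaptation luminance
    alpha = stevens_gamma(L_wH) / stevens_gamma(L_da)
    return m * L_da * (L_w / L_wH) ** alpha
```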


Ward’s Contrast based Scale-factor

Ward’s contrast based scale-factor aims to find the proportionality between the world luminance and the display luminance preserving the contrast visibility for a scene (Ward, 1994). The concept is to apply a scale-factor that is able represent the bright regions bright and the dark regions dark preserving the visibility of the scene. The scale-factor is calculated based on the contrast sensitivity studies by Blackwell (CIE, 1981), that establish the relation between adaption luminance (La)

and minimum visible difference as shown on Equation 2.8.

$$\Delta L(L_a) = 0.0594 \times (1.219 + L_a^{0.4})^{2.5} \tag{2.8}$$

To achieve the reduction of the dynamic range of the scene, Ward used a linear formula with the scale-factor (sf), such that the display luminance at an image point is a linear transformation of the world luminance at that point, as shown in Equation 2.9, where $\Delta L(L_{da})$ is the minimum discernible luminance change at $L_{da}$, $L_{da}$ is the display adaptation luminance and $L_{wa}$ the world adaptation luminance.

$$\Delta L(L_{da}) = m\,\Delta L(L_{wa}) \tag{2.9}$$

To convert the computed value to a display input value between 0 and 1, the maximum display luminance must be known, as well as the display adaptation luminance of the viewer, leading to the scale-factor given by Equation 2.10.

$$sf = \frac{1}{L_{dmax}} \left( \frac{1.219 + \left(\frac{L_{dmax}}{2}\right)^{0.4}}{1.219 + L_{wa}^{0.4}} \right)^{2.5} \tag{2.10}$$
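Putting Equations 2.9-2.10 together, a sketch of the whole operator could look as follows; the log-average world adaptation luminance and the 100 cd/m² display maximum are assumptions made for the example:

```python
import numpy as np

def ward_scale_factor(L_wa, L_dmax=100.0):
    """Contrast-based scale-factor (Equation 2.10)."""
    num = 1.219 + (L_dmax / 2.0) ** 0.4
    den = 1.219 + L_wa ** 0.4
    return (1.0 / L_dmax) * (num / den) ** 2.5

def ward_tmo(L_w, L_dmax=100.0):
    """Linear mapping of world luminance to a display value in [0, 1]."""
    L_wa = np.exp(np.mean(np.log(L_w + 1e-6)))  # world adaptation luminance
    return np.clip(ward_scale_factor(L_wa, L_dmax) * L_w, 0.0, 1.0)
```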

Photographic Tone Reproduction

The photographic tone reproduction operator was proposed by Reinhard et al. (2002) and is inspired by the "dodge and burn" technique used by photographers, being based on the work developed by Adams (1981). The first step of the operator maps the luminance using Equation 2.11, where $L_m$ refers to the original luminance scaled using the key value of the scene and $L_{white}$ refers to the smallest luminance that is mapped to pure white.

$$L_d(x) = \frac{L_m(x)\left(1 + L_{white}^{-2}\,L_m(x)\right)}{1 + L_m(x)} \tag{2.11}$$

The second step of this operator is the automatic dodging-and-burning. In photography, dodging-and-burning is used to give different exposure times to regions of the negative in order to select the brighter and darker regions, giving more or less exposure according to the photographer's intentions. The operator behaves similarly to deal with the high dynamic range of the contents, by comparing Gaussian-filtered versions of the image to find the regions of the contents where there are no sharp edges. Then, a local mapping is performed using Equation 2.12, which is similar to Equation 2.11 but where the parameter $L_{\sigma Max}$ corresponds, in this case, to the largest such neighbourhood around the region.

$$L_d(x) = \frac{L_m(x)\left(1 + L_{white}^{-2}\,L_m(x)\right)}{1 + L_{\sigma Max}(x)} \tag{2.12}$$
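The global step (Equation 2.11) is straightforward to prototype. The sketch below follows the usual convention from Reinhard et al. of scaling by the key value a over the log-average luminance, and defaults $L_{white}$ to the maximum scaled luminance; both are assumptions of this example. The local step (Equation 2.12) would additionally require the Gaussian-filtered neighbourhood search, which is omitted here.

```python
import numpy as np

def reinhard_global(L_w, a=0.18, L_white=None):
    """Global photographic operator (Equation 2.11)."""
    L_avg = np.exp(np.mean(np.log(L_w + 1e-6)))  # log-average luminance
    L_m = (a / L_avg) * L_w                      # scaled luminance
    if L_white is None:
        L_white = L_m.max()                      # brightest value maps to white
    return L_m * (1.0 + L_m / L_white**2) / (1.0 + L_m)
```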

Calibrated Image Appearance Reproduction Model

The Calibrated Image Appearance Reproduction Model proposed by Reinhard et al. (2012) was developed to consider the distribution of the same contents to a wide range of displays and viewing environments. The main goal of the model is to reproduce the image appearance of different scenes under given display and viewing conditions. For this, the model takes as input an image and the features of the scene (an optional parameter, as it can be estimated from the image), of the viewing environment and of the display. These input parameters are described with a set of unified variables: adapting luminance, maximum luminance, adapting white point, maximum white point and degree of adaptation. Based on the input data, the model processes the content by simulating the pathway that light takes from the entrance of the eyes to the photoreceptor response: adapting the luminance, applying photoreceptor bleaching adaptation and simulating the photoreceptor response. The photoreceptor model used is based on Michaelis and Menten (1913)'s model.

The final step consists of applying a mapping function to match the neural responses to the image appearance when it is viewed in a certain environment. This model can be extended to deal with HDR video by applying a leaky integrator (Kiser et al., 2012) in order to remove flicker and to adjust to the variations of the scene features.
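As an illustration of the video extension, a leaky integrator simply smooths a per-frame adaptation value over time. The sketch below is a generic exponential smoother in the spirit of Kiser et al. (2012), with the smoothing factor chosen arbitrarily for the example:

```python
import numpy as np

def leaky_integrator(frame_values, alpha=0.05):
    """Exponentially smooth a per-frame quantity (e.g. the log-average
    luminance) so that tone-mapping parameters vary without flicker."""
    state = frame_values[0]
    smoothed = []
    for v in frame_values:
        state = alpha * v + (1.0 - alpha) * state  # leaky integration step
        smoothed.append(state)
    return np.array(smoothed)
```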


2.2.2 Time-dependent TMOs

Examples of TMOs that take into account temporal coherence are the visual adaptation model (Ferwerda et al., 1996), the time-dependent visual adaptation (Pattanaik et al., 2000), the encoding of HDR video with a model of the human cones (Van Hateren, 2006), the display adaptive TMO (Mantiuk et al., 2008), the spatio-temporal TMO based on a retina model (Benoit et al., 2009), the temporal coherency method for video TMOs (Boitard et al., 2012), and the gaze-dependent tone-mapping (Mantiuk and Markowski, 2013).

Visual adaptation model

Ferwerda et al. (1996) developed a model of visual adaptation for realistic image synthesis based on psychophysical experiments that considered various HVS characteristics such as visibility, visual acuity and colour appearance. This model was developed with the goal of displaying the results of global illumination simulations at different intensity ranges. The TMO builds upon Tumblin and Rushmeier (1993)'s brightness-based operator and Ward (1994)'s contrast-based operator, combining those two TMOs to achieve a more complete model of tone reproduction. The final result is achieved by using TVI functions in a linear combination of photopic and scotopic vision (Equation 2.13) to reproduce mesopic vision within the HDR content.

$$\begin{bmatrix} R_d(x) \\ G_d(x) \\ B_d(x) \end{bmatrix} = m_c(L_{da}, L_{wa}) \begin{bmatrix} R_w(x) \\ G_w(x) \\ B_w(x) \end{bmatrix} + m_r(L_{da}, L_{wa}) \begin{bmatrix} L_w(x) \\ L_w(x) \\ L_w(x) \end{bmatrix} \tag{2.13}$$

To model the rods (scotopic vision), the TVI function of Equation 2.14, which extends Ward (1994)'s model and was derived from the psychophysical experiment results, is used. Similarly, to model the cones (photopic vision), Equation 2.15 is used.

$$\log_{10} T_s(x) = \begin{cases} -2.86 & \text{if } \log_{10} x \le -3.94 \\ \log_{10} x - 0.395 & \text{if } \log_{10} x \ge -1.44 \\ (0.405\log_{10} x + 1.6)^{2.18} - 2.86 & \text{otherwise} \end{cases} \tag{2.14}$$


$$\log_{10} T_p(x) = \begin{cases} -0.72 & \text{if } \log_{10} x \le -2.6 \\ \log_{10} x - 1.255 & \text{if } \log_{10} x \ge 1.9 \\ (0.249\log_{10} x + 0.65)^{2.7} - 0.72 & \text{otherwise} \end{cases} \tag{2.15}$$

The scaling for the cones ($m_c$) is given by Equation 2.16, while for the rods ($m_r$) Equation 2.17 is used. $L_{da}$ is the display adaptation luminance and $L_{wa}$ the world adaptation luminance.

$$m_c(L_{da}, L_{wa}) = \frac{T_p(L_{da})}{T_p(L_{wa})} \tag{2.16}$$

$$m_r(L_{da}, L_{wa}) = \frac{T_s(L_{da})}{T_s(L_{wa})} \tag{2.17}$$
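The two TVI curves and the resulting scale factors can be sketched as below; this is a direct transcription of Equations 2.14-2.17, with np.where used to select the piecewise branches (numpy evaluates all branches, but out-of-domain values are discarded by the selection):

```python
import numpy as np

def tvi_scotopic(x):
    """Rod threshold-versus-intensity function (Equation 2.14)."""
    lx = np.log10(np.asarray(x, dtype=float))
    log_t = np.where(lx <= -3.94, -2.86,
             np.where(lx >= -1.44, lx - 0.395,
                      (0.405 * lx + 1.6) ** 2.18 - 2.86))
    return 10.0 ** log_t

def tvi_photopic(x):
    """Cone threshold-versus-intensity function (Equation 2.15)."""
    lx = np.log10(np.asarray(x, dtype=float))
    log_t = np.where(lx <= -2.6, -0.72,
             np.where(lx >= 1.9, lx - 1.255,
                      (0.249 * lx + 0.65) ** 2.7 - 0.72))
    return 10.0 ** log_t

def ferwerda_scalings(L_da, L_wa):
    """Cone and rod scale factors (Equations 2.16-2.17)."""
    m_c = tvi_photopic(L_da) / tvi_photopic(L_wa)
    m_r = tvi_scotopic(L_da) / tvi_scotopic(L_wa)
    return m_c, m_r
```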

Time-Dependent Visual Adaptation

The Time-Dependent Visual Adaptation TMO proposed by Pattanaik et al. (2000) explores the fact that the HVS does not adapt instantly to large changes in luminance intensity. This TMO maps the content taking into account the appearance changes of the scene to match the user's visual responses, so that viewing the displayed scene is experienced as viewing the real-world scene. To achieve this, the first step is to convert the RGB input of the image into luminance values for rods and cones and calculate their retinal response based on Hunt (1995)'s model. The half-saturation parameter (σ) is computed as shown in Equation 2.18 for rods and Equation 2.19 for cones. The parameter j is calculated as shown in Equation 2.20 and k as in Equation 2.21. The bleaching effect for cones is calculated based on Hunt (1995)'s model.

$$\sigma_{rod} = \frac{2.5874 \times G_{rod}}{19000 \times j^2 \times G_{rod} + 0.2615 \times (1 - j^2)^4 \times G_{rod}^{1/6}} \tag{2.18}$$

$$\sigma_{cone} = \frac{12.9223 \times G_{cone}}{k^4 \times G_{cone} + 0.171 \times (1 - k^4)^4 \times G_{cone}^{1/3}} \tag{2.19}$$

$$j = \frac{1}{5 \times 10^5 \times G_{rod} + 1} \tag{2.20}$$


$$k = \frac{1}{5 \times G_{cone} + 1} \tag{2.21}$$

The second step is to calculate the dynamic response, which is done firstly by modelling the time dependency using exponential filters. The values obtained are then used for simulating the pigment depletion and pigment regeneration processes. The third step of the TMO is to apply a visual appearance model in order to compute appearance values such as luminance appearance or colour appearance. Finally, the inverses of the appearance model and of the adaptation model are applied to match the viewing conditions.
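A direct transcription of Equations 2.18-2.21 is given below, assuming the rod and cone retinal illuminance values G_rod and G_cone are already available (their derivation from the input image is not shown here):

```python
import numpy as np

def half_saturation(G_rod, G_cone):
    """Half-saturation parameters for rods and cones (Equations 2.18-2.21)."""
    j = 1.0 / (5.0e5 * G_rod + 1.0)                                  # Equation 2.20
    k = 1.0 / (5.0 * G_cone + 1.0)                                   # Equation 2.21
    sigma_rod = (2.5874 * G_rod) / (
        19000.0 * j**2 * G_rod
        + 0.2615 * (1.0 - j**2) ** 4 * G_rod ** (1.0 / 6.0))         # Equation 2.18
    sigma_cone = (12.9223 * G_cone) / (
        k**4 * G_cone
        + 0.171 * (1.0 - k**4) ** 4 * G_cone ** (1.0 / 3.0))         # Equation 2.19
    return sigma_rod, sigma_cone
```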

Encoding of High Dynamic Range Video with a Model of Human Cones

The TMO proposed by Van Hateren (2006) is based on a model of the human cones and accounts for temporal coherency by introducing temporal filters that handle noise. Essentially, this operator simulates the absorption of the retinal illuminance by the visual pigment using two low-pass filters. The first low-pass filter is expressed by Equation 2.22, where τ is a time constant and x and y are the input and the output at time t.

$$\frac{dy}{dt} + \frac{1}{\tau}y = \frac{1}{\tau}x \tag{2.22}$$

The second low-pass filter is governed by a non-linear differential equation (Equation 2.23) that produces a strong non-linearity, contributing to dynamic range compression; it affects the temporal processing performed by the first low-pass filter, achieving a dynamic adaptation to the prevailing luminance level.

$$\alpha = \frac{1}{\beta} = (C_\beta + k_\beta E)^{-1} \tag{2.23}$$
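For intuition, Equation 2.22 can be discretised as a simple recursive filter over a sequence of samples. The sketch below is one such first-order discretisation, not Van Hateren's implementation:

```python
import numpy as np

def low_pass(x, tau, dt=1.0):
    """First-order low-pass filter, a discrete form of dy/dt + y/tau = x/tau."""
    a = dt / (tau + dt)  # smoothing coefficient derived from the time constant
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = y[i - 1] + a * (x[i] - y[i - 1])
    return y
```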

Display adaptive TMO

The display adaptive TMO developed by Mantiuk et al. (2008) takes into account variables such as environmental luminance levels, the peak luminance of the display and even its reflectivity. This allows the TMO to be calibrated for the environment and output device characteristics in order to optimise the results of the tone-mapping process under different conditions. To achieve the dynamic range reduction, the TMO compares the visual response of the original HDR frame (which can undergo an optional image enhancement step to improve its appearance) with the display-adapted frame using an HVS model. The final step is the computation of the TMO parameters that minimise the differences between the original image and the frame to display. Its workflow is illustrated in Figure 2.10.

Figure 2.10 – Display adaptive TMO (adapted from (Mantiuk et al., 2008)).

Spatio-Temporal Tone Mapping Operator based on a Retina Model

The Spatio-Temporal TMO proposed by Benoit et al. (2009) is based on a model of the retina's local adaptation properties developed by Meylan et al. (2007), complemented by spatio-temporal filters of the retina. The work is particularly focused on foveal vision and on the parvocellular channel, which is responsible for bringing detail and colour information to the central nervous system.

The first step of the TMO is colour processing, followed by two adaptation steps: the photoreceptors' local adaptation and the ganglion cells' local adaptation, with an outer plexiform layer filter between the two steps to allow an enhancement of the image details. The final step applies colour demultiplexing, a processing technique that takes advantage of the photoreceptors' colour sampling properties. The scheme of this TMO is presented in Figure 2.11.


Figure 2.11 – Spatio-temporal TMO based on retina model workflow (adapted from (Benoit et al., 2009)).

Temporal Coherency for Video TMOs

Boitard et al. (2012) proposed a temporal coherency algorithm that can be used in conjunction with time-dependent TMOs as a post-processing technique. This work aims to achieve temporal coherency of the video through the preservation of the overall contrast. To accomplish this, the video and its features are analysed as a whole in order to tone-map each frame of the sequence according to the global properties of the video, so that temporal coherency is achieved. The technique has two main steps: video analysis and post-processing. The first step computes the characteristics of the video based on HDR luminance values. The second step is a post-processing stage designed to preserve temporal coherency by preserving the relative levels of brightness throughout the video and by keeping the same relation between the HDR luminance values and the tone-mapped luminance values throughout the LDR sequence. Figure 2.12 illustrates the workflow of the technique.


Gaze-dependent Tone Mapping

The Gaze-dependent Tone Mapping operator (Mantiuk and Markowski, 2013) is a time-dependent global operator whose principal mechanism is the temporal adaptation of the HVS to varying luminance conditions. This technique uses an eye tracker to find the location of the observer's gaze and determines the temporal adaptation luminance from a previously generated background adaptation luminance map, in order to parameterise the global tone mapping that will be applied.

The temporal adaptation luminance is calculated with an exponential function, and for the tone-mapping process the modified Naka and Rushton (1966) equation, which simulates the perceived brightness in a maladapted state, is applied. The processing is done in real time and its workflow is presented in Figure 2.13.

Figure 2.13 – Gaze-dependent Tone mapping workflow (adapted from (Mantiuk and Markowski, 2013)).

2.2.3 Evaluation of HDR tone-mapping

Over the last two decades numerous TMOs have been proposed, embracing different approaches and theories; it is therefore important to understand the strengths of the proposed operators through evaluation. The methodologies for evaluating TMOs can be divided into two main categories: error metrics and psychophysical experiments.

