
Objective assessment of stereoscopic video quality of 3DTV

Academic year: 2023

The first part of this thesis studies the attention of the human visual system and examines the importance of visual attention in the design of an objective 3D quality metric.

Introduction

Principles of depth perception

  • Oculomotor cues
  • Monocular depth cues
  • Binocular depth cues
  • Depth cue interactions
  • Individual differences

The disparity-selective cells or binocular depth cells were revealed in the visual cortex of the brain [Barlow et al., 1967, Hubel and Wiesel, 1970]. The properties of the HVS change with age due to structural changes in the eyes.

The simple stereoscopic imaging system

Perceived depth as function of viewing distance

From equation 1.2, the perceived depth depends on the viewing distance as illustrated in Figure 1.10.
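As a hedged illustration of this dependence, the sketch below uses the standard similar-triangles relation for a simple stereoscopic display, Z = V·e / (e − p), where V is the viewing distance, e the interocular distance, and p the screen parallax. Equation 1.2 is not reproduced in this excerpt, so this exact form, the function name, and the 65 mm default eye separation are assumptions.

```python
def perceived_distance(viewing_distance_m, parallax_m, eye_separation_m=0.065):
    """Distance from the viewer to the fused point (similar-triangles model).

    Positive (uncrossed) parallax places the point behind the screen,
    negative (crossed) parallax places it in front of the screen.
    The 65 mm eye separation is an assumed typical value.
    """
    if parallax_m >= eye_separation_m:
        raise ValueError("parallax must be smaller than the eye separation")
    return viewing_distance_m * eye_separation_m / (eye_separation_m - parallax_m)


# The same 10 mm of uncrossed parallax gives more depth behind the screen
# as the viewer moves farther away.
for v in (1.0, 2.0, 3.0):
    depth_behind_screen = perceived_distance(v, 0.010) - v
    print(f"V = {v:.1f} m -> {depth_behind_screen:.3f} m behind the screen")
```

Under this model the perceived depth scales linearly with the viewing distance for a fixed screen parallax.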

Perceived depth as function of screen disparities

For example, the screen parallax should be adapted to display content made for the cinema screen on a 3DTV.

Artifacts related to S3D visualization

  • Cardboard effect
  • Size distortion artifacts
  • Window violation
  • Stereoanomaly

Bias can be avoided by correctly representing binocular disparity, which also applies to the cardboard effect [Yamanoue et al., 2000]. It occurs when crossed disparity objects are cut off by the edge of the screen [Mendiburu, 2009, Devernay and Beardsley, 2010].

Limits of the HVS in binocular depth perception

Fusion range limits

A fusion limit of 3.5 minutes of arc of vertical disparity for the central region of the stereogram has been reported [Nielsen and Poggio, 1984]. The limit of the smallest angle of binocular disparity that can be detected by the HVS is called stereoscopic acuity or stereoacuity.

Accommodation and vergence limits

For example, for a certain amount of vergence, accommodation remains within the DoF of the eye. As mentioned above, ZoC boundaries are often related to the extent of the reconstructed depth range.

Extreme convergence and divergence

Chen et al. summarized most of the proposed limits of the comfortable viewing zone and plotted them as a function of the viewing distance, as illustrated in Figure 1.16 [Chen et al., 2011]. This threshold was confirmed with a subjective experiment, which demonstrated that visual comfort decreased beyond this limit.

Conclusions

Controlling the perceived depth with a dual-camera configuration

The cameras converge in front of the car - the scene is seen behind the display plane. The cameras converge behind the ice - the ice pops out of the display while the car remains behind the display plane.

Displays with color multiplexed approach

Finally, Gunnewiek and Vandewalle presented mathematical methods for the realistic display of stereoscopic content that take into account the camera parameters of the original content and the viewing conditions [Gunnewiek and Vandewalle, 2010]. It was recommended to use the same ratio between the focal length of the camera and the width of the sensor as between the viewing distance and the width of the screen.
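The ratio recommendation above can be sketched as a one-line calculation. The function name and the example values are hypothetical; only the equality of the two ratios comes from the text.

```python
def matched_viewing_distance(focal_length_mm, sensor_width_mm, screen_width_m):
    """Viewing distance for which
    viewing_distance / screen_width == focal_length / sensor_width.
    """
    if focal_length_mm <= 0 or sensor_width_mm <= 0 or screen_width_m <= 0:
        raise ValueError("all parameters must be positive")
    return screen_width_m * focal_length_mm / sensor_width_mm


# Hypothetical example: a 50 mm lens on a 25 mm wide sensor, shown on a
# 1 m wide screen.
print(matched_viewing_distance(50.0, 25.0, 1.0))
```

The same relation can be solved for the focal length when the viewing distance is fixed by the display set-up.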

Displays with polarization multiplexed approach

Displays with time multiplexed approach

3D visualization requires the viewer to wear active glasses, which are synchronized with the screen using infrared or radio commands. In a movie theater, an infrared emitter usually synchronizes the active shutter glasses with the stereoscopic content provided by the projector.

Head Mounted Displays (HMD)

The glasses are composed of thin LCD films that alternately block the image from reaching one eye or the other for a certain period of time. Compared to polarized glasses or color glasses, this solution is more expensive to produce, requires a battery, and is heavier to wear.

Autostereoscopic displays

Autostereoscopic displays are free of glasses, which is quite convenient for end users, but the main disadvantage of parallax barrier and lenticular sheet is that incorrect positioning in front of the display can lead to a pseudoscopic image (see section 1.3.3.4) or loss of depth perception. Loss of time frequency, loss of luminance, crosstalk, loss of synchronization, flicker, heavy glasses compared to polarized resolution.

Visualization artifacts

The visibility of flicker does not depend on the capture rate, but only on the presentation rate [Hoffman et al., 2011]. Right from top: the first, second and last layers from [Cagnazzo et al., 2013].

Coding, transmission and related artifacts

Fewer views are required compared to MVD since a depth map is available. Finally, asymmetric stereo coding can generate depth distortion when the quality discrepancy between left and right view is too large and cannot be compensated by HVS [Boev et al., 2008].

Conclusions

At this stage of the transmission chain, the cardboard effect (see Section 1.3.3.1) can appear due to depth map coding with high quantization levels. Then, in 2012, the European Network of Excellence “Qualinet” defined QoE as “the degree of delight or annoyance of the user of an application or service”.

Components influencing 3D video QoE

  • Picture quality
  • Depth quality
  • Visual (dis)comfort and visual fatigue
  • Additional perception dimensions

Furthermore, comfort exists when there is "satisfaction with the visual environment" (part of the definition by Walter Grondzik [Grondzik,]). Such a definition is related to the disproportions of the objects' shapes in space, which are responsible for an unreal or unnatural appearance.

Models of 3D QoE

This issue was addressed in a subjective study by Chen et al., who evaluated visual comfort, visual experience, and depth rendering of stereoscopic synthetic images [Chen et al., 2011]. Similar to Lambooij et al., they assumed that high levels of 3D QoE can be represented by a weighted sum of 2D image quality (IQ) and depth quantity (DQ).
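A minimal sketch of such a weighted-sum model follows. The weights are placeholders chosen for illustration, not values reported by the cited authors.

```python
def qoe_weighted_sum(image_quality, depth_quantity, w_iq=0.5, w_dq=0.5):
    """Weighted sum of 2D image quality (IQ) and depth quantity (DQ).

    w_iq and w_dq are hypothetical weights; in practice they would be
    fitted to subjective scores.
    """
    return w_iq * image_quality + w_dq * depth_quantity


# Example with both attributes rated on the same scale.
print(qoe_weighted_sum(4.0, 2.0))
```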

Subjective assessment methods of 3D QoE

Assessment of visual discomfort and fatigue

  • Measurement of the visual discomfort associated with
  • Measurement of the discomfort associated with view asymmetries

Moreover, eye blink rate was proposed as an indicator of visual fatigue by Stern et al. The AC/A ratio is the amount of accommodative convergence (AC) per unit of accommodative response (A); accommodation can still be stimulated with one eye covered, and the covered eye still converges because the two responses are coupled [Ukai et al., 2000].

Objective assessment methods of 3D QoE

Including depth attribute

Finally, a match of two simple cells from the left and right views is confirmed if this combination reaches the maximum binocular energy. The resulting binocular energy quality metric (BEQM) is calculated as the difference between the binocular energy of the original pair and that of the degraded pair.

Including comfort attribute

A preliminary subjective test identified viewing location as an insignificant factor in 3D-video QoE, while different combinations of content, baseline, and screen size were significant. Of the perceptual issues, only crosstalk was discussed in another work by the same authors [Xing et al., 2010c].

Conclusions

Neither concept takes into account the depth component in the form of stereoscopic distortions (e.g., magnification/miniaturization of object dimensions and stretching/compression of depth). The state-of-the-art objective metrics in Section 3.6.1 evaluate the quality of the signal without considering the involved perception of depth and are similar to 2D metrics regarding spatial distortions.

Visual attention and eye movements

This attention can be voluntarily focused on a peripheral part of the visual field and it is the act of mentally focusing on one of several possible sensory stimuli. Most of the studies deal with overt visual attention, which can be measured with eye tracking.

Bottom-up and top-down processes

Natural visual scenes are cluttered and contain many different objects that cannot all be processed simultaneously due to the limited capacity of the HVS [Chun et al., 2011]. Top-down attention (also endogenous or goal-driven attention) is driven by "high-level" information, such as current task, knowledge and expectations.

Eye-tracking

It is then possible to calculate the number of pixels per degree of viewing angle by dividing the horizontal resolution of a screen by ΘW, obtained by equation 4.2. A heat map, illustrated in Figure 4.4.d, consists of the stimulus (Figure 4.4.a) as a background image and a hotspot mask superimposed on it.
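The pixels-per-degree calculation can be sketched as follows. Equation 4.2 is not reproduced in this excerpt, so the usual geometric form ΘW = 2·atan(W / 2D), with W the screen width and D the viewing distance, is assumed here; the function names are hypothetical.

```python
import math


def horizontal_view_angle_deg(screen_width_m, viewing_distance_m):
    """Horizontal viewing angle subtended by the screen, in degrees
    (assumed form of equation 4.2)."""
    return math.degrees(2.0 * math.atan(screen_width_m / (2.0 * viewing_distance_m)))


def pixels_per_degree(horizontal_resolution_px, screen_width_m, viewing_distance_m):
    """Horizontal resolution divided by the horizontal viewing angle."""
    theta_w = horizontal_view_angle_deg(screen_width_m, viewing_distance_m)
    return horizontal_resolution_px / theta_w


# Hypothetical set-up: a 1920-pixel-wide display, 1 m wide, viewed from 3 m.
print(pixels_per_degree(1920, 1.0, 3.0))
```

This value is what converts saccade lengths and fixation positions from pixels to degrees of visual angle.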

Studies of visual attention in S3D

Stimuli: still stereoscopic images

[Jansen et al., 2009] studied the influence of disparity on fixations and saccades in the free viewing of 2D and 3D images of natural scenes, pink noise, and white noise. Czuni and Kiss analyzed differences in the distribution of fixation points in 3D conditions compared to 2D [Czuni and Kiss, 2012].

Stimuli: stereoscopic videos

During the eye-tracking experiment, fixation points were collected in mono and stereo conditions from 66 images. By examining image contours, depth contours, disparity changes between fixation points, and the clustering of fixation points, only minor differences were found in the spatial distribution of fixation points.

Analysis of eye movements with state-of-the-art studies

For fixations: number of fixations and duration of fixation or frequency of fixation. Due to the lack of standards in eye-tracking studies, it is quite difficult to determine which indicator is the most representative for comparing eye movements.

Conclusions

No studies have been conducted to investigate whether eye movements are affected by visual discomfort. In addition, we explore the impact of visual discomfort caused by excessive disparities on visual attention.

Experiment 1: simple visual stimuli

Stimuli generation

The figures are drawn considering that an observer is located in front of the display plane. The name of the stimulus consists of the corresponding number for sphere configuration and the designation of.

Experimental set-up and methodology

Calibration: the eye tracker requires some calibration to learn the characteristics of the eyes of each observer. During the first step of calibration stage, an observer simply looked at a dot that appeared in different positions of the screen.

Eye-tracking data analysis

  • Influence of depth on visual attention
  • Influence of texture on visual attention
  • Influence of the position of the spheres on test results
  • Saccade length and fixation duration
  • Discussion and conclusions

Spheres emerging from the screen with the checkerboard texture had the highest selection priority. The introduction of depth (z-axis) increased fixation duration, regardless of the type of disparity.

Experiment 2: complex stimuli with only uncrossed disparity objects

Stimuli generation

Camera parameters were selected to correspond to DoF = 0.1 diopter for the comfortable condition and DoF = 0.3 diopter for the uncomfortable condition. A detailed analysis of the relationships between camera space and visualization space and depth deformations for the comfortable condition and the uncomfortable condition is given in Appendix A, Figures A.7-A.12.
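To give a feel for what these diopter values mean in metres, the sketch below converts a tolerance expressed in diopters into a near/far distance range around the screen. Treating the DoF value as a symmetric tolerance around the screen's focal demand is an assumption of this sketch, as are the function name and the example viewing distance.

```python
def comfort_range_m(screen_distance_m, dof_diopters):
    """Near and far distances whose focal demand (1 / distance) differs
    from that of the screen by at most dof_diopters."""
    screen_power = 1.0 / screen_distance_m          # diopters
    near = 1.0 / (screen_power + dof_diopters)
    far_power = screen_power - dof_diopters
    far = float("inf") if far_power <= 0 else 1.0 / far_power
    return near, far


# Hypothetical viewing distance of 2 m: the 0.3 diopter range is wider
# than the 0.1 diopter range.
print(comfort_range_m(2.0, 0.1))
print(comfort_range_m(2.0, 0.3))
```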

Experimental set-up and methodology

DR is the disparity range of a scene in the visualization space, which consists of the maximum crossed and uncrossed disparity in mm on the display used for the experiment. 9 sets containing 6 images of different content were formed to prevent observers from memorizing the images and thus using top-down visual mechanisms.

Eye-tracking data analysis

  • Qualitative analysis based on heat maps
  • Quantitative analysis
  • Saccade length and fixation duration
  • Influence of depth on visual attention
  • Influence of texture on visual attention
  • Discussion and conclusions

The mean decrease in saccade length over time was calculated as the difference between the saccade length of the first time interval and the last. Nevertheless, clear evidence of this tendency was not revealed for the rest of the scenes.
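The calculation described above can be sketched as follows. The data layout (a list of saccade onset times and lengths), the interval duration, and the function names are assumptions made for illustration.

```python
from statistics import mean


def mean_length_per_interval(saccades, interval_s, n_intervals):
    """Mean saccade length in each consecutive time interval.

    saccades: iterable of (onset_time_s, length_deg) pairs.
    Intervals without saccades are reported as None.
    """
    bins = [[] for _ in range(n_intervals)]
    for onset, length in saccades:
        idx = int(onset // interval_s)
        if 0 <= idx < n_intervals:
            bins[idx].append(length)
    return [mean(b) if b else None for b in bins]


def decrease_first_to_last(interval_means):
    """Saccade length of the first interval minus that of the last."""
    first, last = interval_means[0], interval_means[-1]
    if first is None or last is None:
        raise ValueError("first and last intervals must contain saccades")
    return first - last


# Made-up saccades over a 10 s presentation, split into two 5 s intervals.
saccades = [(0.5, 6.0), (1.5, 4.0), (6.0, 3.0), (9.0, 1.0)]
means = mean_length_per_interval(saccades, interval_s=5.0, n_intervals=2)
print(means, decrease_first_to_last(means))
```

A positive result means saccades became shorter over the course of the presentation.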

Experiment 3: complex stimuli with crossed disparity objects

Stimuli generation

The key point of the experiment is to present the observer with stimuli containing one or more objects in front of the display plane with a controlled amount of depth. A detailed analysis of the relationships between the camera space and the visualization space and depth distortions for the comfortable condition and the uncomfortable condition is given in Appendix A, Figures A.13-A.16.

Experimental set-up and methodology

Images were rendered at a resolution of 1920 × 1080 using a virtual camera with a sensor size of 32 mm × 16 mm. The space outside the ZoC is marked in light gray and the area of interest as a magenta line.

Eye-tracking data analysis

  • Qualitative analysis based on heat maps
  • Quantitative analysis
  • Saccade length and fixation duration
  • Discussion and conclusions

As a result, the saliency maps differed mainly in the density of fixations, which cannot be detected by the AUC and CC metrics.

Weighted Depth Saliency Metric proposal for comparison of visual attention

Results

"Cartoon" scene, while in the "Hallway" scene more attention is paid to the background, and in the "Tea" scene the foreground and the area of interest attracted almost the same level of attention. This effect is clearly seen by comparing the "Cartoon" scene with the crossed disparity plane in Figure 5.37.a and the similar scene in Figure 5.34.b.

Conclusions

However, real-time services require objective metrics that are able to predict and monitor video quality on the fly. Also these objective metrics should be able to guarantee a certain level of quality of the video provided to the end users.

Background and motivation

Visual discomfort due to the vergence-accommodation conflict or view asymmetries is a problem typical only of 3D systems. Therefore, the block “Framework to predict 3D QoE” is described further for the basic perceptual attribute “Visual comfort”, excluding visual fatigue.

Objective model proposition

  • Definition of objective categories
  • Subjective color scale proposition
    • Color Scale decomposition
  • Definition of the boundaries of objective categories
  • Proposal of Objective Perceptual State Model (OPSM)
  • OPSM validation with subjective experiments
  • Aggregation of technical quality parameters
  • Acceptability and annoyance thresholds comparison

Still, acceptability is a high-level concept and can also be considered "the result of a decision which is partly based on the quality of experience". In the case of visual discomfort, if at least one of the categories is "Red", the overall quality should be in the "Red" category; otherwise, if at least one of the categories is "Orange", the overall quality should be in the "Orange" category; otherwise, it should be "Green".
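This worst-case rule can be sketched in a few lines. The function and dictionary names are hypothetical; the ordering Green < Orange < Red follows the rule stated in the text.

```python
# Severity order implied by the rule: any "Red" dominates, then any "Orange".
SEVERITY = {"Green": 0, "Orange": 1, "Red": 2}


def aggregate_categories(categories):
    """Overall category is the most severe of the per-parameter categories."""
    if not categories:
        raise ValueError("at least one category is required")
    return max(categories, key=SEVERITY.__getitem__)


print(aggregate_categories(["Green", "Orange", "Green"]))
print(aggregate_categories(["Green", "Red", "Orange"]))
```

An unknown category name raises a `KeyError`, which keeps typos from being silently treated as "Green".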

Conclusions

In essence, threshold comparison is a comparison of the level of distortion of the relevant technical quality parameters. After objectively measuring 3D technical quality parameters and comparing them with perceptual thresholds, it would be possible to predict the elicited perceptual state that would reflect the viewer's categorical judgment based on the acceptability of the stimulus and the visual disturbance caused.

OPSM metric validation. “Color Scale” experiment

  • Stimuli generation
  • Experimental set-up and methodology
  • Using the Color Scale for thresholds estimation. “Color Scale”
  • Result analysis of the “Color Scale” experiment

$$I_{dist} = \mathrm{MAGNIFY}(I_{origin},\ 100 + x) \qquad (7.4)$$

where $I_{dist}$ is the distorted image, $I_{origin}$ is the original image, and $x$ is the distortion level as a percentage of the height and width of the original image.

$$L_{dist}(RGB) = L_{origin}(RGB) \times (1 - x) \qquad (7.6)$$

where $L_{dist}$ is the distorted luminance value of the image, $L_{origin}$ is the original luminance value of the image, and $x$ is the distortion level as a percentage of the color channels.
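A minimal sketch of how equations 7.4 and 7.6 might be applied is given below. It assumes that MAGNIFY scales both image dimensions uniformly to (100 + x) percent and that x in equation 7.6 is passed as a fraction (for example 0.2 for 20%) so that the factor (1 − x) is meaningful; both readings, and the function names, are assumptions. Images are represented as nested lists of RGB tuples to keep the example dependency-free.

```python
def magnified_size(width, height, x_percent):
    """Image size after MAGNIFY(I, 100 + x): both dimensions are scaled
    to (100 + x) percent of the original (assumed interpretation)."""
    scale = (100.0 + x_percent) / 100.0
    return round(width * scale), round(height * scale)


def reduce_luminance(pixels, x):
    """Apply L_dist = L_origin * (1 - x) to every RGB channel.

    pixels: rows of (R, G, B) tuples; x: distortion level as a fraction.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction between 0 and 1")
    return [[tuple(round(c * (1.0 - x)) for c in px) for px in row]
            for row in pixels]


print(magnified_size(1920, 1080, 5))
print(reduce_luminance([[(200, 100, 50)]], 0.5))
```

In a stereoscopic test the distortion would be applied to one view only, which is what creates the view asymmetry.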

Thresholds comparison

$T_{ann}^{IS}(3.5) > T_{ann}^{CS}(1.5)$: the annoyance thresholds obtained with the impairment scale (IS) represent higher degradation levels than those obtained with the color scale (CS) for all types of asymmetries except green. $T_{acc}^{Chen} < T_{acc}^{CS}(0.5)$: the acceptability thresholds obtained with Chen's method are more stringent than those obtained with the color scale (CS) for all types of asymmetries.

Methodology development. “Double Scale” experiment

Result analysis of “Double Scale” experiment

Interestingly, in the case of the Color scale experiment for the same effect of the content of a scene, only the rotation asymmetry was found to be significant. This was confirmed with an F test (two samples for variances): the difference in MOS scores between the datasets of the Color scale and Double scale experiments is insignificant for all five view asymmetries.

Comparison of Color Scale with Acceptability and Impairment Scales

A degradation level representing 20% acceptability on the AS corresponds to 71% acceptability on the CS, while the visual annoyance on the CS is 90%. Therefore, acceptability on the color scale means the acceptance of a certain level of visual annoyance, e.g.

Conclusions

The threshold of 80% acceptability on the AS can be used to construct a simplified OPSM metric taking into account visual comfort, consisting only of the "Green" and. This is possible because Tacc(80%) was found to be equal to the 45% visual annoyance threshold on the CS.

OPSM metric verification with S3D videos

Stimuli generation

The "kitchen" scene in Figure 8.4 is compressed in the visualization space compared to the camera space. The shape distortion of the object of interest (Ds) changed from 0.5 to 0.7, i.e., the object became less compressed after zooming in the visualization space.

Experimental set-up and methodology

The first part of the experiment consisted of 2 tests, where observers assessed 2 types of view asymmetries. To avoid an accumulation of visual discomfort, observers evaluated the second part of the experiment with the remaining 2 asymmetries after a 15-minute break.

Result analysis

  • Stereoscopic video versus images: thresholds comparison

These plots allow direct comparison between subjective results from the “Color Scale” experiment (see Section 7.2.4) and objective prediction as explained in Section 6.3.5. The MOS that do not match objective predictions fall outside the boundaries of the color rectangles.

Aggregation of technical quality parameters

Stimuli generation

The distortion levels at the center of each color category for image asymmetries are shown in Table 8.6 below. Only the green level distortion corresponding to the center of the “Orange” category was applied to create this restriction.

Result analysis

  • Aggregation of green and vertical shift asymmetries
  • Aggregation of focal and vertical shift asymmetries

Where R, O and G are the distortion levels for green level and vertical shift asymmetries from Table 8.6. The weights of the predicted scores were normalized to sum to one for the green and vertical shift asymmetries.

Conclusions

Furthermore, the quality of perceived depth is based on the absence of visual artifacts. It was found that the judgments about the aggregations of geometric asymmetries were largely based on vertical shift.

Area Under Curve (AUC)

A value of 0 indicates no linear correlation between the two maps, and a value of 1 indicates perfect correlation.
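The linear correlation coefficient (CC) between two saliency maps can be sketched with the Pearson formula. The representation of a map as a nested list of numbers and the function name are assumptions made for illustration.

```python
import math


def correlation_coefficient(map_a, map_b):
    """Pearson linear correlation between two equally sized 2D maps."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("maps must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("CC is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


saliency = [[0.0, 1.0], [2.0, 3.0]]
print(correlation_coefficient(saliency, saliency))
```

The Pearson coefficient can also be negative, which indicates an inverse linear relationship between the maps.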

Measuring the inter-observer congruency (IOVC)

Camera space z versus visualization space Z. Stereoscopic distortions in

CC, AUC, IOVC data

Results: depth metric

Acceptability experiment

Images: Double Scale experiment

Videos: Color Scale experiment
