
Research on XR systems has been ongoing since the early days of computing [Sut68], and recent advances have pushed this research outside of the research laboratory and into the hands of practitioners and consumers. However, barriers for both practitioners [Ash+20] and users [RO16] still persist. The research presented in this thesis was motivated by a need to improve and augment the experience of watching 360º video, by leveraging previous XR research knowledge.

Based on previous work and challenges in HCI, Perceptual systems, and XR/VR/360º video systems (described in section 2.4), two areas for action in 360º video were determined. While other areas of intervention exist, the two areas chosen were based on their potential regarding the user experience for 360º video and previous work with The Old Pharmacy XR iterations (described in section 5.1). Based on existing related work (chapters 3 and 4), research questions were determined for each of the two areas, Attention Guidance and Visually Induced Motion Sickness (VIMS); the rationale for these research questions can be found in sections 3.3 and 4.3, respectively.

This research work relies on the construction of artifacts that support the exploration of the research questions (shown in table 1.2). As such, the main contribution of this research is an artifact (according to Wobbrock and Kientz's contribution types in HCI [WK16]), since it describes the systems and techniques developed, and is constructive (according to Oulasvirta and Hornbæk's types of research within HCI [OH16]), as it provides knowledge through the construction of an interactive artifact. Furthermore, it addresses problems/solutions that have already been identified, making it a Partial, ineffective, or inefficient solution subtype (adopting [OH16]).

9.1 Research Summary and Findings

The main contribution of this thesis encompasses three artifact systems:

• IVRUX (described in section 5.2) is a tool for the analysis of immersive VR story-driven experiences. Iterations of IVRUX were developed in parallel to the XR iterations of The Old Pharmacy, and as such, express different design concerns (e.g., a focus on orientation in the first iteration versus a focus on location and orientation in the second iteration). Overall, IVRUX enables the representation of several concepts of the Trajectories framework [Ben+09; BG08; BG11], such as Canonical trajectories with moving spheres representing POIs in the story, and Participant trajectories with the representation of users' orientation.

Work on IVRUX, alongside secondary research (Part I), served as a development base for the following two artifact systems.

• Cue Control (described in section 6.2) is a tool for cueing diegetic and non-diegetic sounds in 360º videos, both in space and time. Cue Control, through the Cue Spatializer, maps each virtual audio source to a start and an end position on a sphere and, over time, moves it along the path between them according to the orthodromic distance (a sketch of this interpolation is given after this list).

Considering 360º videos to be performances, as done by Jaller and Stefania [JS20], Cue Control expresses several concepts of the Trajectories framework [Ben+09; BG08; BG11]. The Cue Spatializer allows the user of Cue Control to specify the Canonical trajectories they desire for the viewer. Cue Control, through a Cue Playback plugin, not only applies the Canonical trajectory but also records the Participant trajectory. The Cue Spatializer can then use data from both trajectories for analysis.

Cue Control utilizes a design theme of RBI [Jac+08] by mimicking Environment Awareness and modeling sound sources on how they might behave in the real world.

This spatialization of audio tracks might not correspond in its entirety to the user's sensory knowledge of the physical world; however, the trade-off of applying it is that it allows users to accomplish the task (following POIs) more efficiently.

Additionally, it brings benefits in terms of practicality, with practitioners being able to create spatial cue tracks. The cost of this solution is accessibility, as users with hearing impairments or low spatial sound localization skills will not reap any benefit (nor suffer negatively from its use).

• VIMS mitigation pipeline using optical flow (described in sections 7.2 and 8.2) is a linear sequence of modules for the application of VIMS-mitigating techniques.

The first iteration of this pipeline (see fig. 7.1) uses pre-computed aggregated optical flow (computed with the Gunnar Farnebäck algorithm). During runtime, the aggregated optical flow is polled based on user orientation to determine the parameters for VIMS-mitigating techniques.

The second iteration of the pipeline (see fig. 8.1) uses a precomputed automatic movement classification based on OpenVSLAM’s tracking, alongside optical flow.

During runtime, if the camera is moving, the aggregated peripheral optical flow is polled based on user orientation to determine the parameters for VIMS-mitigating techniques (a simplified sketch of this polling step is given after this list).

The application of VIMS-mitigating techniques like restricted FoV goes against the central tenet of RBI [Jac+08]. However, the ergonomic trade-off of allowing users to safely experience 360º video is needed. Our artifact pipeline follows this trade-off, but tries to limit its application to the moments when it is necessary, while still being effective in solving the essential aspects of the problem.
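The orthodromic movement performed by the Cue Spatializer can be thought of as a spherical linear interpolation between a cue's start and end positions. The following is a minimal sketch of that idea, assuming unit-sphere coordinates and a constant angular speed over the cue interval; the function names and the time mapping are illustrative and do not correspond to the actual Cue Control implementation.

```python
import numpy as np

def slerp(p0, p1, t):
    """Interpolate along the great-circle (orthodromic) arc between two
    unit vectors p0 and p1, for t in [0, 1]."""
    omega = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))  # arc angle
    if np.isclose(omega, 0.0):
        return p0  # start and end coincide
    return (np.sin((1.0 - t) * omega) * p0 + np.sin(t * omega) * p1) / np.sin(omega)

def cue_position(start, end, cue_start, cue_end, now):
    """Position of a virtual audio source at time `now`, moving at constant
    angular speed from `start` to `end` over the cue interval."""
    t = np.clip((now - cue_start) / (cue_end - cue_start), 0.0, 1.0)
    return slerp(start / np.linalg.norm(start), end / np.linalg.norm(end), t)

# Example: a cue that travels from the front of the sphere to the right over
# 4 seconds; sampled at the midpoint, the source sits halfway along the arc.
front, right = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
print(cue_position(front, right, cue_start=10.0, cue_end=14.0, now=12.0))
```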
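To make the pipeline's runtime polling concrete, the sketch below illustrates the two stages described above, under stated assumptions: an offline pass that aggregates Farnebäck optical-flow magnitude per frame and per yaw bin of the equirectangular video, and a runtime lookup that, only while the camera is classified as moving (as in the second iteration), reads the value at the user's yaw and maps it to a FoV restriction factor. Bin counts, thresholds, and function names are illustrative; the actual pipeline aggregates peripheral flow and applies additional dampening.

```python
import cv2
import numpy as np

def aggregate_flow_per_yaw(video_path, yaw_bins=36):
    """Offline step: average Farnebäck optical-flow magnitude per frame and
    per yaw bin of an equirectangular 360º video."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    per_frame = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)            # per-pixel flow magnitude
        # Columns of an equirectangular frame map linearly to yaw.
        cols = np.array_split(mag, yaw_bins, axis=1)
        per_frame.append([c.mean() for c in cols])
        prev = gray
    cap.release()
    return np.array(per_frame)                        # shape: (frames, yaw_bins)

def fov_restriction(agg_flow, frame_idx, user_yaw_deg, camera_moving,
                    low=2.0, high=8.0):
    """Runtime step: poll the aggregated flow at the user's yaw and map it to
    a FoV restriction factor in [0, 1] (0 = unrestricted)."""
    if not camera_moving:          # second iteration: skip when the camera is static
        return 0.0
    bins = agg_flow.shape[1]
    b = int((user_yaw_deg % 360.0) / 360.0 * bins)
    value = agg_flow[frame_idx, b]
    return float(np.clip((value - low) / (high - low), 0.0, 1.0))
```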

Through these artifacts, and considering the research questions posited (in table 1.2), several evaluation studies were conducted. While they cover different areas of intervention and different experimental designs, these studies shared methodological considerations that evolved along with the research:

• While these studies used self-reported validated measures such as IPQ and SSQ, the need to observe participants became apparent early on. Self-reported measures are essential to understand the effectiveness of the artifact, but they may also obscure interaction. For example, looking at movement in the studies in chapters 7 and 8 helped to understand the effects of VIMS-mitigating techniques and to contextualize SSQ scores. In addition, a mixed-methods approach was beneficial to get feedback on the user experience through semi-structured interviews. For example, in the first study on VIMS, due to time constraints of the session, we opted to include an optional comment field in the form instead of a semi-structured interview after the session. This led to a limited set of comments, focused on negative aspects of the experience. For the follow-up study on VIMS, a semi-structured interview allowed for a better understanding of the applied interventions, by discussing positive and negative aspects with participants.

• The procedure for observing participants has evolved through the studies to be more detailed (with more data), automatic (with less human interference), and objective (with a focus on significance). IVRUX represented participants through their trajectories and required visual analysis from the experimenter. Cue Control represented participants through their trajectories and through hotspots that required visual analysis. After this visual analysis, certain timeframes were identified and data was exported to a spatial statistics package [QGI18]. In this software, the process of making (statistically significant) hotspots, as seen in figs. E.3 and E.4, was manual and time-consuming, limiting the extent and detail of the video analysis. The studies on VIMS used Cue Control for visualizing participants’ trajectories and hotspots, but automated the process of making hotspots; this allowed for the creation of videos (in appendices D.3 and E.2), and the polling of specific timeframes.


Furthermore, these studies analyzed the movement of the participants within specified timeframes (defined manually in the first iteration of the pipeline and defined manually or automatically in the second iteration of the pipeline).

• This evolution in observing participants has also been enabled by a move from WIMP statistical packages (like SPSS [IBM17] and QGIS [QGI18]) to scripts in the R language [R C14]. The increasing amount of data from studies (from self-reported measures to objective measures) demanded a higher level of detailed preparation and analysis of data. Using scripts allowed for specifying timeframe intervals and/or subpopulations and automatically rerunning all analyses, as sketched after this list. This could be accomplished manually, but the ease of doing this through scripts allowed for faster processing of high volumes of data. This is particularly relevant for the studies on VIMS, which required individual and combined analysis of manually/automatically classified intervals and several subpopulations of the sample. Other advantages of this approach were the inclusion of data from other sources (e.g., optical flow values of the video, FoV size during the experience) and the creation of higher-quality graphics and videos.
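To illustrate the pattern these scripts follow (the actual analysis scripts were written in R), the sketch below shows a minimal Python equivalent: the same comparison is re-run for every combination of timeframe and subpopulation, so adding an interval or a subgroup only means extending a list. The file, column, and condition names are hypothetical.

```python
import pandas as pd
from scipy import stats

# Illustrative layout: one row per participant/timeframe, with the condition,
# a timeframe label, a movement summary, and an SSQ-related score.
samples = pd.read_csv("participant_orientation.csv")   # hypothetical file

def analyse(df, timeframes, subpopulations):
    """Re-run the same comparison for every timeframe x subpopulation,
    so changing the intervals only requires editing the lists below."""
    results = []
    for tf in timeframes:
        for name, mask in subpopulations.items():
            sub = df[(df["timeframe"] == tf) & mask(df)]
            a = sub.loc[sub["condition"] == "dynamic_fov", "yaw_range"]
            b = sub.loc[sub["condition"] == "unrestricted", "yaw_range"]
            u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
            results.append({"timeframe": tf, "subpopulation": name,
                            "U": u, "p": p, "n": len(sub)})
    return pd.DataFrame(results)

report = analyse(
    samples,
    timeframes=["camera_moving", "camera_static"],
    subpopulations={"all": lambda d: d["ssq_ts"].notna(),
                    "upper_75th": lambda d: d["ssq_ts"] >= d["ssq_ts"].quantile(0.75)},
)
print(report)
```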

The research questions contributing to the empirical contribution of this work are discussed in detail in their own chapters (RQ1 in section 6.5.2, RQ2 in section 6.5.3, RQ3 in section 7.7.1 and section 8.5, and RQ4 in section 7.7.2). I present below a summary of these research findings.

RQ1 - Can spatialized music guide viewers’ attention in 360º video?

Music can be spatialized to guide viewers’ attention in 360º video. Participants experiencing full spatialization were able to use the spatial manipulation of music to locate POIs. However, the separation of audio tracks (partial spatialization) was revealed to be ineffective. We posit two factors for such a result: (1) the presence of visual elements that are better at guiding attention than music, taking into account that the main task (watching a 360º video) is visual, and (2) changes in partial spatialization not being strong enough to be noticeable by a mostly non-musical population.

RQ2 - Can the use of diegetic cues be reinforced by audio spatialization to guide viewers in 360º video?

Our results showed that the spatialization of diegetic cues by itself (partial spatialization) does not introduce significant benefits in orientation, while the combination of music and diegetic cues (full spatialization) proves to be more effective.

In partial spatialization, our results indicate that participants used diegetic cues for notification (e.g., knowing that there is an elephant) but not for orientation. This corresponds to Everyday Listening, where a person focuses on the object producing the sound, and not on the characteristics of the sound (such as spatial location).

In full spatialization, both Everyday Listening and Musical Listening supported participants’ experience in 360º video: diegetic cues for notification and spatial music for orientation.

RQ3 - Can the optical flow of a 360º video be used to mitigate VIMS?

Aggregating results from both VIMS studies, optical flow is able to contextualize and identify moments with potential extreme motion, but its usage for VIMS mitigation depends on the type of video and the behavior of participants.

In the study in chapter 7, when there was extreme motion, participants were focused on "staying on track", which exposed them to less optical noise and affected the use of VIMS-mitigating techniques. Nevertheless, results when using a restricted FoV controlled by the optical flow were promising, as it presented lower SSQ scores and allowed a female subpopulation to explore more.

Addressing limitations of the first study (e.g., video type, duration) and limitations of the pipeline (e.g., optical flow when the camera was static), the study in chapter 8 used the second iteration of the pipeline to answer RQ3. Dynamic-FoV was shown to be statistically effective in reducing/maintaining scores of SSQ-TS, SSQ-N, VRSQ-TS, and VRSQ-D when compared to unrestricted-FoV, for both the full population and the upper 75th percentile subpopulation. Aligning optical flow with visual SLAM was effective in mitigating VIMS while maintaining Presence.

It is of note that, in both studies, the noise produced by a contracting FoV might be detrimental to the experience and cause distraction or discomfort. In both pipelines, we addressed this by dampening values, but this will still be dependent on the participant's sensitivity. Likewise, this artifact does not take into account the optical flow caused by the participant's movement. Both factors could be addressed in future implementations.
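As an illustration of how such dampening can be implemented, the sketch below smooths the restriction factor with an asymmetric exponential moving average, so momentary spikes in optical flow do not translate into visible FoV flicker. This is a simplified scheme and not the exact dampening used in either pipeline; the class and parameter names are illustrative.

```python
class DampenedFoV:
    """One way to dampen frame-to-frame changes in the FoV restriction:
    an exponential moving average with a slower release than attack, so the
    FoV contracts quickly on strong flow but widens back gradually."""

    def __init__(self, attack=0.5, release=0.05):
        self.attack = attack      # weight when the target is more restrictive
        self.release = release    # weight when the target is less restrictive
        self.value = 0.0          # current restriction factor in [0, 1]

    def update(self, target):
        alpha = self.attack if target > self.value else self.release
        self.value += alpha * (target - self.value)
        return self.value

# Example: a one-frame spike in optical flow barely narrows the FoV,
# while sustained flow lets the restriction build up smoothly.
fov = DampenedFoV()
for target in [0.0, 1.0, 0.0, 0.0, 0.8, 0.8, 0.8]:
    print(round(fov.update(target), 3))
```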

RQ4 - Can the combination of different VIMS-mitigating techniques for VR be successful in 360º video?

Results from the study in chapter 7 indicate that combining an IVB with a restricted FoV was not beneficial. In fact, it may be counterproductive, as it presented low Presence scores and increased SSQ scores. We posit that the complex combination of techniques may have caused a sensory conflict, based on results for the upper 75th percentile subpopulation. Based on our results, restricted FoV was the preferred VIMS-mitigating technique, and was therefore chosen for the follow-up study.