The role of sound in inducing storytelling in immersive environments

Inês Salselas
FEUP / CITAR, Porto, Portugal
isalselas@fe.up.pt

Rui Penha
ESMAE / INESC TEC, Porto, Portugal
ruipenha@esmae.ipp.pt

ABSTRACT

Sound design has been a fundamental component of audiovisual storytelling. However, with technological developments things are rapidly changing. More sensory information is available and, at the same time, the user is gaining agency over the narrative, being offered the possibility of navigating or making other decisions. These new characteristics of immersive environments bring new challenges to storytelling in interactive narratives and require new strategies and techniques for audiovisual narrative progression. Can technology offer an immersive environment where the user has the sensation of agency, of choice, where her actions are not mediated by evident controls but subliminally induced in a way that ensures that a narrative is being followed? Can sound be a subliminal element that induces attentional focus on the elements most relevant to the narrative, inducing storytelling and biasing search in an audiovisual immersive environment? Herein, we present a literature review that has been guided by this prospect. With these questions in view, we present our exploration process in finding possible answers and potential solution paths. We point out that consistency, in terms of coherency across sensory modalities and emotional matching, may be a critical aspect.

CCS CONCEPTS

• Applied computing → Sound and music computing;

KEYWORDS

Audiovisual Attention, Cross-modal Perception, Sound

ACM Reference format:

Inês Salselas and Rui Penha. 2019. The role of sound in inducing storytelling in immersive environments. In Proceedings of Audio Mostly, Nottingham, United Kingdom, September 18–20, 2019 (AM'19), 8 pages.

https://doi.org/10.1145/3356590.3356619

1 INTRODUCTION

Sound design has historically been a major component of audiovisual storytelling, enabling factual and emotional contexts to be delivered to the audience, both in an explicit and in a subliminal way. In an immersive audiovisual environment, the role of the user has changed. The passive position no longer holds, as the user gains agency and can make decisions about where to look, which plan or character to follow, and so on. This user-dependent framing presents added challenges to traditional storytelling and audiovisual narrative progression techniques. In this context, how can one guide users to focus their attention on the elements most relevant to the narrative? Is it possible that sound may induce storytelling and bias search in an audiovisual immersive environment?

In this article, we present a literature review that aims to build a conceptual background and explore the hypothesis that auditory stimuli can alter and enhance the perception of visual events, enhancing perceptual distinctiveness and increasing the saliency of events, thereby biasing attention and, thus, storytelling, in the situation of an audiovisual immersive environment.

Sound has already been used as an orienting element in immersive environments. However, this has mainly been done in an explicit way, using non-diegetic sounds to attract the user's attention (e.g., alert tones). This strategy exposes controls and consequently weakens the immersive experience by reducing the degree of involvement in the virtual environment [39]. For this reason, we aim to explore strategies in which sound can orient and guide users in a subliminal way, becoming an invisible agent for controlling orienting and navigation.

Moreover, we perceive the world in a multimodal manner. In the same way, navigating in immersive virtual environments requires the coordination of information coming simultaneously from multiple sensory modalities. Consequently, further than multimodal, we will explore cross-modal possibilities where audiovisual stimuli potentiate each other, more specifically, where sound biases spatial attention in a subliminal fashion, enhancing the perception of visual events.

We first investigate the experience of immersion, its manifestations within a sound medium and its relationship with attention. The experience of immersion is not specific to a medium. Therefore, whether in a virtual environment or not, we investigate the transversal aspects that contribute to and modulate the sensation of immersion within a narrative or experience, in order to reach a broader view of the phenomenon. In particular, we explore the role that sound can assume in contributing to this experience: sound as the agent that builds an emotional relationship between the user and the virtual environment, through a subliminal manipulation of perception and cognition.

Furthermore, attention is a fundamental concept when considering inducing the tracking of a narrative, particularly spatial attention. Attention shapes our expectations and perceptual mechanisms, since it provides us with the selection and prioritization of important and meaningful information amid the overload that constantly surrounds us. We then highlight the cross-modal operating mode of perception in spatial attention.

We will also explore audiovisual links and dependencies in their perceptual processing, supporting an audiovisual integration approach to attention. The potential of auditory stimuli as an inducing element of visual attention that biases spatial attention will be further examined.

Additionally, we consider emotion as an element present in information of high attentional saliency that may produce a modulation of attention. Emotional engagement can itself be achieved through sound, leading to a deeper sense of presence in the virtual environment and transmitting a greater sense of immersion.

Finally, in the discussion, we give an overview of the gathered information and conclude that sound may have the role of facilitating the immersive experience, orienting a user through a narrative with the sensation of agency and, at the same time, dissolving the acknowledgement of the existence of a medium in the virtual environment through emotional engagement and the decreasing of critical distance. We point out that consistency, in terms of coherency across sensory modalities and emotional matching, may be a critical aspect: consistency as a salient event that captures and drives attention, and as the element that facilitates an emotional engagement.

2 CHARACTERIZING IMMERSION

The experience of reading a book or watching a movie can convey the feeling of being transported to a fictional world. Immersion, whether it emerges from a real situation or is conveyed by a virtual environment, and whether we are referring to a technology, to the interaction with it or to a subjective experience, is related to the sensation of involvement within a certain narrative or experience [48]. Thus, we refer to immersion as a cognitive state of engagement independent of the medium in which it is experienced.

"Immersion is a metaphorical term derived from the physical experience of being submerged in water. We seek the same feeling from a psychologically immersive experience that we do from a plunge in the ocean or swimming pool: the sensation of being surrounded by a completely other reality, as different as water is from air, that takes over all of our attention, our whole perceptual apparatus. We enjoy the movement out of our familiar world, the feeling of alertness that comes from being in this new place, and the delight that comes from learning to move within it." [47, p. 150] This sensation is possible due to the capacity humans have, despite the medium, to dive or submerge into a story, to disconnect from the external world and be involved in the narrative with such an intensity that it is experienced as a simulated place, an alternative (virtual) reality. This natural drive described once as "the willing suspension of disbelief" by Samuel Coleridge, describes our ability, whether in books, movies or virtual reality, to devote our attention and perceptual resources to reinforce rather than to question this alternative reality [47].

Because immersion is not constrained to one medium, the term has been applied in a variety of domains, such as gaming [1, 12, 45], virtual environments [62, 71], film studies [59, 69] and music [24, 36].

However, the usage of the term in different fields may refer to slightly different meanings and be explored through different approaches. The concept has been the subject of several studies that aim to understand and clarify the meaning of the term and its usage in research in the context of digital media [1, 5, 26, 45, 48]. Nilsson et al. [48], in their exhaustive literature review, provide an overview of the existing definitions of immersion and suggest a taxonomy that interconnects the existing conceptualizations of the term, divided into three categories: (1) immersion as a property of a technology that mediates an experience [62]; (2) immersion as a response to an unfolding narrative [5, 45, 60], which can be subdivided into spatial, temporal and emotional immersion; (3) immersion as a response to challenges that may involve the use of sensorimotor skills or the intellect of the user [5, 45].

While the first dimension of the taxonomy relates to an objective, measurable property of a technology, the remaining two point to a subjective state of a user, expressed by an attentional focus on the virtual environment coupled with a detachment from the real world. In this paper, the term immersion is intended to refer to the cognitive experience of engagement when in a virtual environment (interacting with technology). The feeling of being mentally absorbed, which can also be felt when listening to music, triggers a process of transformation from one mental state into another. This mental change can be characterized in terms of its intervention in perception and cognition, by weakening critical distance and increasing emotional involvement with what is being experienced [31]. Critical distance is here understood in the sense of the dissolution of the mediation between the user and the medium, a fusion with the medium that affects sensory impression and awareness and is achieved through a subliminal perceptual manipulation that leads to a greater emotional involvement.

This state of immersion involves sustaining the paradoxical balance of keeping "the virtual world 'real' by keeping it 'not there'" [47, p. 151–152]. This delicate equilibrium is thus easily disrupted, for it requires constant vigilance for consistency and detail, and a careful regulation of the borderline between the imaginary and the real.

This fragility of the immersive state has mostly been approached by classical narrative art forms (i.e., books, cinema, theatre) by restraining participation [47]. However, technological developments brought changes with them. Fictional worlds have been intensified in digital environments. Immersion became a fundamental feature, a key aesthetic value, and the design of digital immersive experiences seeks to incorporate the strongest possible impression of being in the illusionary reality, but also the sense of agency, the sense that meaningful actions yield tangible results. With navigable spaces and the possibility of moving within them in an exploratory mode, the chance to make choices gave the observer agency over the medium and over the narrative.

However, the power that the interactor gains in making decisions brings new challenges to storytelling in interactive narratives. How can technology offer an immersive environment where the user has the sensation of agency, of choice, where her actions are not mediated by evident controls, i.e., without disrupting the cognitive state of being mentally absorbed, while at the same time ensuring that a story is being told as intended by the storyteller, following a certain narrative? This question represents the driving force that guides the present review.

2.1 Immersion and sound

Sound travels through air in which we live submersed. "We live immersed in a vast but invisible ocean of air that surrounds us and permeates us." [36, p. 3]. Sound is thus invasive. Our ears give us the location and detail of an auditory event, serving as a focal element, although we hear within the entire body. We listen immersed in sound [24].

These physical characteristics could, as such, be sufficient to define the experience of listening as an immersive one. However, in a virtual environment, sound, if not fulfilling a few criteria, may even become a disruptive element of the immersive experience, when it exposes the controls of the environment through explicit sounds such as alert sounds [39].

In his essay, Albert [2] analyses two contemporary audiovisual works representative of different expressive forms (a movie and an audiovisual installation) and observes that surrounding the user with multiple sensory stimuli may not be sufficient for a work to be immersive. Immersion emerges, rather, from the relationship between the viewer and the work, a relationship mediated by a subliminal manipulation of the viewer's perception. From the viewer's perspective, the medium's devices disappear from perception, because controls are not exposed and the mediation between technology and user is consequently dissolved.

Human beings live in real acoustic environments, and this life-long learning process of exposure builds perceptual and cognitive expectations about how places should acoustically feel. A few sound characteristics may contribute to these perceptual expectations of immersive acoustic experiences.

Sound spatialization and auralization play a decisive role in the immersive experience. Sound can convey spatial depth and volume and increase the sense of realism and of being able to interact with the virtual environment [35, 42]. The spatialization of sound, in particular, adds fidelity to the location of the sound sources and, thus, to the experiencing of space [9, 70].

Sound reverberation also contributes to a more realistic acoustical experience and appropriation of the virtual space. As is known in psychoacoustics, reverberation and early reflections are responsible for the sensation of auditory "spaciousness" and are among the main cues accountable for the externalization of acoustic images [8, 9]. Background sounds made by objects in the environment and by the user interacting with those objects likewise play an important role in the sense of being part of that environment. This acoustic information is perceived at a primitive cognitive level, without conscious awareness of it, yet it is crucial for experiencing the acoustical space [52].
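
To make these cues concrete, the following sketch (a rough illustration in Python with NumPy, not a description of any system from the reviewed literature; head radius, attenuation and decay values are illustrative assumptions) renders a mono signal with an interaural time and level difference and adds a simple exponentially decaying reverberation tail, the kinds of spatial and room cues discussed above.

```python
import numpy as np

SR = 44100  # sample rate in Hz

def spatialize(mono, azimuth_deg, rt60=0.6, direct_to_reverb=4.0):
    """Very rough binaural-style rendering of a mono signal.

    azimuth_deg: source direction, negative = left, positive = right.
    rt60: reverberation time in seconds (longer = larger-sounding space).
    direct_to_reverb: ratio of direct sound to reverberant energy.
    """
    az = np.radians(azimuth_deg)
    # Interaural time difference (Woodworth-style approximation, head radius ~8.75 cm).
    itd = 0.0875 / 343.0 * (az + np.sin(az))
    delay = int(abs(itd) * SR)
    # Interaural level difference: attenuate the far ear (crude broadband approximation).
    far_gain = 10 ** (-6.0 * abs(np.sin(az)) / 20.0)

    near = mono
    far = np.concatenate([np.zeros(delay), mono])[: len(mono)] * far_gain
    left, right = (near, far) if az < 0 else (far, near)

    # Exponentially decaying noise tail as a stand-in for diffuse reverberation.
    t = np.arange(int(rt60 * SR)) / SR
    tail = np.random.randn(len(t)) * 10 ** (-3.0 * t / rt60)  # reaches -60 dB at t = rt60

    def add_reverb(ch):
        wet = np.convolve(ch, tail)[: len(ch)]
        wet /= max(np.max(np.abs(wet)), 1e-9)
        return ch + wet / direct_to_reverb

    return np.stack([add_reverb(left), add_reverb(right)], axis=1)

# Example: a short burst placed 40 degrees to the right in a medium-sized room.
burst = np.sin(2 * np.pi * 440 * np.arange(SR // 2) / SR) * np.hanning(SR // 2)
stereo = spatialize(burst, azimuth_deg=40, rt60=0.8)
```

In a production engine, dedicated binaural rendering and convolution reverbs would replace these approximations; the point is only that azimuth, level and decay time are the handles through which the spatial expectations discussed above can be met.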

Although the factors described before, such as reverberation, background sounds or spatialized sound, contribute to the sensation of acoustic immersion, they may not create an immersive experience per se. Perceptual cues in quantity may not assure a deeper immersive experience. Though sound has a substantial role in creating the sense of immersion in virtual environments, enhancing the spatial qualities of auditory cues may not always be beneficial [27]. Auditory cues and spatial qualities should be used judiciously and meet the acoustical perceptual expectations of the specific space built by one's life experience [30]. In other words, perceptual consistency should be assured. Moreover, in an audio-visual context, consistency across perceptual modalities is a fundamental feature, where a visual space corresponds to both spatial sound properties and sound source characteristics [17, 50].

When sensory inputs across modalities are allowed to cooperate in an integrated manner, instead of being linearly summed in simple coexistence, auditory detection, visual choice reaction time and sound localization may improve, yielding richer immersive experiences. Otherwise, if audio-visual sensory information is mismatched, there is an ambiguity of interpretation that leads to an attentional focus on detecting what is incongruent [43, 44].

Emotional involvement is also central to the immersive experience, in particular when driven by auditory stimuli in a virtual environment. This potential of sound stimuli has been particularly explored in the context of gaming. The paradigmatic case is the induction of fear and discomfort through the manipulation of audio parameters, intensifying the emotional state of the gamer and, consequently, the sense of immersion [19, 21, 28, 29]. Examples of features that affect emotional valence and intensity are pressure level, loudness and sharpness [16], increasing intensity [7], attack-decay-sustain-release [46], signal-to-noise ratio [25], periodicity, tempo and rhythm [3], and low-pitched sounds such as growls and rumbles [51].
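
Several of the parameters listed above, such as increasing intensity, low pitch and attack-decay-sustain-release shaping, are straightforward to prototype. The sketch below is a hypothetical illustration (the function names, durations and envelope values are our own assumptions, not taken from the cited studies) of a low-pitched, "looming" rumble whose intensity rises over time under an ADSR envelope.

```python
import numpy as np

SR = 44100

def adsr(n, attack=0.2, decay=0.3, sustain=0.7, release=0.5):
    """Attack-decay-sustain-release envelope; times in seconds, sustain as a gain."""
    a, d, r = int(attack * SR), int(decay * SR), int(release * SR)
    s = max(n - a - d - r, 0)
    env = np.concatenate([
        np.linspace(0, 1, a),        # attack
        np.linspace(1, sustain, d),  # decay
        np.full(s, sustain),         # sustain
        np.linspace(sustain, 0, r),  # release
    ])
    return env[:n]

def looming_rumble(duration=4.0, f0=45.0):
    """Low-pitched rumble with rising intensity, a crude 'looming' cue."""
    n = int(duration * SR)
    t = np.arange(n) / SR
    # A few low harmonics with slow amplitude modulation for roughness.
    sig = sum(np.sin(2 * np.pi * f0 * k * t + np.random.rand()) / k for k in (1, 2, 3))
    tremolo = 1.0 + 0.3 * np.sin(2 * np.pi * 7 * t)
    ramp = np.linspace(0.1, 1.0, n) ** 2  # intensity grows as if the source approaches
    out = sig * tremolo * ramp * adsr(n)
    return out / np.max(np.abs(out))

rumble = looming_rumble()
```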

Other important non-content auditory features that induce emotional involvement are those that provide architectural and material cues, such as reverberation and delays [32]. Through different reverberation times, emotional reactions may be influenced in an auditory virtual environment; in particular, higher reverberation times were perceived as more unpleasant [68]. Thus, the perceived auditory space can modulate emotional responses. The auditory experience of a space influences our perception, cognition and emotion. In their study, Tajadura-Jiménez et al. [65] found that small rooms were considered more pleasant, calmer and safer than big rooms. Notwithstanding, this sensation changed depending on the content of the sound source, e.g., when listening to threatening sound sources.

The spatialization of sound can also have an impact on emotional reactions. In his study, Västfjäll [67] established a relationship between the number of audio channels and emotional reactions, where stereo and six-channel reproduction resulted in stronger changes in emotional reactions than a mono setup.

The signal characteristics of acoustic stimuli and their resulting sensations indeed play a central role with respect to the eliciting of emotional reactions. However, these characteristics may not be sufficient to fully capture auditory-induced emotions and, thus, the listener and the contextual circumstances should not be neglected. Emotional responses may be learned and conditioned by one's own personal experience and, thus, the meaning attached to a sound stimulus is not dissociated from that experience. Furthermore, the expectations built from this experience represent an important psychological construct that should be taken into account [17].

In their study, Asutay et al. [6] tested the effect of the meaning the listener attaches to a sound on the resulting sensation. They observed that auditory emotions may be idiosyncratic because they depend on the meaning an individual listener assigns to sounds associated with a particular circumstance. Therefore, sound design should consider the different physical, psychoacoustical and psychological dimensions of auditory displays in order to produce effective emotional reactions in the perceiver.

2.2 Immersion and attention

The state of immersion requires an allocation of perceptual resources that leads to a shift of attention and a focus on the alternative reality. This focus may involve disregarding the real world around us, losing awareness of external events, and temporal dissociation. This mechanism could also be formulated using the concept of selective attention, in which one relevant source of information is attended to, to the detriment of others. Hence, could the state of immersion be a particular case of the more general phenomenon of selective attention?

Jennett [40] argues that immersion cannot be explained merely through selective attention, since other mechanisms, such as motivation, are involved. When a gamer is immersed in a virtual reality, the degree of detachment from the real world is related to the gamer's motivation to remain in the immersive experience. This relation between attention, immersion and motivation relates to the notion of immersion as a graded experience, as a person's degree of involvement. Brown and Cairns [12] distinguished different levels of immersion. These correspond to a certain level of emotional involvement and engagement within the experience, depending on the cognitive state of the user. This also means different levels of attention.

Selective attention can thus be seen as a fundamental factor in the immersive experience, and attention can be used as a measure of the degree of engagement in an immersive experience [20]. Furthermore, the allocation of attentional resources may be controlled by emotional or motivational factors [33]. In the next section, we examine the concept of attention in more depth.

3 ATTENTION

When considering search bias, and particularly the inducing of storytelling, attention is a central concept, and thus it is essential to understand its operating principles and its organization into corresponding subsystems. Attention refers to the preparedness for the selection of information, or subsets of information, to ensure its priority for focal and conscious processing. This encompasses both the concept of alertness and that of indexing for resource allocation.

The understanding of attention is a central subject in cognitive psychology, and attention has been seen, since early on, as comprising multiple components, independent but interrelated attentional subsystems, rather than a monolithic processing block [38]. The formulation of attentional subsystems and their typologies was later clarified [55] and has been revised and refined over time [53, 56]. Notwithstanding, the core notion of three major functions remains, providing a conceptual framework for studies related to attention. The three main subsystems are: (1) orienting to sensory events; (2) executive, selecting signals for focal processing; (3) alerting, the maintenance of a vigilant state.

The orienting system is considered the process of moving attention to a certain location, through the ability to prioritize sensory input. Orienting implies that signals at a given location become amplified. This amplification elicits the detection of, and orienting in the direction of, a probable salient event. This mechanism of the cognitive system improves the acuity of processing since, by attending to a specific location, the target stimulus will be given priority for more efficient processing [56].

Orienting to a specific spatial location, even if attention was captured by stimuli from a specific modality, involves synergies between different sensory modalities [23] and not just the one aroused, since the location of salient events relies on the integration of information from multiple senses. Therefore, when a sudden stimulus attracts attention in one sensory modality, attention in other sensory modalities will be shifted to that same spatial location. Moreover, orienting may be driven by endogenous or exogenous motivations. Endogenous orienting is an active process, where attention is directed voluntarily to a specific location in a top-down manner. Exogenous orienting is a passive process, where attention is captured to a specific location by reflexive mechanisms attracted by salient events, in a bottom-up manner [38]. Both endogenous and exogenous orienting paradigms show cross-modal integration links, although in different ways. This subject will be addressed later.

Executive attention carries out tasks related to supervision, monitoring, conflict resolution, handling novelty and focusing attention. This attentional subsystem is essential for producing top-down regulation, hence its link to executive control [54]. Although the nature, shape and exact link between executive attention and the regulation of feelings are still not clear, this subsystem is considered to be correlated with developmental concepts such as: (a) self-regulation, the capacity to voluntarily control one's own emotions, thoughts and behaviour; (b) emotional regulation, which can itself be a form of self-regulation but can also be driven by external factors (e.g., others' actions) through the reduction, increase or maintenance of an emotional response (e.g., fear, pleasure); and (c) effortful control, related to temperament, which consists of the capacity to activate, sustain and inhibit a response, regulating behaviour on command [58]. For this reason, the relationship between attention and emotion will be further developed.

The alerting subsystem is responsible for functions that produce and maintain optimal vigilance for high-priority signals [53, 58]. The efficiency of alerting is considered in terms of when a target event will occur, that is, in terms of temporal information, and not where it will occur, which relates to location information (orienting). Performance in this form of attention is usually measured as reaction time, where the speed of the orienting response to a salient event is considered.

Each of the three attentional subsystems described above can be regarded as independent and dedicated to specific processes; however, they interact, cooperate and function together as a whole. Research has been developed in an effort to clarify the interactions between the three main attentional subsystems, since their understanding would lead to a complete account of attention [13, 14].

3.1 Spatial Attention

Our experience of the world is multimodal. It is rare that our perceiving of the world involves exclusively hearing or only seeing. Perceiving the world implies the constant production of inferences and decisions based on information coming from multiple sensory modalities simultaneously [34].

Our perception of space is no exception: it is a supra-modal construct that is not limited to a specific sensory system, as one modality alone cannot provide a stable representation of space [4]. Indeed, the parietal cortex combines and coordinates information coming from several sensory modalities to form a unified representation of space [18].

Spatial attention, consequently, more than being multimodal, operates in a cross-modal fashion, involving the articulation and integration of information from multiple sensory modalities. Multimodal neurons may be at the basis of this cross-modal functioning. It has been observed that cells in certain neural areas respond to and code multiple sensory modalities in spatial register and that these responses can be enhanced by multimodal stimulation at the same location [4]. In this way, sensory systems influence each other, creating a cross-modal facilitatory effect in attention and thus in perception and action [15, 37, 49, 61].

Historically, the understanding of attention was at first focused on the effects of audition; later, the emphasis shifted to attentional effects within vision. However, as pointed out before, in most everyday circumstances it is hardly ever the case that only one sensory modality is stimulated; rather, these environments stimulate several sensory modalities and, therefore, attention involves the coordination of all input stimuli in a cross-modal manner [22]. For this reason, and considering the audiovisual scope of this review, we only considered studies that take a cross-modal approach to attention, rather than one sensory modality at a time.

When attention is captured by a salient event towards a particular place in a single sensory modality, it is likely to yield slight shifts of attention in other modalities as well. For example, unexpected sounds (the exogenous case) not only attract auditory attention but also draw visual attention to their location. Considering the hypothesis that, in the auditory domain, space is represented by multimodal maps, performing auditory orienting within these representations requires a corresponding visual orienting as well [63]. Moreover, the endogenous parallel is also observed: when voluntarily attending to a location in the auditory sensory modality because a sound is expected there, auditory localization inherently improves, but visual judgments improve too [63].

These facts are of particular interest when considering audiovisual immersive environments and investigating the possibilities of sound as a persuading element. For this reason, audiovisual links in attention will be further developed in the next section.

3.2 Audiovisual links in spatial attention

In this section, we review cross-modal links between the auditory and visual sensory domains. The main scope of this literature review is to theoretically support the hypothesis of sound as an impelling element in search tasks in audiovisual immersive environments. Accordingly, our main focus will be on gathering factors by which auditory stimuli induce or in some way influence visual sensory attention, and not the opposite.

Additionally, in the given context, we are more interested in cases where stimulus-driven (exogenous) tasks are involved. This case fits the scenario where sound is the salient event that drives attention and directs vision to a certain location, enabling search tasks.

Human beings can process visual and auditory information separately when these are presented independently. Despite this, there is a body of research suggesting that there are strong links between the auditory and visual attentional systems. This link between systems implies substantial limitations in processing information presented in both sensory modalities in an independent way, pointing to an audiovisual integration hypothesis of attention [64].

Sound can be more effective than visual signals in attracting spatial attention to locations. Humans depend on audition to direct visual attention, especially when signals are present outside the visual field. Previous research has suggested that auditory warning signals lead to the cross-modal exogenous orienting of visual attention, whereas the opposite case, of visual stimuli capturing auditory orienting attention, is not likely to occur. This cross-modal dependence, such that auditory signals capture visual orienting attention but not vice versa, might be explained by the fact that real-world auditory events tend to be transient, in contrast with visual events, which are more likely to be continuous; relevant events are thus more likely to become inaudible yet remain visible than the opposite [63].

Humans tend to focus more attentively on auditory and visual stimuli when these are presented from a common distal location [64]. However, spatial concurrency is not a requirement for sounds to increase temporal visual accuracy. In an experiment where auditory cueing was used to enhance sensitivity to visual blinks, researchers suggest that temporal synchronism, with no spatial concurrency needed, sharpens visual acuity [49]. Their hypothesis is that auditory stimuli might enhance the saliency of the visual stimulus and the sensory representation of the visual blink.

Another study suggests that auditory stimuli (again warning signals such as "pips" or "bleeps") are sufficient to increase visual search speed when the condition of temporal synchronization is assured [66]. The non-spatial auditory event guides visual attention toward the location of the visual object provided synchrony is guaranteed. The authors again highlight that the auditory warning signal needs no information about the location in order to increase visual search accuracy and make visual targets stand out, and they consider the attentional guidance through synchronized auditory-visual events to be automatic. The authors also discuss the reason for this auditory-induced visual sensory enhancement. Their hypothesis does not relate this effect to alerting but rather to an integration of the auditory signals with the visual ones, generating a salient emergent feature that automatically attracts attention, reinforcing the idea of a joint integration of audiovisual information that leads to an amplification in phenomenal visual saliency.
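
As a minimal sketch of the timing logic behind such a synchrony manipulation (not the cited experiment's actual code; the item count, intervals and Poisson-like spacing are illustrative assumptions), the following fragment schedules random colour changes for distractors and a target, and places a brief non-spatial "pip" only at the target's change times.

```python
import random

def schedule_trial(n_distractors=23, duration=5.0, mean_interval=0.9, seed=None):
    """Schedule colour-change times for one search display.

    Returns (visual_events, pip_times): visual_events is a list of
    (time_s, item_id) pairs; pip_times holds the tone onsets, which are
    synchronized only with item_id == 0 (the target).
    """
    rng = random.Random(seed)

    def change_times():
        t, times = 0.0, []
        while True:
            t += rng.expovariate(1.0 / mean_interval)  # roughly Poisson-spaced changes
            if t >= duration:
                return times
            times.append(round(t, 3))

    visual_events = []
    for item in range(n_distractors + 1):  # item 0 is the target
        visual_events += [(t, item) for t in change_times()]
    visual_events.sort()

    pip_times = [t for t, item in visual_events if item == 0]
    return visual_events, pip_times

events, pips = schedule_trial(seed=1)
print(f"{len(events)} colour changes, {len(pips)} synchronized pips at {pips}")
```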

Although the synchronism factor is relevant in the context of this review, the type of stimuli used in most of these experiments, such as auditory warning signals (bleeps and pops), is not the most suitable when considering inducing saliency in visual events in a subliminal manner. Artificial tones are not considered ecological stimuli and are thus not representative of the diversity and richness of the sensory information encountered in the real world.

Another experiment proposes that redundancy in auditory and visual sensory information, and its resulting integration, may lead to an improvement in signal strength and reliability, given that noise (environmental and sensory) is frequently uncorrelated across the two modalities [37]. In this study, however, the researchers propose that the redundancy or coherency factor may be the characteristic sounds that visual objects produce. In their experiments, they found that subjects improved object localization and identification when presented with the corresponding sound (e.g., hearing "meow" facilitated localizing the picture of a cat). Once again, auditory processing improved visual search, confirming spatial audiovisual sensory interactions.

3.3 The effect of emotion on attention

Emotion has a central role in attention. The emotional state of the individual can have a great impact on perceptual, integrative and associative information processing in cognitive functions. The resources used for processing information may be influenced by the emotional load within that information. Specifically, emotionally charged information may require more cognitive resources and be prioritized for processing over neutral or non-emotional information. Emotion is thus a highly salient stimulus that produces a modulation effect on attention [72].

As mentioned before, emotion has a strong link with the executive subsystem of attention, which deals with the selection of signals for focal processing through top-down regulation. In our everyday life, we are constantly surrounded by an overload of information, whether of endogenous or exogenous origin. However, the capacity of our attentional processing system is limited, and it therefore deals with this overload by restraining the representations of the information through competition, by means of mechanisms of selective attention. This competition may be regulated and biased by emotion, which performs an affective evaluation of stimuli, enhancing perceptual distinctiveness either through internal top-down or external bottom-up induced factors [57].

The effect of emotion is also observed in orienting. Orienting reflex responses in attention are thought to be mediated by the activation of neural circuits called the defensive and appetitive motivational systems. The eliciting of these circuits explains the intrinsic connection between attention and emotion and why emotion is fundamental to acting and reacting effectively to events that threaten or support life [11]. Furthermore, a distinction is made between novel and significant events. Novel events are related to habituation phenomena and novelty contrast and are commonly used in studies of orienting and attention. Significant events, on the other hand, presuppose meaningful and valuable information in the sensory stimuli, whether they involve a pleasant or unpleasant reinforcement (a positive "satisfier" or a negative "annoyer"), and thus reflect orienting engaged in emotionally task-relevant contexts. In conclusion, orienting is considered a response to an event's emotional significance that engages both active and passive attention.

Sound can be an emotional driving force, particularly in the case of musical stimuli. In an experiment using positron emission tomography, researchers observed that musical stimuli recruited brain circuits known to be active in response to reward, motivation, emotion and arousal processes [10]. These results denote an emergent property of the complexity of human cognition and a biological and survival relevance of musical stimuli. The emotional responses to musical stimuli and their ability to stimulate endogenous reward systems additionally provide a relevant link with attention: a stimulus that is highly emotionally relevant will naturally engage attentional resources.

Emotion has also been observed to be a relevant factor in sensory stimuli for audiovisual binding, alongside factors such as temporal synchrony or cross-modal correspondence [41]. Accordingly, audiovisual association and integration processes depend on the internal states of the observers and not only on the physical features of the sensory stimuli. Using musical stimuli to regulate emotional states in subjects, researchers observed that emotional factors were responsible for modulating and facilitating perceptual integrative processes in audiovisual materials. Audiovisual binding and focal attentional capture make musical stimuli a powerful means to induce emotion and thus to engage attentional processes and lead to focal attentional processing.

4 DISCUSSION

In this article, we have explored pathways in which sound can be an element that contributes to the immersive experience in an audiovisual environment but also the element that modulates attention in a way that allows the biasing of the flow of a narrative. In other words, we have depicted possibilities in which sound contributes to the creation of an environment where the user experiences immersion, where the user has the sensation of agency, where her actions are not mediated by evident controls but subliminally induced in a way that it is ensured that a narrative is being followed.

Immersion, as it has been regarded throughout this article, is a cognitive state, a subjective experience of being mentally absorbed that involves the sensation of engagement within a certain narrative or experience [48]. Immersion is a modal category rather than a medial category [2], meaning that it is not the characteristics of the new media, in digital environments, that convey the immersive experience.

In traditional expressive forms, i.e., books, movies and theatre, narrative storytelling strategies are well established. In particular, sound design, in the context of movies, has been the element that drives the emotional state of the audience and thus the emotional involvement with the narrative, leading to an immersive experience [2]. In virtual environments, the common approach has been to surround the user with stimuli (sound, video, smell) in an effort to imitate reality, considering the sensory experience a fundamental feature for optimizing the virtual immersive experience [17]. This demand to replicate reality brought an overload of information and, with it, new challenges in storytelling strategies. The task of allocating, sustaining and modulating the user's attention has become more complex with the intensification of digital immersive environments.

However, although new digital media have a great potential for immersion through their interactive features, sensory realism achieved through accurate perspective projection may not always be the best strategy for conveying an immersive experience and taking users through a narrative [17]. The emphasis should be on creating an illusion, meeting our natural propensity to perceptually reinforce and complete rather than to question: creating the illusion of non-mediation, of "being there", where the user cannot perceive the existence of a medium in the environment. Consequently, this is where sound can be a key element.

Immersion is deeply interconnected with attention. As stated before, attention can be formulated as the preparedness for the selection of information to ensure its priority for focal and conscious processing. The state of immersion requires the selection of information, or the focus of attention, on the alternative reality. In this sense, it is possible to sustain and modulate attention with the aim of manipulating immersion and the degree of engagement in the immersive experience and, at the same time, to bias and induce a narrative within that experience through orienting attention to a certain location.

We have reviewed literature concerning immersion and attention, highlighting the role that sound might have in these cognitive phenomena. Throughout the review, we have found that perception is multimodal: we rarely perceive the world using only one sensory modality. Furthermore, more than perceiving with multiple sensory systems, the information received across modalities is integrated and the sensory systems influence each other, creating a cross-modal facilitatory effect. For this reason, our approach is not based solely on sound but rather on the audiovisual, specifically focusing on the possibilities where sound can be the persuading element that captures and guides spatial or audiovisual attention.

Indeed, sound may have the role of facilitating the immersive experience, orienting a user through a narrative with the sensation of agency and, at the same time, dissolving the acknowledgement of the existence of a medium in the virtual environment through emotional engagement and the decreasing of critical distance. We find that this can only be done by creating a propitious environment for a motivated user to emotionally engage and, thus, focus attention. However, a few requirements must be met.

Consistency is a crucial factor. Coherency across sensory modalities and emotional matching must meet expectations that are shaped by a life-long learning process. Consistency may be viewed from different perspectives and, at this stage, as a result of the presented literature review, we enumerate the possibilities that we have identified (a schematic sketch of how such constraints might be encoded follows the list):

Consistency as a salient event that captures and drives attention: (a) matching visual and auditory stimuli in terms of synchronism may create a salient event that captures attention; auditory and visual information redundancy may result in a strengthening of the sensory signal in a disruptive, uncorrelated sensory environment; (b) matching visual and auditory prominence, through sound saliency in terms of loudness.

Consistency as the element that facilitates an emotional engagement. The emotional state, and its induction and modulation through sound elements, may be key to manipulating perception and creating the illusion of a virtual immersive environment, maintaining motivation and attentional focus: (a) matching visual and auditory acoustic information, in terms of visual space and acoustic spatialization (such as reverberation, delay and sound spatialization), can lead to an acoustic experience that drives emotional states; (b) matching the emotional state considering contextual and user-specific circumstances, accounting for sound signal acoustic characteristics (such as pressure level, loudness, sharpness, increasing intensity, attack-decay-sustain-release, signal-to-noise ratio, periodicity, tempo and rhythm, and low-pitched sounds such as growls and rumbles) but also the content or meaning of the sound source (background sounds, for example), thus considering the physical, psychoacoustical and also the psychological dimensions of auditory displays in an audiovisual context; aiming for significant events (as opposed to novel events), which presuppose meaningful and valuable sensory information, will lead to an increase in emotional significance that will engage orienting attentional responses; (c) musical stimuli, specifically, may be remarkably effective in driving emotional states, engaging motivation and attentional resources.
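
As a purely illustrative sketch of how such consistency constraints might be encoded when authoring an audiovisual scene (the synchrony and loudness thresholds are arbitrary assumptions; only the Sabine reverberation formula RT60 = 0.161 V / A is standard acoustics, and nothing here is prescribed by the reviewed literature), consider the following fragment.

```python
from dataclasses import dataclass

@dataclass
class AudiovisualEvent:
    onset_s: float          # when the visual event happens
    sound_onset_s: float    # when its sound starts
    sound_level_db: float   # reproduction level of the sound
    visual_salience: float  # 0..1, how prominent the visual element is

@dataclass
class Scene:
    room_volume_m3: float
    absorption_m2_sabins: float  # total absorption of visible surfaces
    reverb_rt60_s: float         # RT60 used by the audio engine

def sabine_rt60(volume_m3, absorption_m2_sabins):
    """Sabine's formula: RT60 = 0.161 * V / A."""
    return 0.161 * volume_m3 / absorption_m2_sabins

def check_consistency(scene, events,
                      sync_tolerance_s=0.08,   # illustrative threshold
                      rt60_tolerance_s=0.25):  # illustrative threshold
    """Flag audiovisual mismatches that could break immersion or misdirect attention."""
    issues = []
    expected_rt60 = sabine_rt60(scene.room_volume_m3, scene.absorption_m2_sabins)
    if abs(expected_rt60 - scene.reverb_rt60_s) > rt60_tolerance_s:
        issues.append(f"reverb {scene.reverb_rt60_s:.2f}s vs expected ~{expected_rt60:.2f}s")
    for ev in events:
        if abs(ev.sound_onset_s - ev.onset_s) > sync_tolerance_s:
            issues.append(f"event at {ev.onset_s}s: sound offset by "
                          f"{ev.sound_onset_s - ev.onset_s:+.3f}s")
        # Loud sounds attached to visually negligible elements pull attention away.
        if ev.sound_level_db > 70 and ev.visual_salience < 0.2:
            issues.append(f"event at {ev.onset_s}s: loud sound on a low-salience visual")
    return issues

scene = Scene(room_volume_m3=120.0, absorption_m2_sabins=30.0, reverb_rt60_s=1.4)
events = [AudiovisualEvent(2.0, 2.25, 74.0, 0.1)]
print(check_consistency(scene, events))
```

Such checks would not guarantee immersion; they merely operationalize the idea that cross-modal mismatches are what break it.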

5 ACKNOWLEDGMENTS

The work described in this article has been developed within the framework of the research and innovation project CHIC. This project is co-funded by the European Union through the European Regional Development Fund within the context of the programme COMPETE 2020 (Programa Operacional da Competitividade e Internacionalização).

REFERENCES

[1] E. Adams. 2014. Fundamentals of game design, third edition. New Riders, Pearson Education.
[2] Giacomo Albert. 2012. Immersion as category of audiovisual experience: From Long Beach to Hollywood. Worlds of Audiovision (2012).
[3] V. Alves and L. Roque. 2009. A proposal of soundscape design guidelines for user experience enrichment. Audio Mostly 2009 (2009), 27–32.
[4] Richard A Andersen, Lawrence H Snyder, David C Bradley, and Jing Xing. 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience 20, 1 (1997), 303–330.
[5] D. Arsenault. 2005. Dark waters: Spotlight on immersion. In Game-On North America 2005 Conference: Eurosis. Elsevier, Ghent, Belgium, 50–52.
[6] Erkin Asutay, Daniel Västfjäll, Ana Tajadura-Jimenez, Anders Genell, Penny Bergman, and Mendel Kleiner. 2012. Emoacoustics: A study of the psychoacoustical and psychological dimensions of emotional sound design. Journal of the Audio Engineering Society 60, 1/2 (2012), 21–28.
[7] Dominik R Bach, John G Neuhoff, Walter Perrig, and Erich Seifritz. 2009. Looming sounds as warning signals: The function of motion cues. International Journal of Psychophysiology 74, 1 (2009), 28–33.
[8] Durand R. Begault. 1992. Perceptual Effects of Synthetic Reverberation on Three-Dimensional Audio Systems. J. Audio Eng. Soc. 40, 11 (1992), 895–904.
[9] Durand R Begault and Leonard J Trejo. 2000. 3-D sound for virtual reality and multimedia. (2000).
[10] Anne J Blood and Robert J Zatorre. 2001. Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98, 20 (2001), 11818–11823.
[11] Margaret M. Bradley. 2008. Natural selective attention: Orienting and emotion. Psychophysiology 46 (2008), 1–11.
[12] Emily Brown and Paul Cairns. 2004. A grounded investigation of game immersion. In CHI'04 Extended Abstracts on Human Factors in Computing Systems. ACM, 1297–1300.
[13] Alicia Callejas, Juan Lupianez, María Jesús Funes, and Pío Tudela. 2005. Modulations among the alerting, orienting and executive control networks. Experimental Brain Research 167, 1 (2005), 27–37.
[14] Alicia Callejas, Juan Lupiáñez, and Pío Tudela. 2004. The three attentional networks: On their independence and interactions. Brain and Cognition 54, 3 (2004), 225–227.
[15] Lihan Chen and Jean Vroomen. 2013. Intersensory binding across space and time: a tutorial review. Attention, Perception, & Psychophysics 75, 5 (2013), 790–811.
[16] Jayoung Cho, Eunjou Yi, and Gilsoo Cho. 2001. Physiological responses evoked by fabric sounds and related mechanical and acoustical properties. Textile Research Journal 71, 12 (2001), 1068–1073.
[17] Priscilla Chueng and Phil Marsden. 2003. Designing auditory spaces: the role of expectation. In Proceedings of the 10th International Conference on Human Computer Interaction. Citeseer, 616–620.
[18] Carol L Colby and Michael E Goldberg. 1999. Space and attention in parietal cortex. Annual Review of Neuroscience 22, 1 (1999), 319–349.
[19] Karen Collins. 2011. Making gamers cry: mirror neurons and embodied interaction with game sound. In Proceedings of the 6th Audio Mostly Conference: A Conference on Interaction with Sound. ACM, 39–46.
[20] Rudolph P Darken, David Bernatovich, John P Lawson, and Barry Peterson. 1999. Quantitative measures of presence in virtual environments: the roles of attention and spatial comprehension. CyberPsychology & Behavior 2, 4 (1999), 337–347.
[21] Thainá Cristina Demarque and ES Lima. 2013. Auditory hallucination: Audiological perspective for horror games. SBC–Proceedings of SBGames 2013 (2013).
[22] Jon Driver and Charles Spence. 1998. Cross-modal links in spatial attention. Philosophical Transactions of the Royal Society of London B: Biological Sciences 353, 1373 (1998), 1319–1331.
[23] Jon Driver and Charles Spence. 1998. Crossmodal attention. Current Opinion in Neurobiology 8, 2 (1998), 245–253.
[24] Marian T Dura. 2006. The phenomenology of the music-listening experience. Arts Education Policy Review 107, 3 (2006), 25–32.
[25] Inger Ekman. 2008. Psychologically motivated techniques for emotional sound in computer games. In Proc. AudioMostly 2008, 3rd Conference on Interaction with Sound, Piteå, Sweden. 20–26.
[26] Laura Ermi and Frans Mäyrä. 2005. Fundamental components of the gameplay experience: Analysing immersion. Worlds in Play: International Perspectives on Digital Games Research 37, 2 (2005), 37–53.
[27] Jonathan Freeman and Jane Lessiter. 2001. Here, there and everywhere: the effects of multichannel audio on presence. In Proceedings of The Seventh International Conference on Auditory Display, Helsinki, Finland. GIT, 231–234.
[28] Tom Garner, Mark Grimshaw, and Debbie Abdel Nabi. 2010. A preliminary experiment to assess the fear value of preselected sound parameters in a survival horror game. In Proceedings of the 5th Audio Mostly Conference: A Conference on Interaction with Sound. ACM, 10.
[29] Tom A Garner and Mark Grimshaw. 2013. Psychophysiological assessment of fear experience in response to sound during computer video gameplay. In IADIS HCCI Conference.
[30] William W Gaver. 1993. How do we hear in the world? Explorations in ecological acoustics. Ecological Psychology 5, 4 (1993), 285–313.
[31] Oliver Grau. 2003. Virtual Art: from illusion to immersion. MIT Press.
[32] Mark Grimshaw. 2007. The resonating spaces of first-person shooter games. In Proceedings of the 5th International Conference on Game Design and Technology.
[33] Mark Grimshaw, John Charlton, and Richard Jagger. 2011. First-person shooters: Immersion and attention. Eludamos. Journal for Computer Game Culture 5, 1 (2011), 29–44.
[34] Stephen Handel. 2006. Perceptual coherence: Hearing and seeing. Oxford University Press.
[35] Claudia Hendrix and Woodrow Barfield. 1996. Presence within virtual environments as a function of visual display parameters. Presence: Teleoperators & Virtual Environments 5, 3 (1996), 274–289.
[36] Don Ihde. 2007. Listening and voice: Phenomenologies of sound, Second Edition. SUNY Press.
[37] Lucica Iordanescu, Emmanuel Guzman-Martinez, Marcia Grabowecky, and Satoru Suzuki. 2008. Characteristic sounds facilitate visual search. Psychonomic Bulletin & Review 15, 3 (2008), 548–554.
[38] W. James. 1890. The Principles of Psychology. Vol. 1. New York: Henry Holt.
[39] Charlene Jennett, Anna L Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton. 2008. Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies 66, 9 (2008), 641–661.
[40] Charlene Ianthe Jennett. 2010. Is game immersion just another form of selective attention? An empirical investigation of real world dissociation in computer game immersion. Ph.D. Dissertation. UCL (University College London).
[41] Miho S Kitamura, Katsumi Watanabe, and Norimichi Kitagawa. 2016. Positive emotion facilitates audiovisual binding. Frontiers in Integrative Neuroscience 9 (2016), 66.
[42] Pontus Larsson, Daniel Västfjäll, and Mendel Kleiner. 2004. Perception of self-motion and presence in auditory virtual environments. In Proceedings of the Seventh Annual Workshop on Presence. 252–258.
[43] Pontus Larsson, Daniel Västfjäll, Pierre Olsson, and Mendel Kleiner. 2007. When what you hear is what you see: Presence and auditory-visual integration in virtual environments. In Proceedings of the 10th Annual International Workshop on Presence. 11–18.
[44] Scott D Lipscomb. 1999. Cross-modal integration: Synchronization of auditory and visual components in simple and complex media. The Journal of the Acoustical Society of America 105, 2 (1999), 1274.
[45] Alison McMahan. 2003. Immersion, engagement and presence. The Video Game Theory Reader 67 (2003), 86.
[46] Simon Moncrieff, Chitra Dorai, and Svetha Venkatesh. 2001. Affect computing in film through sound energy dynamics. In Proceedings of the Ninth ACM International Conference on Multimedia. ACM, 525–527.
[47] Janet Horowitz Murray. 2017. Hamlet on the Holodeck: The future of narrative in cyberspace. MIT Press.
[48] Niels Christian Nilsson, Rolf Nordahl, and Stefania Serafin. 2016. Immersion revisited: A review of existing definitions of immersion and their relation to different theories of presence. Human Technology 12 (2016).
[49] Toemme Noesselt, Daniel Bergmann, Maria Hake, Hans-Jochen Heinze, and Robert Fendrich. 2008. Sound increases the saliency of visual events. Brain Research 1220 (2008), 157–163.
[50] Kenji Ozawa, Satoshi Ohtake, Yôiti Suzuki, and Toshio Sone. 2003. Effects of visual information on auditory presence. Acoustical Science and Technology 24, 2 (2003), 97–99.
[51] Jim R Parker and John Heerema. 2008. Audio interaction in computer mediated games. International Journal of Computer Games Technology 2008 (2008), 1.
[52] Renato S Pellegrini. 2001. Quality assessment of auditory virtual environments. In Hiipakka, J., Zacharov, N., Takala, T. (eds.): Proceedings of the 7th International Conference on Auditory Display. Laboratory of Acoustics and Audio Signal Processing and the Telecommunications Software and Multimedia Laboratory. 161–168.
[53] Steven E Petersen and Michael I Posner. 2012. The attention system of the human brain: 20 years after. Annual Review of Neuroscience 35 (2012), 73–89.
[54] Michael I Posner (Ed.). 2012. Cognitive Neuroscience of Attention (second ed.). Guilford Press, New York.
[55] Michael I Posner and Stephen J Boies. 1971. Components of attention. Psychological Review 78, 5 (1971), 391–408.
[56] Michael I Posner and Steven E Petersen. 1990. The attention system of the human brain. Annual Review of Neuroscience 13, 1 (1990), 25–42.
[57] Jane Raymond. 2009. Interactions of attention, emotion and motivation. Progress in Brain Research 176 (2009), 293–308.
[58] Amir Raz and Jason Buhle. 2006. Typologies of Attentional Networks. Nature Reviews Neuroscience 7, 5 (2006), 367–379.
[59] Brendan Rooney, Ciarán Benson, and Eilis Hennessy. 2012. The apparent reality of movies and emotional arousal: A study using physiological and self-report measures. Poetics 40, 5 (2012), 405–422.
[60] Marie-Laure Ryan. 2003. Narrative as virtual reality: Immersion and interactivity in literature and electronic media. Johns Hopkins University Press.
[61] Michael Schutz and Scott Lipscomb. 2007. Hearing gestures, seeing music: Vision influences perceived tone duration. Perception 36, 6 (2007), 888–897.
[62] Mel Slater. 2009. Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society of London: Biological Sciences 364, 1535 (2009), 3549–3557.
[63] Charles Spence and Jon Driver. 1997. Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics 59, 1 (1997), 1–22.
[64] C. Spence and J. Driver. 1997. Engineering Psychology and Cognitive Ergonomics: Volume Two – Job Design and Product Design. Routledge, Chapter Audiovisual Links in Attention: Implications for Interface Design.
[65] Ana Tajadura-Jiménez, Pontus Larsson, Aleksander Väljamäe, Daniel Västfjäll, and Mendel Kleiner. 2010. When room size matters: acoustic influences on emotional responses to sounds. Emotion 10, 3 (2010), 416.
[66] Erik Van der Burg, Christian NL Olivers, Adelbert W Bronkhorst, and Jan Theeuwes. 2008. Pip and pop: nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance 34, 5 (2008), 1053.
[67] Daniel Västfjäll. 2003. The subjective sense of presence, emotion recognition, and experienced emotions in auditory virtual environments. CyberPsychology & Behavior 6, 2 (2003), 181–188.
[68] Daniel Västfjäll, Pontus Larsson, and Mendel Kleiner. 2002. Emotion and auditory virtual environments: affect-based judgments of music reproduced with virtual reverberation times. CyberPsychology & Behavior 5, 1 (2002), 19–32.
[69] Valentijn T Visch, Ed S Tan, and Dylan Molenaar. 2010. The emotional and cognitive effect of immersion in film viewing. Cognition and Emotion 24, 8 (2010), 1439–1445.
[70] Elizabeth Wenzel, Frederic Wightman, Doris Kistler, and Scott Foster. 1988. Acoustic origins of individual differences in sound localization behavior. The Journal of the Acoustical Society of America 84, S1 (1988), S79–S79.
[71] Bob G Witmer and Michael J Singer. 1998. Measuring presence in virtual environments: A presence questionnaire. Presence 7, 3 (1998), 225–240.
[72] Jenny Yiend. 2010. The effects of emotion on attention: A review of attentional processing of emotional information. Cognition and Emotion 24, 1 (2010), 3–47.
