The Sound of Voice: Voice-Based Categorization of Speakers' Sexual Orientation within and across Languages.


Simone Sulpizio1,2☯*, Fabio Fasoli3☯, Anne Maass3, Maria Paola Paladino1, Francesco Vespignani1, Friederike Eyssel4,5, Dominik Bentler4

1 Department of Psychology and Cognitive Science, University of Trento, Trento, Italy, 2 Fondazione Marica De Vincenzi ONLUS, Rovereto (TN), Italy, 3 Department of Developmental Psychology and Socialization, University of Padua, Padua, Italy, 4 Center of Excellence–Cognitive Interaction Technology, University of Bielefeld, Bielefeld, Germany, 5 Psychology Faculty, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

☯These authors contributed equally to this work. *[email protected]

Abstract

Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.

OPEN ACCESS

Citation: Sulpizio S, Fasoli F, Maass A, Paladino MP, Vespignani F, Eyssel F, et al. (2015) The Sound of Voice: Voice-Based Categorization of Speakers’ Sexual Orientation within and across Languages. PLoS ONE 10(7): e0128882. doi:10.1371/journal.pone.0128882

Academic Editor: Charles R Larson, Northwestern University, UNITED STATES

Received: August 20, 2014

Accepted: May 1, 2015

Published: July 1, 2015

Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data are available at the following link: figshare.com/s/5a608136ad1311e48f4a06ec4b8d1f61.

Funding: The research was supported by a grant of the Fondazione Cassa di Risparmio di Trento e Rovereto and by a grant of the German Research Foundation for COE 277.

Introduction

Sometimes when you want to believe so badly, you end up. . .looking too hard.

(X-Files, Season 2–Episode 5)

Overhearing a voice often leads individuals to spontaneously categorize the speaker as a member of a specific social group. A few seconds of listening are enough to form impressions about a person’s gender, ethnicity, age, and even about his/her personality [1]. In the case of sexual orientation (from now on SO), it has often been argued that people possess an ability to recognize a man’s SO on the basis of subtle indirect cues such as walking style or facial features [2]. Recent studies have suggested that this “detection skill” may function on the basis of acoustic cues alone, meaning that people infer the SO of male speakers from voice alone [3,4,2]. Interestingly, categorization of SO seems to be more accurate when based on vocal than facial features [5]. However, acoustic cues per se can be misleading [6,7] since they are affected by both anatomical aspects (speaker's size, shape, and physical conditions; [8]) and social expectations (e.g., social group membership or gender role; [9]). The reliance on cultural rather than anatomical cues has been shown in other lines of research, such as when adults identify the sex of children based more on behavioral than on actual anatomical differences [10,11]. Moreover, the absence of any physiological correlates of SO makes its voice-based categorization even more problematic. Whereas male and female voices differ from each other as a consequence of physical (e.g., height) and biological features (e.g., testosterone; [12,13]), it is less clear on which physiological grounds gay and heterosexual male voices should be distinguishable. Among others, Zimman [14] has recently remarked that rather than using actual cues, perceivers may draw inferences about SO from the degree to which an individual’s speech style deviates from typical heterosexual voices. Thus, rather than reflecting actual differences in speech style, such voice-based categorization seems to be driven by beliefs about what gay vs. heterosexual voices sound like.

The present research aims to address the issue of accuracy of the voice-based categorization of SO and to contribute to this line of work by investigating this process across two different languages not investigated in prior research.

Voice-based categorization of speakers’ sexual orientation

Since the seminal study by Gaudio [3], the issue of voice-based categorization of SO has stirred much discussion and produced heterogeneous findings. Initial research provided evidence that listeners are accurate in judging the speakers' SO on the basis of their voice [3,5]. However, Smyth and colleagues [6] have suggested that listeners’ judgments are often inaccurate. In fact, correct recognition seems generally driven by a small subset of voices that are consistently judged as gay- or heterosexual-sounding. In line with this observation, recent studies on this voice-based categorization process have shifted attention toward speakers’ perceived SO, analyzing how listeners categorize a speaker as gay or heterosexual regardless of the accuracy of these judgments [15,4,6]. Listeners may have expectations of how gay versus heterosexual males speak and, hence, categorize voices according to whether they do or do not match such expectations (on this issue, see [15]).

Studies on voice-based categorization of SO have mainly examined which acoustic cues (e.g., vowel duration) drive the listener’s categorization. However, only a few studies have also investigated whether there is any actual acoustic difference in the speech of gay and heterosexual speakers. On this issue, Pierrehumbert and colleagues [16] analyzed the speech of English speakers and reported differences in the first- and second-formant frequency of vowels /ɑ/, /i/, and /u/. Similarly, Munson and colleagues [4] reported that English self-identified gay and heterosexual speakers differed in the way they produced the first-formant frequency of /ae/ and /ε/, and the spectral skewness of /s/.


two. On the other hand, Munson and colleagues [4] investigated several acoustic cues related to the speech signal and reported that listeners' perception was driven by first-formant frequency of front vowels and the spectral skewness of /s/ (for similar results, see also [15]). Particularly telling is the work of Munson and colleagues, which allows one to directly compare the acoustic cues as a function of actual and perceived SO. The comparison shows that there is overlap between the two types of information: both self-identified gay and heterosexual speakers and gay-sounding and heterosexual-sounding speakers differed in terms of first-formant frequency of front vowels and spectral skewness of /s/.

To summarize, research has shown that there is a relation between the way speakers speak (acoustic features) and how they are categorized by listeners. Although interesting and informative with regard to the process of voice-based categorization, these studies have provided mixed results regarding the accuracy of this categorization process. Indeed, previous research has mainly focused on the acoustic cues linked to either actual or perceived SO in speech (except for [4]). This raises the question of which cues are actually used by listeners to make inferences and whether these cues are the same that objectively distinguish gay and heterosexual voices.

Language-specificity and language dependency

The Achilles' heel of research on categorization of SO on the basis of voice is the restricted linguistic context in which studies have been conducted. Previous research has involved almost exclusively English speakers and listeners. As a consequence, it remains unclear whether the voice-based categorization of SO is language-dependent, that is, whether it occurs in English, but not in other languages. Moreover, to our knowledge, it is also not clear whether this process is language-specific, that is, whether listeners recognize the SO only for individuals who speak the same language or also for those speaking a foreign language. Both issues are crucial for our understanding of this voice-based categorization process as either a language-specific process or a universal phenomenon.


to detect SO in a language other than English, namely Czech. According to the authors, results suggested an overall good accuracy of judgments about targets’ SO among a sample of heterosexual female and gay male listeners [5].

Turning to the question of language-specificity, Schwieter [20] has recently stressed the need to investigate the ability of listeners to detect the SO of speakers of different languages in order to pinpoint how this voice-based categorization operates. To date, only one study has addressed the issue of cross-cultural categorization of SO. In this study, Valentova and colleagues [21] examined how American and Czech participants categorized the SO of people of their own and of the other country. Participants watched short videotapes and indicated the targets' SO based on acoustic, visual, and gestural features. The authors found an above-chance relation between raters' judgment and self-defined targets' SO, although this relation was stronger when targets and raters shared the same nationality. The study provides first evidence of cross-cultural categorization of gay men, but the findings still remain inconclusive, given that raters simultaneously made use of both acoustic and visual cues. On the one hand, it is possible that the availability of visual cues may have increased accuracy above and beyond the inferences drawn from acoustic cues alone. Indeed, several studies have shown that visual cues such as face and gesture are particularly informative about SO [22,23]. On the other hand, it is possible that raters might have derived their judgments mainly from vocal information. This argument is in line with a recent study showing that voice, but not face, was a meaningful cue in categorization of SO [5]. Thus, it currently remains unclear whether people categorize others as gay vs. heterosexual on the basis of voice alone when confronted with foreign language speakers. If we want to understand the generality of voice-based categorization of SO, both language-dependence and language-specificity need to be investigated systematically. The present study provides a first step in this direction.

Aims of the present research

Starting from the ongoing debate outlined above, the current study addresses four main questions: First, we investigated whether voice-based categorization is, to some degree, accurate or whether it is purely driven by perceivers' expectancy. Thus, we tested whether heterosexual listeners are able to correctly identify male speakers' SO from voice alone or whether they base their judgments on what the speakers sound like, regardless of the actual SO of the speaker (in which case categorization would be expectancy-driven). To address this issue, we ran five experiments in which we asked participants to listen to male voices and to categorize the speakers' SO using an explicit or an indirect measure.

Second, to investigate the question of language-dependency, we considered speakers and listeners of two different languages, namely German and Italian. This allowed us to also address our third aim, namely language-specificity, given that our Italian and German participants were asked, in Study 3, to judge the likely SO of both Italian and German speakers.

More importantly for our aims, Italian and German differ at multiple levels. A first difference is that Italian belongs to the Romance languages, whereas German is a West Germanic language, just like English. Note also that both languages differ from Czech (the only language other than English that has been investigated). Indeed, Czech belongs to a specific West Slavic language group that is very different from other Indo-European languages. Moreover, and most critical to the aims of our research, German and Italian differ with respect to their phonological system and its phonetic realization. German has a larger number of vowels and consonants than Italian [24]; German is also a language with a higher degree of articulatory


vowel cues has been reported in previous research (e.g., [15]), the comparison between German and Italian may help understand to what extent the perception of speakers' SO is affected by specific language features. Thus, although the main goal of this work is not the investigation of acoustic cues of SO, a look at the relation between speakers' self-identification, listeners' judgment and acoustic information will help to shed light on the general issue of voice-based categorization of speakers’ sexual identity.

A fourth aim of the present work is of a methodological nature and concerns the different measures used in prior research. Whereas in some studies listeners were asked to make dichotomous choices regarding the speakers' SO (i.e., heterosexual vs. gay, [6]), other research has used Likert scales that allow respondents to modulate their responses (e.g., [3,5,26]). These differences in measurement may in part explain the contradictory findings in the literature, although their influence is difficult to quantify given that studies also differ in other respects (e.g., the voice samples). We therefore varied the measures across studies while keeping other characteristics of the materials, the procedure and the voice samples constant. In Experiments 1A and 1B, participants were asked to make dichotomous choices, combined with a mouse tracking measure that provides an implicit measure of subjective uncertainty. In Experiments 2A and 2B, judgments were expressed on a continuous, Likert-type scale that allowed responses of degree.

To sum up, in the present research, we address four main limitations of previous work on voice-based categorization of SO: First, in all our studies we examine the accuracy of this process to understand whether listeners make their judgments on the basis of actual differences between gay versus heterosexual speech productions (reality-driven) or on the basis of presumed, but non-veridical differences (expectancy-driven). This question has received relatively little attention in prior research. Second, we address the issue of language-dependency by examining how the voice-based categorization process operates in languages other than English, namely Italian (Experiments 1A and 2A) and German (Experiments 1B and 2B). In doing so, we also examine whether the speakers’ acoustic cues that have been related to listeners’ judgments of SO in English samples are also present in other languages. Third, for the first time, we address the issue of language-specificity. In particular, we report a cross-linguistic experiment (Experiment 3) testing whether the pattern that heterosexual listeners show when categorizing voices of same-language speakers also holds when they categorize voices speaking a foreign language. Fourth, across studies, we employ two different measures to assess listeners’ judgments, while holding the stimulus materials and procedure constant. This allows us to overcome a limit of previous research, namely the use of different methodologies that may have contributed to the contradictory findings in the existing literature.

We will first present four within-language studies in which Italian and German participants were asked to identify the SO of gay and heterosexual speakers (1A and 1B using dichotomous measures and 2A and 2B using continuous measures). Subsequently, we present comprehensive acoustic analyses of these studies. The last study reports identification of SO of foreign language speakers.

Experiment 1A

Dichotomous categorization via mouse tracking – Italian sample


emerge; we also tested the relation between the acoustic properties of the speakers’ vocal signal and listeners’ judgments of SO.

In Experiments 1A and 1B, we employed a dichotomous forced-choice method combined with a method that is new to this line of research: the mouse tracker. Here the answers have to be provided within a limited time window by moving the mouse to the upper left or upper right corner of the screen, representing in this case the labels “heterosexual” or “homosexual”. Please note that the Italian term “omosessuale” and the German term “homosexuell” are considered evaluatively neutral and inclusive of men and women in these two languages, and can hence be used in the same way for studies on gay and lesbian speakers. Along with the categorical responses, this measure allows one to record mouse trajectories indicative of the certainty or hesitation with which participants reach the final response. Hand movements are supposed to track the real-time dynamics of the categorization process [27], with the advantage of revealing not only the outcome of the categorization, but also the unfolding of the process itself.

Method

Ethics statement. The research presented in this paper was approved by the University of Trento ethics committee, and was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki. All participants provided consent before participating.

Participants. Thirty students of a mid-sized university in the north of Italy took part in the experiment in the role of listeners. Two participants were excluded from the analyses: One identified herself as bisexual and the other reported not being a native speaker of Italian. The final sample consisted of 28 heterosexual Italian participants (14 females, mean age = 20.33, SD = 0.99).

Speakers. All speakers were recruited through the researchers' contacts, through advertisements placed on University bulletin boards and, in the case of gay speakers, also through LGBT associations. Previous research has used different methodologies in recruiting speakers. In some studies, speakers were aware of the research topic or were explicitly contacted because of their SO [2,3]; in others, no explicit reference to SO as the topic under study was made [4,6]. We decided to follow this second strategy: none of our speakers was informed a priori about the aim of the research, nor was any reference made to SO. Participants were only told that the purpose of the study was to record materials for future studies. To avoid arousing suspicion, when we contacted speakers through LGBT associations, we told them we were recruiting non-student participants and the easiest way to obtain a representative sample of the population was to contact different cultural associations in town.

Speakers were recorded individually in a quiet room (sampling at 44 kHz, 16 bit resolution, mono). They were invited to read in a natural way 20 experimental sentences written in their native language, and their voice was recorded using PRAAT [28]. Then, they were asked to fill out a questionnaire including, among other scales, demographic information such as gender, age, and SO. This latter information was provided on a scale from 1 (exclusively heterosexual) to 7 (exclusively homosexual). Speakers who reported a value above the scale midpoint (i.e., 5 or above) were considered self-identified gays; those reporting a value below the midpoint (i.e., 3 or below) were considered self-identified heterosexuals. At the end, speakers were fully debriefed and signed the consent form approving the use of the audio materials.
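The grouping rule described above can be written as a small function. This is only an illustrative sketch: the function name and the handling of the scale midpoint (returned as unclassified, since the text assigns no group to a rating of 4) are assumptions, not part of the original procedure.

```python
from typing import Optional


def classify_speaker(so_rating: int) -> Optional[str]:
    """Map a 1-7 self-rating to a group label using the cut-offs given in the text.

    Returns None for the midpoint (4), which the text does not assign to a group.
    """
    if not 1 <= so_rating <= 7:
        raise ValueError("rating must lie on the 1-7 scale")
    if so_rating >= 5:
        return "gay"
    if so_rating <= 3:
        return "heterosexual"
    return None  # midpoint: left unclassified in this sketch
```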


speakers' accents, all of them came from the North-East of Italy (provinces of Verona and Trento).

Sentences. Twenty recorded sentences were used as stimuli (see Appendix A). The sentences were constructed in order to have a similar syntactic structure and a neutral content with reference to SO. Stimuli had approximately the same length (5–9 words, mean length = 6.95, SD = 0.99).

Procedure. Participants were informed that they would have to categorize speakers as gay or heterosexual on the basis of their voice. To do that, following Freeman and colleagues' [29] procedure, we implemented a categorization task using the mouse tracker. Participants completed two blocks in which they listened to 20 sentences produced by 20 speakers. Voices were presented in a randomized order. Participants listened to one sentence for each speaker and, within each block, each speaker pronounced a different sentence. Half of the sentences were pronounced by self-identified gay speakers, half by self-identified heterosexual speakers. All sentences (n = 20) recorded for each speaker were used across participants to ensure that type of sentence did not influence results.

Participants were tested individually. While listening, they had to categorize the speakers as gay or heterosexual by clicking on one of the two labels (homosexual vs. heterosexual) displayed on the top left and right corners of the screen. Across participants, the label position was counterbalanced. Participants had to move the mouse toward the label with the category they wanted to select and click on it. The initial position of the cursor was at the bottom center of the screen and participants moved it toward one of the two categories in order to make their choices. Participants were instructed to be as fast as they could, with a limited time (3 seconds) to answer. Using MouseTracker software [27], we recorded response accuracy and mouse trajectories. The experimental session was preceded by a short practice. At the end of the task, participants were asked to report their demographic information (gender, age, SO, nationality, and native language) and were then thanked and debriefed.

Results

First, we tested whether participants were able to accurately recognize the speakers' SO. In this analysis, response accuracy was tested separately for self-identified gay and heterosexual speakers. Then, we compared participants' mouse trajectories to test how the categorization process unfolded. Finally, we looked separately at each speaker to see which speakers were judged consistently as heterosexual or gay.

Sexual orientation–Response accuracy. Inspection of the percentages of correct identifications revealed a different pattern for self-identified heterosexual and self-identified gay speakers: The former were correctly categorized in 63% of all cases, which differed reliably from chance (i.e., accuracy at 50%), chi-square = 18.71, p < .001, whereas the likelihood of correct identification of gay speakers was below chance (39% correct identifications, chi-square = 14.68, p < .001). Thus, whereas self-identified heterosexuals were correctly categorized above chance, self-identified gays were for the most part wrongly categorized as heterosexuals (see S1 Table for statistics for each speaker). As evident from the percentages, the better recognition of heterosexual than homosexual speakers is almost entirely due to a criterion shift. This interpretation is also confirmed by a signal detection analysis that allows us to discriminate between the accurate responses (hits) and the incorrect responses (false alarms). Using this approach we took into account the general tendency to categorize speakers as straight, the response bias, and the difficulty in distinguishing between hits and false alarms, namely the sensitivity or discriminability index [30]. In our study we found a very low discriminability index (d’ = .054) and a
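To make the two analyses in this paragraph concrete, the sketch below computes a one-degree-of-freedom chi-square test of accuracy against the 50% chance level and the standard signal detection indices d' and criterion c. It is a hedged illustration, not the authors' script: the function names are hypothetical, a "hit" is assumed to be a "gay" response to a self-identified gay speaker and a "false alarm" a "gay" response to a heterosexual speaker, and the trial counts passed to the chi-square function are invented for illustration. Only the rounded rates (39% and 100% - 63% = 37%) come from the text.

```python
import math
from statistics import NormalDist
from typing import Tuple


def chi_square_vs_chance(n_correct: int, n_trials: int) -> Tuple[float, float]:
    """Goodness-of-fit chi-square (1 df) of correct/incorrect counts against 50:50."""
    expected = n_trials / 2
    n_wrong = n_trials - n_correct
    chi2 = ((n_correct - expected) ** 2 + (n_wrong - expected) ** 2) / expected
    # For 1 df, P(chi2 > x) equals P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def sdt_indices(hit_rate: float, false_alarm_rate: float) -> Tuple[float, float]:
    """Return (d', c): sensitivity and response criterion under the equal-variance model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


# Rounded rates taken from the text: 39% hits, 37% false alarms.
d_prime, criterion = sdt_indices(0.39, 0.37)

# Hypothetical counts (NOT the study's data): 60 correct responses in 100 trials.
chi2, p_value = chi_square_vs_chance(60, 100)
```

With the rounded percentages, d' comes out at roughly .05, in line with the very low discriminability reported, and the criterion is positive, which under these assumed definitions corresponds to a bias toward responding "heterosexual".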


Response trajectories. Ample mouse trajectories are generally interpreted as an index of hesitation in providing responses, whereas relatively straight lines suggest a high degree of certainty. To understand how our participants provided responses, we compared the hand movements for correct and for incorrect responses for both self-identified gay and heterosexual speakers. For the analyses, trajectory coordinates were re-scaled into a standard coordinate space (top-left = [−1, 1.5] and bottom-right = [1, 0]) and were remapped rightward to allow direct comparisons [27]. The coordinates of the trajectories were time-normalized by re-sampling the original vector into 101 time-steps using linear interpolation, so that they could be averaged across trials.
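The time normalization step can be illustrated as follows. This is a minimal sketch, not MouseTracker's own code: the function name is hypothetical, and it assumes each trial is stored as parallel lists of strictly increasing timestamps and already re-scaled x and y coordinates.

```python
from typing import List, Sequence, Tuple


def time_normalize(
    times: Sequence[float],
    xs: Sequence[float],
    ys: Sequence[float],
    n_steps: int = 101,
) -> List[Tuple[float, float]]:
    """Resample a trajectory to n_steps equally spaced time points by linear interpolation."""
    if not (len(times) == len(xs) == len(ys)) or len(times) < 2:
        raise ValueError("need at least two samples and equally long inputs")
    t_start, t_end = times[0], times[-1]
    resampled = []
    j = 0  # index of the left neighbour of the current target time
    for k in range(n_steps):
        t = t_start + (t_end - t_start) * k / (n_steps - 1)
        # Advance to the raw segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        span = times[j + 1] - times[j]
        w = 0.0 if span == 0 else (t - times[j]) / span
        resampled.append(
            (xs[j] + w * (xs[j + 1] - xs[j]), ys[j] + w * (ys[j + 1] - ys[j]))
        )
    return resampled
```

Because every trial ends up with the same number of points, trajectories of different durations can be averaged point by point.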

Spatial attraction–Correct vs. incorrect responses for heterosexual vs. gay speakers. As an index of the mouse attraction toward the category, we computed the Area Under the Curve (AUC; [27]), which is the area between the observed trajectory and an idealized straight-line trajectory connecting the starting point of the movement with the clicked label. Both correct and incorrect responses for heterosexual and gay voices were considered. Mean trajectories are reported in Fig 1.
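One way to compute such an area is the shoelace formula applied to the polygon formed by the trajectory points, whose closing edge is the idealized straight line from the last point back to the first. The sketch below is an assumption-laden illustration rather than the MouseTracker implementation: the function name is hypothetical, and the sign convention (positive when a rightward-remapped trajectory bulges to the left of the straight line, i.e., toward the unselected label) is chosen here for readability.

```python
from typing import Sequence, Tuple


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Signed area between a trajectory and the straight line joining its endpoints.

    Portions of the path on the opposite side of the straight line contribute
    with the opposite sign, so they reduce the total.
    """
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # the last edge closes along the ideal line
        twice_area += x0 * y1 - x1 * y0
    return -twice_area / 2.0
```

A perfectly straight movement yields an area of zero, and the value grows as the path is pulled toward the competing response.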

A mixed-effects model with AUC as dependent variable and Speakers (heterosexual vs. gay) and Response (correct vs. incorrect) as predictors was fitted. Participants and stimuli were treated as random factors. The models were fitted using the lmer function (lmerTest package version 1.0) in R software (Version 3.1.0).

The model showed a significant interaction between Speakers and Response (β = 1.02, st. err. = 0.21, t = 4.68, p < .001). Further inspection of the interaction showed that, when correctly categorized, the AUC for heterosexual speakers (M = .95, SD = .68) was smaller than the one for gay speakers (M = 1.53, SD = 1.11, p = .001). The opposite tendency emerged for the incorrect responses: The AUC was larger for heterosexual (M = 1.41, SD = 1.21) than for gay speakers (M = .98, SD = .71; p < .001). Also, whereas AUC for correct responses was smaller than for incorrect responses in the case of heterosexual speakers (p = .001), the contrary emerged for gay speakers. In this case incorrect answers showed a smaller AUC than the correct ones (p < .001). These results suggest that when participants categorized heterosexual speakers correctly they did it without uncertainty, whereas when they categorized gay speakers correctly they were somewhat attracted by the incorrect choice (i.e., heterosexual), suggesting that they experienced ambivalence.

A further analysis was run to ascertain that the difference between trajectories in the different conditions was due to a continuous attraction of all the trials toward the opposite category, and not to the presence of a subpopulation of trials in which participants made discrete movements toward the unselected option before, at a certain point, reversing course toward the correct choice. To address this issue, the bimodality coefficient (b) for the AUC distribution was calculated (for technical details on the formula, see [27]); if the coefficient is larger than .55, the distribution is considered bimodal. In the case of correct answers, both the coefficient for the AUC distribution in the heterosexual condition (b = 0.49) and the coefficient for the AUC distribution in the gay condition (b = 0.36) did not exceed the critical value, so we can reject the bimodality hypothesis. The same was true for the incorrect answers (for heterosexual speakers: b = .37; for gay speakers: b = .43).
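For readers who want to reproduce this kind of check, the sketch below implements the commonly used sample bimodality coefficient, b = (g1² + 1) / (g2 + 3(n − 1)² / ((n − 2)(n − 3))), where g1 is the bias-corrected sample skewness and g2 the bias-corrected excess kurtosis. That this is exactly the formula detailed in [27] is an assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def bimodality_coefficient(values: Sequence[float]) -> float:
    """Sample bimodality coefficient based on bias-corrected skewness and kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d * d for d in deviations) / (n - 1)
    if variance == 0:
        raise ValueError("values must not all be identical")
    sd = math.sqrt(variance)
    skewness = n / ((n - 1) * (n - 2)) * sum((d / sd) ** 3 for d in deviations)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    excess_kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        * sum((d / sd) ** 4 for d in deviations)
        - correction
    )
    return (skewness ** 2 + 1) / (excess_kurtosis + correction)
```

Values above the critical threshold mentioned in the text point toward a bimodal distribution, while clearly unimodal samples fall well below it.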

Together, the mouse tracking data show that participants were least hesitant in providing responses when correctly identifying heterosexual speakers, but they experienced the highest degree of hesitation when correctly identifying gay speakers.


and, for many of these, this was different from chance. Four of the 10 heterosexual speakers were consistently recognized as heterosexual and none was misidentified as gay. Among the gay speakers only 1 was consistently identified as gay, whereas 4 were misidentified as heterosexual (Fig 2).

Fig 1. Experiment 1A (Italians): Mean mouse trajectories for correct and incorrect responses for heterosexual and gay speakers.

Although agreement between participants' judgments and speakers' self-categorizations was low, there was considerable agreement among the judgments provided by participants. We therefore decided not to consider participants' responses in terms of correctness but to focus on how they perceived the speakers' SO. We thus re-coded all speakers as being perceived as heterosexual or gay by calculating a gayness rating, ranging from 0 to 1 and defined as the proportion of times the speaker was categorized as gay. The higher the gayness score, the higher the likelihood that the speaker is perceived as gay. To test whether listeners coherently group some speakers on the basis of perceived (not self-identified) SO, a k-means cluster analysis on the gayness values was performed setting k = 2. The analysis revealed that speakers could be grouped in 2 clusters of 14 heterosexual-sounding speakers (gayness rating mean value of .24) and 6 gay-sounding speakers (gayness rating mean value of .65), respectively; the assumption of two clusters fit the observed data quite well, as shown by the fact that this division accounted for 83% of data variability. The decision to have two clusters was motivated by the following reasoning: First, we were looking at two distinct groups based on perceived sexual orientation, namely gay-sounding and heterosexual-sounding speakers. Second, there is no numerical support for a three-cluster grouping, as shown by the fact that the imposition of three clusters instead of two produced only a very small increase in explained variance (only 6%). Gayness values and clusters are reported in Fig 2.

Fig 2. Italian listeners & Italian speakers. Listeners’ perception in terms of gayness score (2A) and ratings (2B). Different colors indicate the two clusters based on listeners' perception. Stars (and points) indicate speakers that were significantly perceived either as heterosexual or gay (. < .1; * < .05; ** < .01; *** < .001). Higher values of the y-axis indicate perceived gayness, lower values perceived heterosexuality; .5 represents chance level.
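The gayness rating and the two-cluster solution can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' analysis script: the software used for the k-means analysis is not stated in the text, the function names are hypothetical, the algorithm is plain Lloyd's k-means on scalar values initialised at the minimum and maximum, and "explained variability" is taken to mean one minus the ratio of within-cluster to total sum of squares.

```python
from typing import List, Sequence, Tuple


def gayness_rating(responses: Sequence[bool]) -> float:
    """Proportion of trials on which a speaker was categorized as gay (True)."""
    return sum(responses) / len(responses)


def kmeans_1d_two_clusters(
    values: Sequence[float], n_iter: int = 100
) -> Tuple[List[int], List[float]]:
    """Lloyd's algorithm for k = 2 on scalar values, started from the min and max."""
    centers = [min(values), max(values)]
    labels: List[int] = []
    for _ in range(n_iter):
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values
        ]
        new_centers = []
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    return labels, centers


def explained_variance(
    values: Sequence[float], labels: Sequence[int], centers: Sequence[float]
) -> float:
    """Share of total variability accounted for by the cluster split."""
    grand_mean = sum(values) / len(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        raise ValueError("values must not all be identical")
    within = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return 1 - within / total
```

Running the same procedure with k = 3 and comparing the explained variability would mirror the authors' argument that a third cluster adds little.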

These results show that listeners consistently categorized some speakers as gays or as heterosexuals on the basis of what they sounded like. Note that this is largely independent of the SO indicated by the speakers themselves. Moreover, the distribution of gay- and heterosexual-sounding speakers was not coherent with the real (50:50) distribution of voices. The majority of the speakers were perceived as sounding heterosexual whereas only a minority was considered as sounding gay.

Experiment 1B

Dichotomous categorization via mouse tracking – German sample

Experiment 1B was identical to the previous one, except for 3 aspects: 1) It was run in Germany and hence in the German language. 2) It included 12 (rather than 20) speakers, half of whom were self-identified gays and half self-identified heterosexuals. 3) Instructions were varied such that half of the participants were informed that half of the speakers were heterosexual and half gay. Each of these differences will be explained below. All other aspects of the materials, procedure and analyses were identical to Experiment 1A.

Method

Participants. Forty-eight university students took part in the experiment in the role of listeners. Four participants were excluded from analyses because they identified as gay or bisexual. The final sample consisted of 44 heterosexual participants (22 females, Mage = 26.95, SD = 4.97) who were German (n = 39) or had lived in Germany for more than 7 years (n = 5). Note that an analysis excluding those participants who self-identified as non-German but had lived in Germany for an extended period of time showed the same pattern of results.

Speakers. Twelve speakers were recorded, including six self-identified gay and six self-identified heterosexual speakers. Their age varied between 20 and 40; however, no age difference emerged between self-identified gay (M = 28.50, SD = 3.02) and self-identified heterosexual speakers (M = 28.17, SD = 7.13), t(10) = .10, p = .92. All speakers lived in North Rhine-Westphalia and spoke standard German.

Sentences. Twelve recorded sentences were used as stimuli. These were the first sentences from the list used in Study 1A. They were all translated from Italian into German and had a similar length (5–10 words, mean length = 7.00, SD = 1.48).


inaccuracy in Study 1 had been caused by an unexpected distribution, then this difficulty should disappear when people know the actual (50:50) distribution. Contrary to this logic, no difference was found between informed and uninformed participants. Regardless of whether or not participants knew the actual distribution of gay and heterosexual speakers, participants’ ability to identify gay voices did not differ from chance (no info: 46% and 50:50 info: 50%), whereas heterosexual voices were identified reliably above chance in both conditions (no info: 67% and 50:50 info: 64%). Hence, this variable will not be discussed further. Importantly, however, the lack of a difference between the two conditions indicates that differential baseline assumptions cannot account for the inaccuracy in participants’ responses.

Results

Sexual orientation–Response accuracy. On average, self-identified heterosexual speakers were correctly recognized in the majority of cases (66%), whereas self-identified gay speakers were not (48% of correct identifications). Correct identifications exceeded chance for self-identified heterosexual (chi-square = 24.60, p < .001), but not for self-identified gay speakers (chi-square = .33, p = .56). This pattern suggests a clear difficulty in identifying SO of gay speakers (see S1 Table for statistics for each speaker), which is mainly due to the overwhelming tendency to identify our male speakers as heterosexual. In fact, applying a signal detection approach, participants showed a strong response bias (β = 1), but a low discriminability index (d’ = .20). As in Experiment 1A, we proceeded to examine first the participants' mouse trajectories, and then participants' perceived SO for each speaker.
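As an illustration of the signal detection indices mentioned above, the following minimal sketch computes d’, the criterion c, and β under the standard equal-variance Gaussian model. It is not the authors' analysis script: the function name `sdt_indices` is hypothetical, "gay" responses are treated as the signal category by assumption, and the pooled percentages from the text (48% hits, 100% − 66% = 34% false alarms) are plugged in purely as example inputs. The output need not match the reported values, which may have been computed per participant.

```python
import math
from statistics import NormalDist


def sdt_indices(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c, beta) for an equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # discriminability
    criterion_c = -0.5 * (z_hit + z_fa)   # > 0 means reluctance to answer "signal"
    beta = math.exp(d_prime * criterion_c)  # likelihood-ratio form of the bias
    return d_prime, criterion_c, beta


# Example inputs (assumption): pooled percentages taken from the text.
hit_rate = 0.48            # gay speakers categorized as gay
false_alarm_rate = 0.34    # heterosexual speakers categorized as gay
d_prime, criterion_c, beta = sdt_indices(hit_rate, false_alarm_rate)
print(f"d' = {d_prime:.2f}, c = {criterion_c:.2f}, beta = {beta:.2f}")
```

A positive criterion in this parameterization corresponds to a tendency to answer "heterosexual", which is the direction of bias described in the text.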

Response trajectories. The same analyses as in Study 1A were conducted comparing correct and incorrect responses for self-identified heterosexual and gay speakers.

Spatial attraction–Correct vs. incorrect recognition of heterosexual vs. gay speakers. Both correct and incorrect responses for heterosexual and gay voices were considered using the AUC as index of the mouse attraction toward the category. Mean trajectories for conditions are reported in Fig 3.

A mixed effects model with AUC as dependent variable and Speakers (heterosexual vs. gay) and Response (correct vs. incorrect) as predictors was run. Participants and items were treated as random factors. The model showed a significant interaction between Speakers and Response (β = 0.76, SE = 0.19, t = 4.01, p < .001). Further inspection of the interaction showed that AUC was smaller for correct identifications of heterosexual (M = .79, SD = .68) than gay speakers (M = 1.15, SD = 1.04; p = .002), whereas it was larger for incorrect identifications of heterosexual (M = 1.36, SD = 1.17) than of gay speakers (M = .90, SD = 1.12; p = .006). Moreover, for heterosexual speakers the AUC was smaller for the correct than for the incorrect responses (p < .001), whereas for correct and incorrect responses in categorization of gay speakers the difference only approached significance (p = .07). To ascertain the reliability of the results, we calculated the bimodality coefficient (b) for the two AUC distributions. For correct answers, bimodality can be excluded given that neither the coefficient for heterosexuals (b = .47) nor that for gays (b = .44) exceeded the threshold of .55. As no difference emerged for the incorrect responses, no bimodality coefficient was examined.
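The text does not spell out how the bimodality coefficient was computed. The sketch below assumes the commonly used definition based on bias-corrected sample skewness and excess kurtosis, b = (skewness² + 1) / (kurtosis + 3(n − 1)² / ((n − 2)(n − 3))). The function name and the example data are hypothetical and serve only to show how a value would be compared with the threshold.

```python
import math


def bimodality_coefficient(values):
    """Bimodality coefficient from bias-corrected skewness and excess kurtosis.

    Assumed formula: b = (G1**2 + 1) / (G2 + 3 * (n - 1)**2 / ((n - 2) * (n - 3))).
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values must not all be identical")
    g1 = m3 / m2 ** 1.5                 # population skewness
    g2 = m4 / m2 ** 2 - 3.0             # population excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1                    # bias-corrected
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)     # bias-corrected
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical AUC-like samples, for illustration only.
two_peaks = [0.0] * 50 + [1.0] * 50
one_peak = [0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4]
threshold = 0.55  # value used in the text
for name, sample in (("two peaks", two_peaks), ("one peak", one_peak)):
    b = bimodality_coefficient(sample)
    print(f"{name}: b = {b:.2f}, exceeds threshold: {b > threshold}")
```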

Taken together, findings were similar to those obtained on Italian participants. Again, participants showed greater hesitation when providing incorrect (rather than correct) responses. They also showed greater hesitation when misidentifying heterosexual speakers as gay than when correctly identifying them as heterosexual. This time, no reliable differences emerged for gay speakers.


speakers were correctly identified as heterosexual and one was incorrectly perceived as gay. Of the gay speakers, two were misidentified as heterosexual and only one was correctly identified as gay.

We again calculated the gayness score and tested whether listeners categorized speakers in two groups (i.e., gay- vs. heterosexual-sounding) depending on how voices were perceived. A k-means cluster analysis on the gayness values was performed setting k = 2. We found that a two-cluster solution explained 74.2% of the variance. Eight speakers were included in the first cluster (mean value of .29) whereas the remaining 4 were included in the second cluster (mean value of .65). Gayness values and clusters for each speaker are reported in Fig 4.

Fig 3. Experiment 1B (Germans): Mean mouse trajectories for correct and incorrect responses for heterosexual and gay speakers.

Experiment 2A

Within-language judgments on continuous scales – Italian participants

Our first set of studies showed that Italian and German listeners were not very accurate in judging the SO of speakers of their own native language. They generally tended to categorize speakers as heterosexual, and in both samples only one gay speaker was reliably identified as gay. Even when they correctly categorized the self-identified gay speakers, they tended to be attracted towards the opposite response (“heterosexuals”), suggesting a certain degree of uncertainty, at least in the Italian sample. Although indicative of inaccuracy in the voice-based categorization of SO, this may be a function of the forced-choice format that, combined with a strict time limit, may have made the task difficult. In Studies 2A and 2B we therefore asked participants to rate speakers' SO on a scale (see [3], for a similar procedure). Thus, listeners could modulate their judgments without having to make a dichotomous decision and without a response time limit. An additional advantage of this method is that speakers’ self-definitions and listeners’ perception of SO were assessed on identical scales that allow a direct comparison between the two.

Fig 4. German listeners & German speakers. Listeners’ perception in terms of gayness score (4A) and ratings (4B). Different colors indicate the two clusters based on listeners’ perception. Stars (and points) indicate speakers that were significantly perceived either as heterosexual or gay (. < .1; * < .05; ** < .01; *** < .001). Higher values of the y-axis indicate perceived gayness, lower values perceived heterosexuality; .5 represents chance level.

Method

Participants. Thirty university students (15 female, Mage = 20.97, SD = 2.37) took part in the experiment in the role of listeners. All participants were Italian native speakers.

Materials. Voice samples and sentences were the same as in Experiment 1A.

Procedure. Participants were tested individually. In the experiment they performed a computer task, in which, for each trial, participants listened to a spoken sentence through headphones. In one block, participants had to judge the speaker's SO on a scale from 1 (completely heterosexual) to 7 (completely homosexual). In another block, among other dimensions, participants evaluated the speakers' masculinity on a scale from 1 (completely masculine) to 7 (completely feminine). As this research aims at studying the perception of sexual orientation, we do not further consider other dimensions (e.g., age) that have been measured (see also S2 Table).

Results

Analogous to the procedure in the previous experiments, we first tested whether participants were generally able to distinguish heterosexual from gay speakers. Then, we tested how listeners perceived speakers' SO by verifying whether there was any consistency in the way participants judged the voices.

Sexual orientation–Response correctness. To test for accuracy, different analyses were performed. First, we tested the relative accuracy of the participants’ guesses. To do so, we adopted a method that is roughly based on Cadinu and Rothbart’s [31] “within-participants correlations” methodology (see also [32]). We calculated for each participant the correlation between his/her ratings (perceived SO) and the speakers’ self-identified orientation. Different from Cadinu and collaborators [31,32], we used Spearman correlations because the self-reported SO was not distributed normally across speakers. The within-participants correlations were then z-transformed, treated as the dependent variable, and subjected to a one-sample t-test. Positive correlations that deviate reliably from zero suggest that the participants’ ratings were above chance. The average within-participants correlation (M = .158, SD = .21) was indeed positive and deviated reliably from chance, t(28) = 4.06, p < .001, suggesting that participants were to some degree able to identify the speakers’ SO. At the same time, the small size of the correlation suggests that the ability to guess the speakers’ SO, though slightly above chance, was far from perfect.
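The within-participants correlation procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that "z-transformed" refers to the Fisher z transformation (atanh), and all function names and ratings are hypothetical. Only the t statistic and its degrees of freedom are computed; a p value would additionally require the t distribution.

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant input


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def one_sample_t(sample, mu=0.0):
    """Return (t statistic, degrees of freedom) for a test against mu."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n)), n - 1


# Hypothetical data: speakers' self-ratings and three listeners' ratings (1-7).
self_ratings = [1, 1, 2, 6, 7, 7]
listener_ratings = [
    [2, 3, 2, 4, 3, 5],
    [1, 2, 3, 2, 4, 3],
    [3, 2, 2, 3, 2, 4],
]
rhos = [spearman(r, self_ratings) for r in listener_ratings]
z_scores = [math.atanh(rho) for rho in rhos]   # Fisher z; requires |rho| < 1
t_stat, df = one_sample_t(z_scores)
print(f"mean rho = {mean(rhos):.3f}, t({df}) = {t_stat:.2f}")
```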


self-identified heterosexual speakers (M = 2.50, SD = .58), (t(29) = 6.39, p < .001), suggesting, again, that participants were, to some degree, able to distinguish the two groups from voice alone. However, both means were reliably below the scale mid-point (one-sample ts > 4.50 and ps < .001), showing that both self-identified heterosexual and self-identified gay speakers tended to be rated as heterosexual. These findings demonstrate again that participants categorized speakers mainly as heterosexual. Moreover, there was high variability on accuracy ratings across speakers (see below, Fig 1 and S3 Table for statistics for each speaker).

Perceived sexual orientation. In order to test whether each speaker had received consistent ratings across participants regardless of his actual SO, we looked at each speaker separately, using perceived SO as dependent variable and comparing the mean ratings to the neutral scale midpoint.

The data reported in the lower portion of Fig 2 show that the majority of the speakers were perceived as heterosexual (including all but one of the heterosexual speakers and six out of ten gay speakers). Only two speakers were consistently perceived as gay and both of these were self-identified gays.

Moreover, as in Experiment 1A, we looked at whether speakers could be divided into two groups (heterosexual-sounding vs. gay-sounding) on the basis of participants’ judgment. To this aim, we ran a k-means cluster analysis on the mean ratings setting k = 2. The two clusters were identified as follows: The first group was composed of 15 speakers, with a mean value of 2.3; the second group was composed of 5 speakers (see Fig 2), with a mean value of 4.3; the two clusters accounted for 68% of data variability. Although the imposition of three clusters would produce an increase of 19% of the explained variance, we were still looking to distinguish two groups of speakers as we also asked participants to do so. Note that the pattern shown by the cluster analysis was similar to that of Experiment 1A. Excluding those speakers that were judged at chance level in both experiments, 15 out of 18 speakers were located in the same clusters as in Experiment 1A.
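To make the clustering step concrete, the sketch below runs a two-cluster k-means on one-dimensional mean ratings and reports the share of variability accounted for by the split, taken here as 1 minus the ratio of within-cluster to total sum of squares (an assumption about how "data variability" was quantified). The function name, the deterministic starting centers, and the ratings are hypothetical and do not come from the study.

```python
def kmeans_two_clusters_1d(values, max_iter=100):
    """Return (labels, centers, explained_share) for a k = 2 split of 1-D data."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")

    def assign(centers):
        # Label 0 = closer to the lower center, label 1 = closer to the upper one.
        return [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]

    centers = [min(values), max(values)]  # deterministic start (assumption)
    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = []
        for cluster in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == cluster]
            new_centers.append(sum(members) / len(members) if members else centers[cluster])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)

    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, 1.0 - within_ss / total_ss


# Hypothetical mean ratings for eight speakers on the 1-7 scale.
mean_ratings = [2.1, 2.4, 2.0, 2.6, 2.3, 4.1, 4.5, 4.2]
labels, centers, explained = kmeans_two_clusters_1d(mean_ratings)
print(labels, [round(c, 2) for c in centers], f"{explained:.0%}")
```

Comparing the explained share for k = 2 with that for k = 3 would mirror the reasoning used in the text to decide on the number of clusters.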

Correlation between perceived sexual orientation and masculinity. A correlation was run to explore the link between perceived SO and masculinity. The analysis revealed a strong correlation between the two measures (Spearman correlation, ρ = -.59, p = .009): The more the speakers were perceived as masculine, the less they were judged as gay.

Experiment 2B

Within-language judgments on continuous scales – German participants

This experiment was identical to Study 1B. However, in this case the response format consisted of a 7-point Likert-type scale, as in Study 2A, and participants were asked to evaluate both the speakers' SO and their masculinity.

Method

Participants. Thirty-six university students took part in the experiment in the role of listeners. Four participants were excluded from analyses because they identified as gay or bisexual. The final sample consisted of 32 heterosexual participants (16 female, Mage = 25.75, SD = 6.96) who were German or spoke German fluently.

Results


magnitude (M = .127, SD = .33), one-sample t(34) = 2.31, p = .027. Looking at the mean SO ratings, data suggest that on average, self-identified heterosexual speakers (M = 2.71, SD = .84) were judged as more heterosexual than gay speakers (M = 3.12, SD = 1.11), (t(31) = 2.22, p = .03), suggesting that, to some degree, participants were able to grasp differences in SO. However, as in Experiment 2A both means were significantly below the mid-point of the scale (both ts > 4.76, ps < .001), attesting to the fact that participants tended to perceive speakers as heterosexual (see S2 Table for statistics for each speaker).

Perceived sexual orientation. As illustrated in the lower portion of Fig 4, participants had difficulties in detecting gay speakers: The majority of speakers were perceived as sounding heterosexual. Comparing mean ratings to the scale midpoint, of the 6 heterosexual speakers, 5 were correctly perceived as heterosexual and none was falsely classified as gay. Of the 6 gay speakers, only 1 was reliably identified as gay and 3 were falsely classified as heterosexual. Ratings for the remaining speakers did not differ from the scale midpoint.

We further analyzed whether speakers could be divided into two groups (i.e., gay-sounding vs. heterosexual-sounding) according to listeners' perception, regardless of their self-identified SO. The k-means cluster analysis on the mean rated values (and setting k = 2) yielded two clusters assigning 8 speakers to the first (mean value of 2.50) and 4 speakers to the second cluster (mean value of 3.80). The two clusters accounted for 77% of data variability. The analyses suggest that some speakers were strongly identified as heterosexual, whereas other speakers tended to be perceived as less heterosexual and were located around the scale midpoint. The results of the cluster analysis are quite similar to those reported in Experiment 1B: Ten out of the 12 speakers were located in the same cluster in the two experiments.

Correlation between sexual orientation and masculinity. As for Italians, we explored the relation between the perception of speakers’ SO and masculinity. Analysis revealed no reliable relation between perceived SO and masculinity (r = -.46, p = .12), although the magnitude of the correlation was only slightly smaller than that of the Italian sample.

Acoustic Analyses

All four studies suggest that people are rather inaccurate in identifying the SO of speakers on the basis of voice alone and that heterosexuality serves as a default option even when the actual 50:50 distribution is known. Interestingly, however, people do not respond in a random fashion but associate certain voices with heterosexuality and others with gayness. The question therefore arises whether voices carry information, independent of the self-defined SO of the speaker. What makes some voices appear gay and others heterosexual? To answer this research question, we explored the relation between the participants' perception of speakers' SO and a subset of acoustic properties of the speech signal, with the aim to identify which acoustic cues Italian and German participants exploited to perform their judgments. Moreover, since, at least for Italian, the perception of speaker's SO seems to be related to that of speaker's masculinity, we also explored the relation between the latter and the speech signal.


skewness, and kurtosis. Acoustic measures were made using the PRAAT software [28]. The onset and offset of each phoneme of interest in each word was marked in PRAAT by a coder. All acoustic analyses were done automatically in PRAAT using custom-written scripts, which made reference to these labels. Formants were measured at vowel midpoint. Acoustic measures were extracted from a subset of all recorded materials (20 tokens for each sound in Italian and German, respectively; Italian vowels: /a/, /e/, /i/, /o/, /u/; German vowels: /a/, /a:/, /e:/, /ε/, /ε:/, /i/, /ɪ/, /i:/, /o/, /o:/, /u:/; only stressed tokens were selected). Finally, we considered the speaking rate (measured as the ratio between total sentence duration and the number of syllables in the sentence).

To explore whether and which acoustic features participants related to heterosexual- vs. gay-sounding voices, correlation analyses were run between the listeners' judgments and all acoustic measures. Then, to explore whether the same cues are also related to masculine- vs. feminine-sounding voices, the same correlation analyses were run between acoustic cues and ratings of masculinity. Pearson correlations were adopted when both measures in the correlation had a normal distribution; in all the other cases Spearman correlations were run. Finally, we tested whether speakers perceived as gay speak differently from those perceived as heterosexual. Hence, we considered heterosexual- and gay-sounding speakers (based on the cluster analyses reported above) and compared whether the two groups of speakers differ on acoustic cues. Analyses were performed only for those acoustic features that were found to be significantly related to listeners' ratings.

Italians – Experiment 1A (dichotomous choice)

Since the duration measures of all vowels were highly correlated with each other, we calculated a vowel duration index as a global measure of all vowel durations (computed by averaging the duration of all vowels). The new measure was highly correlated with all single measures (/a/: r = .93; /e/: r = .94; /i/: r = .98; /o/: r = .94; /u/: r = .68) and can therefore be considered a valid compound measure of vowel duration. The mean vowel duration, however, did not strongly correlate with speaking rate (r = .53) and /s/ duration (r = .48), suggesting that these are better considered separately. Table 1 reports only the significant correlations between the gayness score on one side and the frequency parameters and the duration measures on the other (for the full list of correlations for this and the following experiments, see S4 Table).

Speakers with higher values of gayness (thus, more likely to be perceived as gay) tended to speak slower, and to produce longer vowels than speakers with lower values of gayness. To ascertain whether both speaking rate and vowel duration contribute to the perception of SO, a regression analysis was run with duration of vowels, duration of /s/, and speaking rate as predictors. Since such measures were correlated with each other, residuals were entered as predictors in the regression analysis (see, e.g., [33,34]). Results show that only speaking rate (β = -1.12, st. err. = .11, t = -2.99, p = .009) significantly predicted listeners’ ratings of speakers’ SO (mean vowel duration: t = 1.09, p > .2; /s/ duration: t < 1, p > .7). Moreover, a higher gayness score was associated with higher mean frequencies of /s/, and a higher F2 for the vowels /a/ and /e/.

We then tested whether clusters of speakers perceived as heterosexual vs. gay show differences in those acoustic features that significantly correlated with listeners' ratings. The analysis shows that speakers in the two groups differed only with respect to the center of gravity (Mann-Whitney test, W = 73, p = .01; all other ps > .9).
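For readers unfamiliar with the Mann-Whitney statistic, the sketch below computes it by direct pair counting. It assumes that W follows the convention in which the statistic for the first group equals the number of (first, second) pairs where the first value is larger, with ties counted as one half; the text does not state which software or convention was used. The function name and the center-of-gravity values are hypothetical, and no p value is computed.

```python
def mann_whitney_w(first_group, second_group):
    """Count pairs in which the first-group value exceeds the second-group value.

    Ties contribute 0.5. The result ranges from 0 to len(first) * len(second).
    """
    w = 0.0
    for a in first_group:
        for b in second_group:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


# Hypothetical /s/ center-of-gravity values in Hz, for illustration only.
gay_sounding = [7200.0, 7050.0, 7400.0, 6800.0, 6900.0]
heterosexual_sounding = [6100.0, 6400.0, 6950.0, 6300.0, 6000.0, 6600.0]
w = mann_whitney_w(gay_sounding, heterosexual_sounding)
max_w = len(gay_sounding) * len(heterosexual_sounding)
print(f"W = {w:.1f} out of a possible {max_w}")
```

Values of W close to 0 or to the maximum indicate that one group almost always has the higher values, whereas values near half the maximum indicate heavily overlapping groups.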

Italians – Experiment 2A, sexual orientation (Likert scale)

The same acoustic features tested in Experiment 1A were correlated with the participants’


the listeners' rating as dependent variable and the duration of vowels and speaking rate as predictors showed that speaking rate (β = -1.13, st. err. = .51, t = -3.38, p = .004), but not mean vowel duration (β = .73, st. err. = .62, t = 2.21, p = .04), predicted the perceived SO. Moreover, speakers perceived as more gay had higher mean frequencies of /s/, and a higher F1 for the vowel /i/.

As in Experiment 1A, speakers were divided into two groups (perceived gays and perceived heterosexuals) according to the cluster analysis reported earlier. The comparison of the acoustic features of speakers in the two groups shows that they only differed in terms of center of gravity (Mann-Whitney test, W = 62, p = .03; all other ps > .9).

In both Experiments 1A and 2A, participants used speaking rate, mean vowel duration, and /s/ center of gravity to perform their judgment; some of these cues were consistently used in the two studies, suggesting that the way of speaking may be related to the perception of speakers’ SO. The conclusion is further strengthened by the comparison of the speech features of speakers categorized as members of different groups: In both experiments, the /s/ center of gravity is a discriminant for the categorization of a speaker as gay (in case of higher mean frequencies of /s/) or heterosexual (in case of lower mean frequencies of /s/).

Italians – Experiment 2A, masculinity (Likert scale)

The correlation analyses between acoustic cues and participants’ mean ratings show that participants tended to rate speakers as more feminine the higher their mean frequencies of /s/ and the higher their values of /s/ skewness (significant correlations in Table 1). Interestingly, the mean frequencies of /s/ were found to be related also to the judgment of speakers' SO.

Table 1. Italians – Significant correlations between acoustic cues and participants' judgments in Experiments 1A and 2A, and speakers' self-reported ratings.

                          Sexual orientation                                            Masculinity
Acoustic measure          Exp. 1A             Exp. 2A               Italian speakers    Exp. 2A
                          (gayness rating)    (listeners' rating)   (self-rating)
Vowel F1
  vowel /i/               –                   ρ = .48 *             ρ = .46 *           –
  vowel /u/               –                   –                     ρ = .51 *           –
Vowel F2
  vowel /a/               ρ = .55 *           –                     –                   –
  vowel /e/               ρ = .67 **          –                     –                   –
/s/ center of gravity     ρ = .57 *           ρ = .52 *             –                   r = .46 *
/s/ skewness              –                   –                     –                   r = .46 *
Duration measures
  Speaking rate           ρ = .71 ***         ρ = .53 *             –                   –
  Mean vowel duration     ρ = .54 *           ρ = .46 *             ρ = .46 *           –
  /s/ duration            ρ = .44 *           –                     –                   –

Note: Positive correlations indicate that the vocal cue is associated with greater perceived gayness. If one of the studies produced significant correlations for a given cue, we also report in italics any correlations in the other study up to p = .2. Symbols after each coefficient give the p value: . < .2; * < .05; ** < .01; *** < .001.

German – Experiment 1B (dichotomous choice)

The same acoustic features and correlation analyses of Experiment 1A were run with the German sample. Significant correlations are reported in Table 2. Our findings, again, suggest that listeners’ judgments are driven by specific acoustic features. In particular, speakers who produced longer sounds tended to be perceived as more gay-sounding; moreover, the gayness score was associated with a higher F2 in some frontal vowels.

To further test how the gayness ratings work, the speakers were divided into two groups (perceived gays and perceived heterosexuals) according to the cluster analysis reported above. Acoustic measures of speakers belonging to the two groups were compared. The analysis shows significant differences for the F2 of /e:/ and /ε/ (Mann-Whitney test, W = 4, p = .03, and W = 3, p = .02, respectively; all other ps > .09).

Table 2. Germans – Significant correlations between acoustic cues and participants' judgments in Experiments 1B and 2B, and speakers' self-reported ratings.

                          Sexual orientation                                            Masculinity
Acoustic measure          Exp. 1B             Exp. 2B               German speakers     Exp. 2B
                          (gayness rating)    (listeners' rating)   (self-rating)
Vowel F0
  vowel /a/               –                   –                     –                   r = -.77 **
  vowel /a:/              –                   –                     –                   r = -.80 **
  vowel /e/               –                   –                     –                   r = -.76 *
  vowel /i/               –                   –                     –                   r = -.60 *
  vowel /i:/              –                   –                     –                   r = -.66 *
  vowel /o/               –                   –                     –                   r = -.74 *
  vowel /u/               –                   –                     –                   r = -.77 *
Vowel F1
  vowel /a/               –                   –                     ρ = .66 *           –
  vowel /a:/              –                   –                     ρ = .75 *           –
  vowel /ε/               –                   –                     ρ = .80 ***         –
Vowel F2
  vowel /a/               r = .44 .           r = .62 *             –                   –
  vowel /a:/              –                   –                     –                   –
  vowel /e:/              r = .59 *           r = .48 .             –                   –
  vowel /ε/               r = .70 *           r = .73 **            –                   –
  vowel /ɪ/               r = .62 *           r = .63 *             –                   –
Duration measures
  vowel /e:/              ρ = .72 *           ρ = .54 .             –                   –
  vowel /i:/              r = .60 *           r = .44 .             –                   –
  vowel /u:/              r = .67 *           r = .57 .             –                   –
  /s/ duration            r = .66 *           –                     –                   –

Note: Positive correlations indicate that the vocal cue is associated with greater perceived gayness. If one of the studies produced significant correlations for a given cue, we also report in italics any correlations in the other study up to p = .2. Symbols after each coefficient give the p value: . < .2; * < .05; ** < .01; *** < .001.

German – Experiment 2B, sexual orientation (Likert scale)

Correlation analyses were run between the listeners' mean ratings and the same acoustic features used in Experiment 1B. Significant correlations are reported in Table 2 (for the full list of correlations, see S3 Table). Speakers who produced higher F2 of some centro-frontal vowels tended to be perceived as more gay-sounding. Also, vowel duration seems to play again some role, as suggested by the correlation between duration of, e.g., /u:/ and listeners' ratings. Moreover, the comparison of acoustic measures of speakers belonging to the two clusters identified through the cluster analysis above shows significant differences for the F2 of /a/ and /ε/ (Mann-Whitney test, W = 28, p = .05, and W = 28, p = .05, respectively; all other ps > .1).

The results of Experiments 1B and 2B partly overlap: In both cases, German participants tended to perceive the speakers' homosexuality based on the F2 height of some vowels and on durational measures, suggesting that the way of speaking is related to the perception of speakers' SO. Moreover, F2 measures were also found to differentiate the speech of speakers categorized as gay from those categorized as heterosexual, with the former showing higher frequency values than the latter.

German – Experiment 2B, masculinity (Likert scale)

The correlation analyses between acoustic cues and participants’ mean ratings showed that femininity was associated with higher values of vowel F0 (significant correlations in Table 2). There was no overlap between the cues related to masculinity and those related to SO.

To sum up, the correlations between acoustic features and listeners' judgments of SO reveal some within-language consistency: Italian participants perform the categorization using mean vowel duration and speaking rate, plus the center of gravity of /s/; the latter measure clearly differentiates the speakers categorized as heterosexual from those categorized as gay, suggesting that gayness ratings are highly driven by these features. In a different way, Germans base their categorization on some duration measures (e.g., /u:/ duration) and on a phonetic property of some frontal vowels (i.e., F2 of /ε/, /ɪ/, /a/). Among these measures, F2 values seem to be particularly relevant for German listeners, as shown by the fact that speakers in the gay cluster have higher F2 values than those in the heterosexual cluster.

It is worth noting that in both languages and across experiments, durational measures seem to play an important role in driving listeners' perception, whereas the relation between formant frequencies and listeners' perception seems to be stable only in German. Across experiments, we also found sporadic effects for some acoustic features (e.g., F2 of /a/ in Experiment 1A, but not in 2A; duration of /s/ in Experiment 1B, but not in 2B). Although such complex patterns suggest that participants' judgments are based on multiple criteria, these findings ought to be interpreted with caution, given the limited number of available speakers (20 Italians and 12 Germans) and the non-representative sample of sentences.

Self-identification of Italian and German speakers


The same acoustic features as those tested with listeners were used. Note that speakers reported their SO on the same Likert scale that participants used in Experiments 2A and 2B. Significant correlations for Italian and German speakers are reported in Tables 1 and 2, respectively.

Italian speakers who self-identified more strongly as gay were also found to produce longer vowels and higher F1 of some vowels (/i/ and /u/). Differences in terms of F1 were also found for German speakers: the more they self-identified as gay the higher F1 they produced for some vowels (/a/, /a:/ and /ε/). Overall, SO of Italian and German speakers did not show the same relations with acoustic cues, suggesting that SO is expressed through distinct, language-specific cues. Moreover, for Italians there is partial correspondence between the acoustic cues related to self-defined and perceived SO, at least in terms of vowel duration. It seems that producing longer vowels and speaking somewhat slower is a feature of gay Italian speakers that is also used by listeners in judging SO from voice. For Germans, no correspondence was found between the acoustic cues related to self-identified and those related to perceived SO.

Discussion

Together, the four experiments show a straightforward pattern of results: Participants are not particularly accurate in categorizing the SO of the speakers they heard. However, there is converging evidence that listeners made distinctions among speakers, as they consistently categorized some of them as gay and others as heterosexual, even where this perception did not correspond to the speaker’s actual SO. Moreover, as ratings (Study 2A and 2B) and mouse trajectories (Study 1A and 1B) show, listeners seemed to consider heterosexuality as a reference level. Across studies, approximately 2/3 of heterosexual speakers were correctly identified as heterosexual, whereas one was identified as gay. Among gay speakers, less than 20% were identified as such and close to half of the gay speakers were erroneously perceived as heterosexual. Thus, a small group of speakers seemed to deviate from the typical heterosexual-sounding voice and were therefore judged to be gay, whereas the majority of speakers were classified as heterosexual. From a methodological point of view it is interesting to note that similar result patterns emerged when participants were asked to perform a dichotomous choice and when asked to rate speakers on a Likert scale, suggesting that the type of judgment does not modify the perception of speakers' SO.


Interestingly, /s/ center of gravity and formant frequency have also been shown to be affected by social variation, suggesting that their features can be socially acquired and modified (on this issue, see, e.g., [37,38]).

The exploratory nature of our acoustic analyses does not allow us to conclude with certainty that longer duration and higher (second) formant frequencies are predictive of listeners’ judgments. However, we can state that relations between these measures emerged. It is worth noting that previous research has found differences between male and female speakers across languages that seem to resemble the current results. Indeed, males produce shorter vowel duration than females [39,40]. Moreover, higher F2 values are reported in female compared to male speech (e.g., [4]). This suggests that listeners may, at least in part, derive their judgment of speaker’s SO from their knowledge of typical male vs. female speech styles. In line with this interpretation, the ratings of masculinity show that, at least for Italians, the perceived SO is related to perceived masculinity and there is some overlap in the speech cues listeners use to perform both the judgments. Surprisingly, this was not the case for Germans (for further discussion on this issue, see General Discussion).

To sum up, the results of the first 4 experiments showed that: a) the voice-based categorization of SO is fairly inaccurate; b) listeners tend to consistently categorize speakers’ SO on the basis of voice sound although this often does not reflect speakers’ self-identified SO; c) judgments are related to different types of acoustic cues, and although some cues are used in both languages (e.g., vowel duration), there is large variability across languages; d) at least for the languages under consideration, the categorization process is not language-dependent as it emerges in similar ways in distinct linguistic and cultural contexts; and e) it does not depend on the type of measure that is used. As a matter of fact, speakers were judged in a very similar way in Studies 1 and 2 (correlation between Italian samples of Experiment 1A and 2A: r(20) = .83, p < .001; correlation between German samples of Experiment 1B and 2B: r(12) = .67, p = .02).

Experiment 3

Cross-language categorization

The last experiment was designed to address the language-specificity issue. We tested whether Italian and German listeners show a similar pattern when categorizing the SO of speakers of their own vs. a foreign language. Thus, besides replicating previous findings, Experiment 3 tested how listeners categorize foreign speakers and how they make their judgments in such a cross-language categorization task.

Method

Participants. One hundred and sixteen university students took part in this study. Eighty-six were recruited in Italy and 30 in Germany. Five participants reported being gay or bisexual, two did not indicate their SO, and two reported being non-native speakers; these participants were excluded from analyses. The final sample consisted of 107 heterosexual participants (79 Italians and 28 Germans; 35 males). Italian (Mage = 22.25, SD = 3.14) and German participants (Mage = 23.54, SD = 3.84) were of similar age. The majority of them (84.1%) had no knowledge of the foreign language under consideration, whereas a minority (15.9%) reported knowing some German/Italian, but mainly at a very basic level. Analyses excluding participants with basic knowledge of the foreign language did not change the pattern of results.

Materials. Speakers were the same as in Experiments 1 and 2.

sentences pronounced by 20 Italian speakers (10 self-identified as gay and 10 as heterosexual), and the other block included sentences by 12 German speakers (6 self-identified as gay and 6 as heterosexual). Each speaker pronounced the same sentence: “Il cane correva nel parco/Der Hund rannte durch den Park” [the dog was running in the park]. We chose this sentence because its length and structure are almost identical in the two languages. Participants first heard speakers of their native language and then those of the foreign language.

Using the same procedure as in Study 1B, Italian participants were randomly assigned to two conditions: they were either told initially that half of the 20 Italian and of the 12 German speakers were gay and half straight, or they received no information. Given the small sample of German participants, they were all assigned to the no-information condition.

After listening to each speaker, participants had to rate his SO on a Likert scale (from 1 = completely heterosexual to 6 = completely homosexual). Unlike in Study 2, we used a 6-point scale, eliminating the ambiguous scale midpoint.

At the end of the experiment, participants reported their demographic information (gender, age, native language, and SO). In addition, we asked them to indicate whether, and to which degree, they knew the foreign language used in the experiment.

Results

Preliminary analysis on Italian participants. We first tested whether the distribution information had any effect on Italian participants. Knowing the 50:50 distribution beforehand overall increased the gayness ratings from 3.13 in the no-information condition to 3.40 in the 50:50 information condition, F(1, 84) = 6.00, p < .016, ηp² = .07. Thus, knowing the distribution increased listeners’ subjective likelihood that a speaker may be gay. However, it did not increase the accuracy in any way, as evidenced by the complete absence of an interaction between distribution information and speaker’s sexual orientation (F = 1.3, p > .25). We therefore compared Italian and German participants in all subsequent analyses without considering this factor any further.
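
As a rough check on the effect size reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is only an illustration under that assumption; the text does not say how the authors computed the value, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio (standard conversion, assumed here)."""
    ss_ratio = f_value * df_effect  # proportional to the effect sum of squares
    return ss_ratio / (ss_ratio + df_error)


# Values taken from the reported test: F(1, 84) = 6.00
eta_p2 = partial_eta_squared(6.00, 1, 84)
print(f"partial eta squared = {eta_p2:.2f}")
```

With the reported F(1, 84) = 6.00, this gives 6 / 90, which rounds to the reported .07.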

Sexual orientation–Response accuracy for same vs. different language speakers. For exploratory purposes, we first simply looked at the percentage of speakers whose mean ratings fell on the heterosexual or on the gay pole of the scale (see [3], for a similar procedure). Overall, 81% of heterosexual speakers were collocated on the “correct side of the scale” (below 3.5), a percentage that reliably exceeded chance, binomial, p = .021, suggesting that the majority of listeners tended to judge them as heterosexual; importantly, this percentage was identical for judgments by same- and by different-language speakers. Gay speakers were collocated on the “correct side of the scale” (above 3.5) in only 62% of the cases when speakers and listeners spoke the same language, and in 69% of the cases when speakers and listeners spoke different languages, neither of which exceeded chance (see S5 Table for statistics for each speaker). Thus, overall hit rates were very similar for same- and other-language judgments. However, the more fine-grained analysis reported below suggests a more complex pattern.
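
The binomial comparison against chance can be illustrated with an exact test. The sketch below rests on assumptions that the text does not state explicitly: that each percentage is computed over 16 speakers per orientation (10 Italian plus 6 German), so that 81%, 62% and 69% correspond to 13, 10 and 11 speakers, and that the test is two-sided with a chance level of .5. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p value: total probability of all outcomes
    that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * tolerance
    )
    return min(1.0, total)


N_SPEAKERS = 16  # assumption: 10 Italian + 6 German speakers per orientation

# Hit counts inferred from the reported percentages (assumption)
cases = [
    ("heterosexual speakers", 13),
    ("gay speakers, same language", 10),
    ("gay speakers, different language", 11),
]
for label, hits in cases:
    p_value = binom_two_sided_p(hits, N_SPEAKERS)
    print(f"{label}: {hits}/{N_SPEAKERS}, p = {p_value:.3f}")
```

Under these assumptions, 13 of 16 yields p of about .021, in line with the reported value, whereas 10 of 16 and 11 of 16 both stay well above .05.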

Sexual orientation–Ratings for Italians and German speakers as a function of Listeners'