L-dropping/retention vs right context

L-dropping rates for dialogs do not seem to depend on right context

In prepared speech, there is a tendency for L-retention, but higher deletion rates are seen in right #C context

Supports hypothesis of relevance of the phonetic context on realization of final -l

Breaking the Unwritten Language Barriers

BULB: speech technologies to help field linguists document unwritten languages

Primarily speech recognition and machine translation

3 Bantu languages: Basaa, Myene, Embosi

ANR-DFG funded French-German project

Recording tool: Aikuma

Collaboration of speech technologists and linguists

Training workshops (technology for linguists, linguistics for technologists)

BULB: Lig Aikuma

Smartphone application for data collection (after Bird et al, ACL 2014) https://lig-aikuma.imag.fr

Collects speaker metadata

Supports recording, respeaking (rs) and translation (tr)

(from L. Besacier, CMLD 2018)

BULB: data

Language  #hours  (rs, tr)
Basaa     31      (23, 33)
Embosi    55      (30, 30)
Myene     45      (44, 11)

(from L. Besacier, CMLD 2018)

Embosi: vowel/morpheme elision

Investigate vowel elision and morpheme deletion (Cooper-Leavitt et al, Interspeech 2017)

Variant rules for elision after Rialland et al., 2012

Vowel elision at word boundaries: long/short or short/short vowel contact

CV[long,HL]#V[short].CV → CV[short,H]#V.CV

CV[H,low]#V[L,mid].CV → CV[H,mid]#CV

Compound words introduced to model morpheme deletion

Example: Embosi ya-deletion

. w á m i t ú β á b ɔ s i l a ⁿw á s o k o ⁿdz i .

[silence] wa ámitúβá bɔsi la mwásí ya_okondzi [silence]

[Figure: spectrogram of the utterance; time axis 0 to 3.108 s, frequency axis 0 to 5000 Hz]

Vowel/morpheme elision

Morpheme  n_del     n_del+ve   n¬del      Total N
ya        83 (35%)  125 (52%)  31 (13%)   239
mo        0         0          8 (100%)   8
ba        0         0          7 (100%)   7
ngá       12 (3%)   6 (1%)     439 (96%)  457
nO        13 (8%)   9 (6%)     133 (86%)  155
wa        17 (4%)   14 (3%)    431 (93%)  462

Pronunciation dictionary permitted vowel or morpheme deletion

Vowel elision can be attributed to phonetic or phonological processes

ya has the highest rate of deletion/vowel elision

ya is most often preserved at the start or end of an utterance (syntactic distribution)
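
The percentages in the table follow from the raw counts. A minimal sketch, assuming the three count columns are disjoint outcomes that sum to Total N; the dictionary layout and the function name `outcome_rates` are illustrative and not taken from the original study, and only a few rows are included:

```python
# Counts copied from the table: (n_del, n_del+ve, n¬del) for a few morphemes.
COUNTS = {
    "ya": (83, 125, 31),
    "mo": (0, 0, 8),
    "ba": (0, 0, 7),
    "wa": (17, 14, 431),
}


def outcome_rates(counts):
    """Return each outcome's share of the row total as a whole-number percentage."""
    total = sum(counts)
    return tuple(round(100 * c / total) for c in counts)


rates = {morpheme: outcome_rates(c) for morpheme, c in COUNTS.items()}
for morpheme, row in rates.items():
    print(morpheme, row)
```

Recomputing the shares this way is also a quick consistency check on a hand-built table: each row's percentages should agree with its counts and total.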

Case Study: FACST

Investigate speech of bilingual French/Arabic speakers

French Algerian Code-switching (CS) Triggered (FACST) corpus (Amazouz et al., LREC 2018)

Bilingual speakers selected based on responses to an online questionnaire about bilingualism, education and CS practice

Conversations with CS triggered by questions in both languages

20 speakers: 10 male, 10 female, aged 20-40 years

7h30m of speech, 15-40 min per speaker, elicited spontaneous and read speech

Study and characterize CS

Study phonetic realization of speech by bilinguals

French has a richer vowel inventory than Arabic (IS2018)

Arabic has a richer consonant inventory than French

Example of code-switching types

FACST transcription example

Manual segmentation based on language change, breath groups and speaker turns

CS segments less than 30 ms not segmented

Translation: “He was born in 1988. He started working as a pastry cook with my uncle. He was paid 8000 dinars and he used to give this money to my mom; she had no money because she was not working. My uncle helped us much and it is due to their

Aligned transcripts CS

Vowel production by FR-AA bilinguals

Bilingual speakers’ speech presents more acoustic variation than that of monolinguals (Bullock, 2012; Auer, 2010)

Bilinguals access more than one phonemic inventory which may lead to potential interferences (Fricke 2016; Grosjean, 1995)

Vowel inventories of different sizes (French is richer)

To what extent do bilinguals adapt their vowel productions to the linguistic context?

Methodology: use automatic speech alignment to study vowel variants

Focus on parallel variants (3 experiments allowing vowel substitutions)

Frequent replacements of the target vowel by competing vowels are considered an indicator of variation
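
The indicator described here can be sketched as a simple count over the aligner's choices. This is a hedged illustration only: the input format (pairs of target vowel and vowel selected by the aligner), the function name `substitution_rates` and the toy data are assumptions, not the actual pipeline used in the study.

```python
from collections import Counter


def substitution_rates(aligned):
    """Compute, per target vowel, the fraction of tokens realized as a competing vowel.

    `aligned` is an iterable of (target_vowel, vowel_chosen_by_aligner) pairs.
    """
    totals = Counter()
    replaced = Counter()
    for target, chosen in aligned:
        totals[target] += 1
        if chosen != target:
            replaced[target] += 1
    return {target: replaced[target] / totals[target] for target in totals}


# Hypothetical toy alignment output, for illustration only.
toy = [("i", "i"), ("i", "e"), ("i", "y"), ("i", "i"), ("u", "u"), ("u", "o")]
rates = substitution_rates(toy)
print(rates)
```

A higher rate for a target vowel means the aligner preferred a competing vowel more often, which is the signal the experiments read as variation.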

Vowels in French and Arabic

[Delattre, 1966] [Thelwall & Akram Sa’adeddin, 1999]

Standard French: 11 oral vowels, 4 nasal vowels, 1 schwa

Classical Arabic: 3 oral vowels

How does this difference influence speech production in bilingual speakers?

Exp. 1: Vowel variants in French

Populations: French natives vs. bilinguals (French-Algerian Arabic)

Language: French

NCCFr [Torreira et al, Speech Communication 2010]

36h of conversational French, 46 speakers (24 female)

French acoustic model

Two production variants for each target vowel

Vowel  Variants  Examples
i      [e, y]    lit (bed): li, le, ly
e      [E, œ]    nez (nose): ne, nE, nœ
a      [E, œ]    chat (cat): Sa, SE, Sœ (anterior)
a      [O, œ]    chat (cat): Sa, SO, Sœ (posterior)
o      [O, ø]    chaud (hot): So, SO, Sø
u      [o, ø]    loup (wolf): lu, lo, lø
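
The variant lists above can be turned into alternative pronunciations for the aligner's lexicon. The following is a minimal sketch under stated assumptions: pronunciations are given as lists of phone symbols, every target vowel in a word may independently be kept or replaced, and "a" is shown with its anterior set only. The names `VARIANTS` and `expand` are hypothetical.

```python
from itertools import product

# Variant sets from the table; "a" is listed here with its anterior variants.
VARIANTS = {
    "i": ["e", "y"],
    "e": ["E", "œ"],
    "a": ["E", "œ"],
    "o": ["O", "ø"],
    "u": ["o", "ø"],
}


def expand(phones, variants=VARIANTS):
    """Return all pronunciations in which each target vowel is kept or replaced."""
    options = [[p] + variants.get(p, []) for p in phones]
    return ["".join(choice) for choice in product(*options)]


print(expand(["l", "i"]))  # ['li', 'le', 'ly']
print(expand(["S", "a"]))  # ['Sa', 'SE', 'Sœ']
```

The canonical form is always listed first, so the aligner's choice of any later entry counts as a substitution.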

Exp. 1 - Results

(French: natives vs. bilinguals)

Observed variation is vowel independent

Comparable amount of variation in both groups (French natives, bilinguals)

One exception: for [a] with anterior variants, bilinguals show considerably less variation than French natives

Exp. 2: Vowel variants in code-switching

Population: bilinguals (French-Algerian Arabic)

Languages: French, Algerian Arabic

Production variation for three target vowels, each with two variants

Vowel production variation in bilinguals as a function of language

French acoustic model

Are the realizations of Arabic vowels acoustically close to French vowels? And if so, which?

Vowel  Variants
i      [e, y]
a      [E, œ] (anterior)
a      [O, œ] (posterior)
u      [o, ø]

Exp. 2: Results

(bilinguals: code-switching)

The observed variation is vowel dependent

[i] is substituted more often than the other vowels ([a], [u])

[a] (posterior) is least often substituted

Language also has an impact on vowel variation

in French, the target vowel is more often produced than in Algerian Arabic

this pattern is observed for all target vowels

Exp. 3: Vowel centralization

Quantify movement of peripheral vowels towards the center of the vowel triangle

One production variant for each target vowel: schwa [@]

French from natives vs. bilinguals

Exp. 3: Results

(French: vowel centralization)

Vowel  FR    FR-Alg
i      14.1  12.8
e      20.9  24.4
E      34.1  15.9
a      34.0  15.9
O      39.4  20.2
o      33.5  21.6
u      25.0  16.2
˜E     13.6  7.7
˜a     17.5  8.7
˜O     17.7  6.5

Schwa variant rates (%)

In French, vowel centralization is vowel dependent

[O] is most affected by vowel centralization

[˜E] is least affected by centralization
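
The most and least centralized vowels can be read off the table programmatically. A small sketch, assuming the last three rows of the table are the nasal vowels (written here with a "~" prefix); the names `SCHWA_RATES` and `extremes` are illustrative.

```python
# Schwa variant rates (%) from the table: vowel -> (FR natives, FR-Alg bilinguals).
# The "~" prefix marks the rows assumed to be nasal vowels.
SCHWA_RATES = {
    "i": (14.1, 12.8),
    "e": (20.9, 24.4),
    "E": (34.1, 15.9),
    "a": (34.0, 15.9),
    "O": (39.4, 20.2),
    "o": (33.5, 21.6),
    "u": (25.0, 16.2),
    "~E": (13.6, 7.7),
    "~a": (17.5, 8.7),
    "~O": (17.7, 6.5),
}


def extremes(group):
    """Return (most centralized, least centralized) vowel for column index `group`."""
    most = max(SCHWA_RATES, key=lambda v: SCHWA_RATES[v][group])
    least = min(SCHWA_RATES, key=lambda v: SCHWA_RATES[v][group])
    return most, least


print("FR natives:", extremes(0))
print("FR-Alg bilinguals:", extremes(1))
```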

Exp. 3: Results

(Arabic: vowel centralization)

Vowel  Reading  CS
i      56.5     37.9
i:     15.0     19.7
a      42.4     49.0
a:     26.8     36.4
u      44.7     41.1
u:     24.0     33.0

Schwa variant rates (%)

In Algerian Arabic, vowel centralization is also vowel dependent

Long vowels are less subject to centralization

[i:] is less often centralized than the other vowels

Speech style (read vs. spontaneous CS) does not have much impact on vowel centralization in Algerian Arabic
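
The long/short contrast can be summarized by averaging the table rows per vowel length. This is an illustrative sketch only: the unweighted mean over three vowels ignores token counts, which the slides do not give, and the names `RATES` and `mean_rate` are hypothetical.

```python
from statistics import mean

# Schwa variant rates (%) from the table: vowel -> (Reading, CS).
RATES = {
    "i": (56.5, 37.9), "i:": (15.0, 19.7),
    "a": (42.4, 49.0), "a:": (26.8, 36.4),
    "u": (44.7, 41.1), "u:": (24.0, 33.0),
}


def mean_rate(long_vowels, column):
    """Average schwa rate over long (or short) vowels for one column (0=Reading, 1=CS)."""
    values = [r[column] for v, r in RATES.items() if v.endswith(":") == long_vowels]
    return mean(values)


for column, style in enumerate(("Reading", "CS")):
    short, long_ = mean_rate(False, column), mean_rate(True, column)
    print(f"{style}: short {short:.1f}%, long {long_:.1f}%")
```

In both columns the long-vowel average comes out below the short-vowel average, in line with the statement that long vowels are less subject to centralization.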

Vowel centralization

In French (French natives, bilinguals):

[O] is more often centralized than the other target vowels

consistent with the findings in (Boula de Mareüil et al., 2008)

bilinguals centralize vowels in the same way as French natives do

In Algerian Arabic (reading, code-switching):

speech style does not have an impact on vowel reduction rate

[i:] is less often centralized than the other vowels

possible reason: extreme position of [i:] in the vowel triangle

further acoustic analyses are needed to investigate this hypothesis

Consonant Lenition in French

Algerian Arabic has a richer consonant inventory than French

Arabic acoustic models

Allow parallel consonant variants: gemination, emphatic, change of manner or place of articulation

What consonants are most affected by lenition?

What are the consonants affected by gemination?

What is the rate of emphatisation in both speech types?

Some initial results

It appears that bilinguals show more variation than native French speakers

voiceless stop → voiced stop or fricative

largest variation observed for /t/, realized as [d] (10%) or [th] (15%)

the emphatic forms of the consonants [t, d, s] are selected about 20% of the time by both populations

similar observation for geminates: 20% for bilinguals, 15% for French natives

most frequent occurrences for the consonants [b, g, l, m, s, t, v, f]

highest (38%) for [f] from bilinguals

Diachronic change

Socio-phonetic, corpus-based linguistic study of journalistic speech in French news [Candea et al, Interspeech 2013]

Consonant cluster reduction: explique, exclaim

Palatalization and affrication of dental stops

Fricative epithesis [ç] after final vowels

Example: mercredi sur ce dossier

Investigate these phenomena over the last decade

Diachronic change

Epithesis duration increased by about 20-30% (+20ms 2003-2007)

Some Perspectives & Outstanding Challenges

Speech technologies

Entering everyday life

More languages, wider data variety

Less reliance on annotated training data

Applications of speech technology

Language learning

Assess mental health conditions and rehabilitation aid

Linguistic data exploration (Big data)

Explore and validate linguistic hypotheses

Help document and characterize languages and variants

in particular oral languages and languages with relatively small speaker populations

Little semantic and world knowledge in models

Thank You

for your attention

and to many colleagues over the years who have contributed directly or indirectly to this work, including: Gilles Adda, Martine Adda-Decker, Djegdjiga Amazouz, Maria Candea, Ioana Chitoran, Jamison Cooper-Leavitt, Julien Despres, Thiago Fraga da Silva, Jean-Luc Gauvain, William Hartmann, Nidia Hernandez, Viet-Bac Le, Abdel Messaoudi, Oana Niculescu, Annie Rialland, Ioana Vasilescu, Bianca Vieru, Cecile Woehrling, Jane Wottawa, ...

Some links to related resources

ISCA www.isca-speech.org, in particular outreach programs

ISCA-SAC www.isca-students.org

Online courses: ISCA SCOOT https://www.isca-speech.org/iscaweb/index.php/scoot, MIT, CU, CUED, CMU, IIT ...

Superlectures www.superlectures.com

Linguistic Data Consortium www.ldc.upenn.edu

ELRA www.elra.info

History of speech & language technology www.sarasinstitute.org

Journals: Speech Communication, Computer Speech & Language

Conferences/workshops: ISCA Interspeech, ITRWs, IEEE ICASSP, ASRU, HLT, SLT, SLTU, ...

Case Study: Hungarian

[Roy et al, IS13]

[Figure: %WER (axis 0 to 80) for Seeds, 1, 2, 3, 4, +MLP.e, +MLP.h]

Seed models from 5 languages

Untranscribed audio data: 40 hours - 300 hours

Hungarian MLP trained using unsupervised transcripts

Active Learning

Gain for techniques: ★ <0.5%, ★★ 0.6-1.5%, ★★★ 0.6-3.0%, ★★★★ >3.0%

                   3h train              40h train
Technique          Gain STT  Gain KWS    Gain STT  Gain KWS
Subword KWS        N/A       ★★★         N/A       ★★★★
Data selection     ★         none        N/A       N/A
SST                ★★★       ★           none      none
Data augmentation  ★★★★      ★★★         ★★        ★★★
Webtexts           ★★★★      ★★★★        ★★★       ★★
NNLMs              -         -           ★★        ★

SST: semi-supervised training

Pronunciation variants: English Switchboard

Multi-word   #Total  Full form + variants  #Align  %Align  Comments
did-not      2559    dId nAt               103     4.0     full form
                     + dIdn̩t               275     10.7    n(A→@)
                     + dIdn̩                1175    45.9    + final /t/ deletion
                     + dIn                 1006    39.3    + coda /d/ deletion
going-to-be  750     gOIng tÚbi            73      9.7     full form
                     + gOn@bi              432     57.6    complex: Ing t→n
                     + g@bi                245     32.7    + complex: On@→@
wants-to     157     wOntstu               15      9.6     full form
                     + wOnstu              78      49.7    coda C-cluster simplification
                     + wOnts@              7       4.5     onset /t/-deletion
                     + wOns@               57      36.3    both /t/-deletions
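
The %Align column is each variant's share of all aligned tokens of the multi-word. A minimal sketch reproducing the did-not rows from the #Align counts; the dictionary keys are shorthand labels introduced here for illustration, and `align_shares` is a hypothetical helper name.

```python
# #Align counts for "did-not", copied from the table (keys are illustrative labels).
DID_NOT = {
    "full_form": 103,
    "reduced_vowel": 275,
    "t_deleted": 1175,
    "d_deleted": 1006,
}


def align_shares(aligned):
    """Return each variant's share of the aligned tokens, in percent (one decimal)."""
    total = sum(aligned.values())
    return {label: round(100 * n / total, 1) for label, n in aligned.items()}


shares = align_shares(DID_NOT)
print(sum(DID_NOT.values()), shares)
```

The counts sum to the #Total of 2559, and the full form accounts for only about 4% of the aligned tokens.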

Testing pronunciation variants : French

Words with shortened pronunciation variants

French casual speech corpus (NCCFr)

Word                   #Total  Full form + variants  #Align  %Align  Comments
parce (que) 'because'  2590    pAös@                 4       0.2     full form
                               + pAös                45      1.7     no final schwa
                               + pas                 1309    50.6    + C-cluster simplification
                               + ps                  1232    47.6    + vowel deletion
quelques 'some'        56      kElk@                 14      25      full form
                               + kEk@                28      50      + /l/-deletion
                               + kE(k|g)             14      25      + schwa deletion

Source: Speech technologies as an aide for large-scale linguistic exploration (pages 45-88)