L-dropping rates for dialogs do not seem to depend on right context
In prepared speech, there is a tendency for L-retention, but higher deletion rates are seen in right #C context
Supports the hypothesis that phonetic context is relevant to the realization of final -l
Breaking the Unwritten Language Barriers
BULB: speech technologies to help field linguists document unwritten languages
Primarily speech recognition and machine translation
3 Bantu languages: Basaa, Myene, Embosi
ANR-DFG funded French-German project
Recording tool: Aikuma
Collaboration of speech technologists and linguists
Training workshops (technology for linguists, linguistics for technologists)
BULB: Lig Aikuma
Smartphone application for data collection (after Bird et al, ACL 2014)
https://lig-aikuma.imag.fr
Collects speaker metadata
Supports recording, respeaking (rs) and translation (tr)
(from L. Besacier, CMLD 2018)
BULB: data
Language   #hours   (rs, tr)
Basaa      31       (23, 33)
Embosi     55       (30, 30)
Myene      45       (44, 11)
(from L. Besacier, CMLD 2018)
Embosi: vowel/morpheme elision
Investigate vowel elision and morpheme deletion (Cooper-Leavitt et al, Interspeech 2017)
Variant rules for elision after Rialland et al., 2012
Vowel elision at word boundaries: long/short or short/short vowel contact
CV[long,HL]#V[short].CV → CV[short,H]#V.CV
CV[H,low]#V[L,mid].CV → CV[H,mid]#CV
Compound words introduced to model morpheme deletion
Example: Embosi ya-deletion
. w á m i t ú β á b ɔ s i l a ⁿw á s o k o ⁿdz i .
[silence] wa ámitúβá bɔsi la mwásí ya_okondzi [silence]
[Figure: spectrogram of the utterance; time 0-3.108 s, frequency 0-5000 Hz]
Vowel/morpheme elision
Morpheme   n_del       n_del+ve    n_¬del       Total N
ya         83 (35%)    125 (52%)   31 (13%)     239
mo         0           0           8 (100%)     8
ba         0           0           7 (100%)     7
ngá        12 (3%)     6 (1%)      439 (96%)    457
nO         13 (8%)     9 (6%)      133 (86%)    155
wa         17 (4%)     14 (3%)     431 (93%)    462
Pronunciation dictionary permitted vowel or morpheme deletion
Vowel elision can be attributed to phonetic or phonological processes
ya has the highest rate of deletion/vowel elision
ya is most often preserved at the start or end of an utterance (syntactic distribution)
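The percentages in the table can be rechecked from the raw counts. The following is a minimal sketch, assuming each percentage is the count divided by the row total; the dictionary keys are ASCII stand-ins for the morphemes (for example "nga"), and the function name is hypothetical.

```python
# Counts per morpheme: (n_del, n_del+ve, n_not_del), copied from the table above.
# Keys are ASCII stand-ins for the morphemes, chosen for this sketch.
counts = {
    "ya":  (83, 125, 31),
    "mo":  (0, 0, 8),
    "ba":  (0, 0, 7),
    "nga": (12, 6, 439),
    "nO":  (13, 9, 133),
    "wa":  (17, 14, 431),
}

def row_percentages(row):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(row)
    return tuple(round(100 * n / total) for n in row)

percentages = {m: row_percentages(row) for m, row in counts.items()}

for morpheme, pcts in percentages.items():
    print(morpheme, sum(counts[morpheme]), pcts)
```

Each row total printed by the loop should match the Total N column, which gives a quick consistency check on the transcribed counts.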
Case Study: FACST
Investigate speech of bilingual French/Arabic speakers
French Algerian Code-switching (CS) Triggered (FACST) corpus (Amazouz et al., LREC 2018)
Bilingual speakers selected based on responses in online questionnaire about bilingualism, education and CS practice
Conversations with CS triggered by questions in both languages
20 speakers: 10 male, 10 female, aged 20-40 years
7h30m of speech, 15-40 min per speaker, elicited spontaneous & read speech
Study and characterize CS
Study phonetic realization of speech by bilinguals
French has a richer vowel inventory than Arabic (IS2018)
Arabic has a richer consonant inventory than French
Example of code-switching types
FACST transcription example
Manual segmentation based on language change, breath groups and speaker turns
CS segments less than 30 ms not segmented
Translation:“He was born in 1988. He started working as a pastry cook with my uncle. He was paid 8000 dinars and he used to give this money to my mom; she had no money because she was not working. My uncle helped us much and it is due to their
Aligned transcripts CS
Vowel production by FR-AA bilinguals
Bilingual speakers’ speech presents more acoustic variation than that of monolinguals (Bullock, 2012; Auer, 2010)
Bilinguals access more than one phonemic inventory which may lead to potential interferences (Fricke 2016; Grosjean, 1995)
Vowel inventories of different sizes (French is richer)
To what extent do bilinguals adapt their vowel productions to the linguistic context?
Methodology: use automatic speech alignment to study vowel variants
Focus on parallel variants (3 expts allowing vowel substitutions)
Frequent replacements of the target vowel by competing vowels are considered an indicator of variation
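The indicator described above can be sketched as a simple count over aligner output. This is an illustrative sketch only: it assumes the alignment has already been reduced to (target vowel, selected vowel) pairs, and the function name and toy data are hypothetical, not taken from the corpus.

```python
from collections import Counter

def variant_rates(aligned):
    """aligned: iterable of (target_vowel, selected_vowel) pairs, one per aligned segment.
    Returns {target: fraction of segments where a competing variant was selected}."""
    totals = Counter()
    replaced = Counter()
    for target, selected in aligned:
        totals[target] += 1
        if selected != target:
            replaced[target] += 1
    return {t: replaced[t] / totals[t] for t in totals}

# Toy data, invented for illustration only.
toy = [("i", "i"), ("i", "e"), ("i", "y"), ("i", "i"), ("a", "a"), ("a", "E")]
rates = variant_rates(toy)
print(rates)
```

A higher rate for a target vowel means the aligner preferred a competing vowel model more often, which is what the slides treat as variation.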
Vowels in French and Arabic
[Delattre, 1966] [Thelwall & Akram Sa’adeddin, 1999]
Standard French: 11 oral vowels, 4 nasal vowels, 1 schwa
Classical Arabic: 3 oral vowels
How does this difference influence speech production in bilingual speakers?
Exp. 1: Vowel variants in French
Populations: French natives vs. bilinguals (French-Algerian Arabic)
Language: French
NCCFr [Torreira et al, Speech Communication 2010]
36h of conversational French, 46 speakers (24 female)
French acoustic model
Two production variants for each target vowel
Vowel           Variants   Examples
i               [e, y]     lit (bed): li, le, ly
e               [E, œ]     nez (nose): ne, nE, nœ
a (anterior)    [E, œ]     chat (cat): Sa, SE, Sœ
a (posterior)   [O, œ]     chat (cat): Sa, SO, Sœ
o               [O, ø]     chaud (hot): So, SO, Sø
u               [o, ø]     loup (wolf): lu, lo, lø
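One way to picture the parallel variants is as an expansion of each canonical pronunciation into the forms the aligner may choose from. The sketch below is an assumption about how such a variant lexicon could be generated, not the authors' tooling; `VARIANTS` uses the anterior variant set for "a", and `expand` is a hypothetical helper.

```python
from itertools import product

# Variant sets from the table above (anterior variants for "a").
VARIANTS = {
    "i": ["e", "y"],
    "e": ["E", "œ"],
    "a": ["E", "œ"],
    "o": ["O", "ø"],
    "u": ["o", "ø"],
}

def expand(phones, variants=VARIANTS):
    """Return all pronunciations: canonical phone first, then its allowed variants, per position."""
    options = [[p] + variants.get(p, []) for p in phones]
    return ["".join(combo) for combo in product(*options)]

print(expand(["l", "i"]))   # lit
print(expand(["S", "a"]))   # chat
```

For words with several target vowels this produces the full cross-product of variants, which is an assumption of the sketch; the experiments may have constrained the combinations differently.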
Exp. 1 - Results
(French: natives vs. bilinguals)
Observed variation is vowel independent
Comparable amount of variation in both groups (French natives, bilinguals)
One exception: for [a] with anterior variants, bilinguals show considerably less variation than French natives
Exp. 2: Vowel variants in code-switching
Population: bilinguals (French-Algerian Arabic)
Languages: French, Algerian Arabic
Production variation for three target vowels, each with two variants
Vowel production variation in bilinguals as a function of language
French acoustic model
Are the realizations of Arabic vowels acoustically close to French vowels? And if so, which?
Vowel           Variants
i               [e, y]
a (anterior)    [E, œ]
a (posterior)   [O, œ]
u               [o, ø]
Exp. 2: Results
(bilinguals: code-switching)
The observed variation is vowel dependent
[i] is substituted more often than the other vowels ([a], [u])
[a] (posterior) is least often substituted
Language also has an impact on vowel variation
in French, the target vowel is more often produced than in Algerian Arabic
this pattern is observed for all target vowels
Exp. 3: Vowel centralization
Quantify movement of peripheral vowels towards the center of the vowel triangle
One production variant for each target vowel: schwa [@]
French from natives vs. bilinguals
Exp. 3: Results
(French: vowel centralization)
Schwa variant rates (%); FR: French natives, FR-Alg: bilinguals
Vowel   FR     FR-Alg
i       14.1   12.8
e       20.9   24.4
E       34.1   15.9
a       34.0   15.9
O       39.4   20.2
o       33.5   21.6
u       25.0   16.2
Ẽ       13.6   7.7
ã       17.5   8.7
Õ       17.7   6.5
In French, vowel centralization is vowel dependent
[O] is most affected by vowel centralization
[Ẽ] is least affected by centralization
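The most and least centralized vowels can be read off the table programmatically. This is a minimal sketch over the rates listed above; nasal vowels are written with a trailing "~" as an ASCII convention chosen here, and `extremes` is a hypothetical helper.

```python
# Schwa variant rates (%) from the table above: vowel -> (FR natives, FR-Alg bilinguals).
# Nasal vowels carry a trailing "~", an ASCII convention chosen for this sketch.
rates = {
    "i": (14.1, 12.8), "e": (20.9, 24.4), "E": (34.1, 15.9), "a": (34.0, 15.9),
    "O": (39.4, 20.2), "o": (33.5, 21.6), "u": (25.0, 16.2),
    "E~": (13.6, 7.7), "a~": (17.5, 8.7), "O~": (17.7, 6.5),
}

def extremes(column):
    """Return (least, most) centralized vowel for column 0 (FR) or 1 (FR-Alg)."""
    by_rate = sorted(rates, key=lambda v: rates[v][column])
    return by_rate[0], by_rate[-1]

fr_least, fr_most = extremes(0)
alg_least, alg_most = extremes(1)
print("FR:", fr_least, fr_most)
print("FR-Alg:", alg_least, alg_most)
```

The comparison is over rates only; it does not account for how many tokens of each vowel were aligned, which the slides do not report.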
Exp. 3: Results
(Arabic: vowel centralization)
Schwa variant rates (%)
Vowel   Reading   CS
i       56.5      37.9
i:      15.0      19.7
a       42.4      49.0
a:      26.8      36.4
u       44.7      41.1
u:      24.0      33.0
In Algerian Arabic, vowel centralization is also vowel dependent
Long vowels less subject to centralization
[i:] is less often centralized than the other vowels
Speech style, i.e. read vs spontaneous CS, does not have much impact on vowel centralization in Algerian Arabic
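The short versus long contrast can be summarized with per-class averages. The sketch below takes an unweighted mean over vowel categories, which is an assumption: token counts are not given in the slides, so a token-weighted average could differ. The helper name is hypothetical.

```python
from statistics import mean

# Schwa variant rates (%) from the table above: vowel -> (reading, code-switching).
rates = {
    "i": (56.5, 37.9), "i:": (15.0, 19.7),
    "a": (42.4, 49.0), "a:": (26.8, 36.4),
    "u": (44.7, 41.1), "u:": (24.0, 33.0),
}

def mean_rate(vowels, style):
    """Unweighted mean over vowel categories; style 0 = reading, 1 = code-switching."""
    return round(mean(rates[v][style] for v in vowels), 1)

short, long_ = ["i", "a", "u"], ["i:", "a:", "u:"]
summary = {
    "short": (mean_rate(short, 0), mean_rate(short, 1)),
    "long": (mean_rate(long_, 0), mean_rate(long_, 1)),
}
print(summary)
```

In both speech styles the long-vowel mean comes out below the short-vowel mean, in line with the observation that long vowels are less subject to centralization.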
Vowel centralization
In French (French natives, bilinguals):
[O] is more often centralized than the other target vowels
consistent with the findings in (Boula de Mareüil et al., 2008)
bilinguals centralize vowels in the same way as French natives do
In Algerian Arabic (reading, code-switching):
speech style does not have an impact on vowel reduction rate
[i:] is less often centralized than the other vowels
possible reason: extreme position of [i:] in the vowel triangle
further acoustic analyses are needed to investigate this hypothesis
Consonant Lenition in French
Algerian Arabic has a richer consonant inventory than French
Arabic acoustic models
Allow parallel consonant variants: gemination, emphatic, change of manner or place of articulation
What consonants are most affected by lenition?
What are the consonants affected by gemination?
What is the rate of emphatisation in both speech types?
Some initial results
It appears that bilinguals have more variation than native French speakers
voiceless stop → voiced stop or fricative
largest variation observed for /t/, realized as [d] (10%) or [th] (15%)
the emphatic forms of consonants [t, d, s] are selected about 20% of the time by both populations
similar observation for geminates: 20% for bilinguals, 15% for French natives
most frequent occurrences for consonants [b, g, l, m, s, t, v, f]
highest (38%) for [f] from bilinguals
Diachronic change
Socio-phonetic, corpus-based linguistic study of journalistic speech in French news [Candea et al, Interspeech 2013]
Consonant cluster reduction: explique, exclaim
Palatalization and affrication of dental stops
Fricative epithesis [ç] after final vowels
mercredi sur ce dossier
Investigate these phenomena over the last decade
Diachronic change
Epithesis duration increased by about 20-30% (+20ms 2003-2007)
Some Perspectives & Outstanding Challenges
Speech technologies
Entering everyday life
More languages, wider data variety
Less reliance on annotated training data
Applications of speech technology
Language learning
Assess mental health conditions and aid rehabilitation
Linguistic data exploration (Big data)
Explore and validate linguistic hypotheses
Help document and characterize languages and variants
in particular oral languages and languages with relatively small speaker populations
Little semantic and world knowledge in models
Thank You
for your attention
and to many colleagues over the years who have contributed directly or indirectly to this work, including: Gilles Adda, Martine Adda-Decker, Djegdjiga Amazouz, Maria Candea, Ioana Chitoran, Jamison
Cooper-Leavitt, Julien Despres, Thiago Fraga da Silva, Jean-Luc Gauvain, William Hartmann, Nidia Hernandez, Viet-Bac Le, Abdel Messaoudi, Oana Niculescu, Annie Rialland, Ioana Vasilescu, Bianca Vieru, Cecile Woehrling, Jane Wottawa, ....
Some links to related resources
ISCA www.isca-speech.org, in particular outreach programs
ISCA-SAC www.isca-students.org
Online courses: ISCA SCOOT https://www.isca-speech.org/iscaweb/index.php/scoot, MIT, CU, CUED, CMU, IIT ...
Superlectures www.superlectures.com
Linguistic Data Consortium www.ldc.upenn.edu
ELRA www.elra.info
History of speech & language technology www.sarasinstitute.org
Journals: Speech Communication, Computer Speech & Language
Conferences/workshops: ISCA Interspeech, ITRWs, IEEE ICASSP, ASRU, HLT, SLT, SLTU, ....
Case Study: Hungarian
[Roy et al, IS13]
[Figure: %WER (axis 0-80) for Seeds 1, 2, 3, 4, +MLP.e, +MLP.h]
Seed models from 5 languages
Untranscribed audio data: 40 hours - 300 hours
Hungarian MLP trained using unsupervised transcripts
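The figure reports %WER. For reference, word error rate is conventionally the word-level edit distance between reference and hypothesis divided by the reference length. The sketch below is a generic implementation of that definition, not the scoring tool used in the study; the function name and the toy strings are hypothetical.

```python
def wer(reference, hypothesis):
    """Word error rate in percent: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# Toy example: one substitution and one deletion against a 4-word reference.
print(wer("a b c d", "a x c"))
```

The function raises ZeroDivisionError on an empty reference; production scoring tools also normalize text before comparison, which this sketch leaves out.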
Active Learning
Gain for techniques
★      <0.5%
★★     0.6-1.5%
★★★    1.6-3.0%
★★★★   >3.0%
                     3h train              40h train
Technique            Gain STT   Gain KWS   Gain STT   Gain KWS
Subword KWS          N/A        ★★★        N/A        ★★★★
Data selection       ★          none       N/A        N/A
SST                  ★★★        ★          none       none
Data augmentation    ★★★★       ★★★        ★★         ★★★
Webtexts             ★★★★       ★★★★       ★★★        ★★
NNLMs                -          -          ★★         ★
SST: semi-supervised training
Pronunciation variants: English Switchboard
Multi-word     #Total   Full form + variants   #Align   %Align   Comments
did-not        2559     dId nAt                103      4.0      full form
                        + dIdn̩t               275      10.7     n(A→@)
                        + dIdn̩                1175     45.9     + final-/t/ deletion
                        + dIn                  1006     39.3     + coda /d/ deletion
going-to-be    750      gOIng tʊbi             73       9.7      full form
                        + gOn@bi               432      57.6     complex: Ing t→n
                        + g@bi                 245      32.7     + complex: On@→@
wants-to       157      wOntstu                15       9.6      full form
                        + wOnstu               78       49.7     coda C-cluster simplification
                        + wOnts@               7        4.5      onset /t/-deletion
                        + wOns@                57       36.3     both /t/-deletions
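A compact way to read the table is the share of occurrences that the aligner assigned to any reduced variant rather than the full form. The following sketch derives that figure from the #Total and full-form #Align counts above; the helper name is hypothetical.

```python
# (total occurrences, alignments to the full form), from the table above.
full_form = {
    "did-not": (2559, 103),
    "going-to-be": (750, 73),
    "wants-to": (157, 15),
}

def reduced_share(total, n_full):
    """Percentage of occurrences aligned to any reduced variant."""
    return round(100 * (total - n_full) / total, 1)

reduced = {w: reduced_share(*c) for w, c in full_form.items()}
print(reduced)
```

For all three multi-word entries, roughly nine in ten occurrences or more align to a reduced form.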
Testing pronunciation variants : French
Words with shortened pronunciation variants
French casual speech corpus (NCCFr)
Word           #Total   Full form + variants   #Align   %Align   Comments
parce (que)    2590     pAʀs@                  4        0.2      full form
'because'               + pAʀs                 45       1.7      no final schwa
                        + pas                  1309     50.6     + C-cluster simplification
                        + ps                   1232     47.6     + vowel deletion
quelques       56       kElk@                  14       25       full form
'some'                  + kEk@                 28       50       + /l/-deletion
                        + kE(k|g)              14       25       + schwa deletion
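The cumulative comments in the table ("+ C-cluster simplification", "+ vowel deletion") suggest that each variant can be derived from the previous one by a single reduction rule. The sketch below illustrates that reading for "parce"; the rule definitions, the ASCII "R" for the French r, and the lowercase "a" are assumptions made for this example, not the lexicon's actual rule set.

```python
import re

# Ordered reduction rules for "parce", following the comments column of the table above.
# "R" is an ASCII stand-in for the French r, chosen for this sketch.
RULES = [
    ("no final schwa", lambda p: re.sub(r"@$", "", p)),
    ("C-cluster simplification", lambda p: p.replace("Rs", "s")),
    ("vowel deletion", lambda p: p.replace("a", "", 1)),
]

def derive(full_form, rules=RULES):
    """Apply each rule to the output of the previous one; return all forms in order."""
    forms = [full_form]
    for _name, rule in rules:
        forms.append(rule(forms[-1]))
    return forms

print(derive("paRs@"))
```

The resulting list can be added to a pronunciation dictionary as parallel variants of the same word, letting the aligner pick the best-matching form for each occurrence.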