Parameterisation - 2 Design and Recording of a Corpus of Portuguese Fricatives 6

The parameters used were defined first from mechanical model results (Shadle 1985) and further developed as a potential tool for classifying fricatives using real speech (Shadle and Mair 1996). They consist of measures of the dynamic range of the spectrum and of spectral slope, and are applied to the spectrum of the far-field acoustic signal.

The far-field acoustic signal is the result of the excitation of the tract transfer function by the source (for unvoiced fricatives) or sources (for voiced fricatives). The transfer function consists of poles, which are the resonances of the entire vocal tract, and zeros, which are antiresonances of the part of the tract upstream of the noise source; see Figure 6.1.

If the noise source is distributed, as in /ç/, the zero frequencies will be correspondingly smeared. An intermediate source location (as for all fricatives) always produces a low-frequency zero. Poles and zeros corresponding to back-cavity resonances tend nearly to cancel. Uncancelled poles correspond to front-cavity resonances; their spectral prominence will depend on both the losses (especially radiation losses) and the noise source strength at their respective frequencies.

Figure 6.1: Transfer function (amplitude against frequency). Poles: resonances of the entire vocal tract. Zeros: antiresonances of the part of the tract upstream of the noise source.
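The near-cancellation of back-cavity poles and zeros can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: it builds a transfer function from complex-conjugate pole and zero pairs in the Laplace domain, each normalised to unity gain at 0 Hz, and evaluates its magnitude in dB. The function names and the pole and zero frequencies and bandwidths are hypothetical values chosen only to show the effect; they are not measurements from this corpus.

```python
import numpy as np


def pair_factor(f_hz, fc_hz, bw_hz):
    """|(1 - s/p)(1 - s/p*)| for the pair p = -pi*bw + j*2*pi*fc, evaluated at s = j*2*pi*f."""
    s = 2j * np.pi * np.asarray(f_hz, dtype=float)
    p = -np.pi * bw_hz + 2j * np.pi * fc_hz
    return np.abs((1 - s / p) * (1 - s / np.conj(p)))


def transfer_db(f_hz, poles, zeros):
    """20*log10|H(f)|: zero pairs multiply, pole pairs divide; pairs are (fc, bw) in Hz."""
    h = np.ones(np.shape(f_hz))
    for fc, bw in zeros:
        h = h * pair_factor(f_hz, fc, bw)
    for fc, bw in poles:
        h = h / pair_factor(f_hz, fc, bw)
    return 20.0 * np.log10(h)


f = np.array([1000.0, 3000.0, 5000.0])
# An uncancelled (front-cavity) pole alone produces a peak at 3 kHz.
peak = transfer_db(f, poles=[(3000.0, 100.0)], zeros=[])
# A (back-cavity) pole with a zero at nearly the same frequency: the zero largely cancels the peak.
near = transfer_db(f, poles=[(3000.0, 100.0)], zeros=[(3050.0, 100.0)])
print(peak, near)
```

Because each pair is normalised to unity gain at 0 Hz, a pole and a zero at exactly the same frequency and bandwidth cancel completely; the small offset between them here leaves only a few dB of residual ripple near 3 kHz.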

The noise source spectrum depends on the shape of the constriction, the tract downstream of it, and the flow velocity through it. The noise source spectral envelope has its highest amplitude at low frequencies and falls off smoothly.

If the tract geometry remains the same and flow velocity is increased, as seen in Figure 6.2, the noise spectral envelope increases in amplitude at all frequencies, but more so at higher frequencies (Shadle and Mair 1996). The noise source is weaker in voiced fricatives than in their unvoiced counterparts.

Figure 6.2: Noise source spectral envelope (amplitude in dB against frequency, 0 to 20 kHz) for increasing flow velocity.

The frication source location can be constant, as in most examples of /s/ and /ʃ/, or it can change over time, as in any fricative surrounded by different vowels. We will be looking at time-averaged power spectra, so the change in source location will also be "time-averaged", i.e., it is absorbed into the time-averaged spectra.
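As a minimal sketch of what a time-averaged power spectrum involves computationally, the function below averages the power spectra of successive Hann-windowed frames of a signal (essentially Welch's averaging, without density scaling). The function name, frame length, hop size and window are illustrative assumptions, not the analysis settings used in this study; note that the sampling rate must be at least 40 kHz for the spectrum to reach 20 kHz.

```python
import numpy as np


def time_averaged_power_spectrum(x, fs, frame_len=512, hop=256):
    """Return (freqs_hz, power_db) averaged over Hann-windowed frames of x.

    Hypothetical helper: frame_len, hop and the Hann window are illustrative
    choices, not the settings used in this study.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = np.stack([x[s:s + frame_len] * window for s in starts])
    # Power spectrum of each frame, then the mean across frames (the time average).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mean_power = power.mean(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    # A tiny floor avoids log10(0) in empty bins.
    return freqs, 10.0 * np.log10(mean_power + 1e-20)


# Usage: a tone centred on FFT bin 100 should give a spectral maximum in that bin.
fs = 44100
t = np.arange(fs // 4) / fs
tone = np.sin(2 * np.pi * (100 * fs / 512) * t)
freqs, power_db = time_averaged_power_spectrum(tone, fs)
print(freqs[np.argmax(power_db)])
```

Any movement of the source during the fricative simply contributes different frames to the average, which is the sense in which source location is absorbed into the time-averaged spectrum.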

If our goal is identification of the fricative spoken, regardless of its context or the way in which it was spoken, we are interested in the transfer function, since the peak frequencies offer clues to the place, and in the source type, since that not only differentiates voiced and unvoiced versions but, in indicating whether the source is localized or distributed, again offers clues to the place. If our goal instead is to describe the acoustic variation caused by the context or by the way in which a particular fricative is spoken, we are interested in the source spectrum, since it offers clues to the source variations across subjects and corpora. In this study we are primarily interested in the latter goal.

Figure 6.3 illustrates the four parameters that we consider in this paper. $F$ is the frequency of the highest-amplitude spectral peak between 2 and 8 kHz; it corresponds to the same cavity resonance for all tokens of a particular fricative.
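The band-limited part of the definition of $F$ can be sketched as follows, assuming a spectrum already sampled on a frequency grid. The helper name `peak_frequency` is hypothetical, and the sketch only picks the highest local maximum between 2 and 8 kHz; it cannot verify that this peak is the same cavity resonance in every token, which in this study was established by inspection.

```python
import numpy as np


def peak_frequency(freqs, power_db, lo_hz=2000.0, hi_hz=8000.0):
    """Frequency of the highest local maximum of power_db between lo_hz and hi_hz.

    Hypothetical helper. Returns None when the band contains no local maximum.
    """
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(power_db, dtype=float)
    # A local maximum is an interior point strictly higher than both neighbours.
    is_peak = np.zeros(p.shape, dtype=bool)
    is_peak[1:-1] = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    candidates = np.flatnonzero(is_peak & in_band)
    if candidates.size == 0:
        return None
    return float(freqs[candidates[np.argmax(p[candidates])]])


# Usage: a peak at 5 kHz plus a stronger peak at 10 kHz, which lies outside the band.
freqs = np.arange(0.0, 20001.0, 100.0)
spectrum = np.maximum(-((freqs - 5000) / 500) ** 2, 10 - ((freqs - 10000) / 500) ** 2)
print(peak_frequency(freqs, spectrum))
```

Restricting the search to local maxima, rather than taking the largest value in the band, avoids returning a band edge when the spectrum is still rising or falling there.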

The dynamic amplitude, $A_d$, is the difference between the maximum amplitude of the averaged power spectrum between 0.5 and 20 kHz and the minimum amplitude between 0 and 2 kHz. Two linear regression lines are fitted to the spectrum: $S'_p$ is the slope of the line fitted to all the points from 500 Hz to $F$, and $S_p$ is the slope of the line fitted to all the points from $F$ to 20 kHz. This frequency range allows us to capture relevant variations in sound power (the area under the acoustic spectrum) and shifts in the peaks, which was not possible in previous studies such as the measurements of spectral tilt up to 5 kHz by Badin et al. (1994).
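The three spectrum-based definitions above translate directly into array operations. The following is a minimal sketch, assuming a dB spectrum sampled on a frequency grid in Hz, inclusive band edges, and slopes reported in dB/kHz; the units, the band-edge convention and all function names are assumptions, not taken from the text.

```python
import numpy as np


def dynamic_amplitude(freqs, power_db):
    """A_d: maximum level in 0.5-20 kHz minus minimum level in 0-2 kHz (dB)."""
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(power_db, dtype=float)
    upper = (freqs >= 500.0) & (freqs <= 20000.0)
    lower = (freqs >= 0.0) & (freqs <= 2000.0)
    return float(p[upper].max() - p[lower].min())


def regression_slope(freqs, power_db, f_start, f_end):
    """Least-squares slope, in dB/kHz (assumed unit), of the spectrum from f_start to f_end Hz."""
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(power_db, dtype=float)
    band = (freqs >= f_start) & (freqs <= f_end)
    slope, _intercept = np.polyfit(freqs[band] / 1000.0, p[band], 1)
    return float(slope)


def spectral_parameters(freqs, power_db, F):
    """A_d, S'_p (500 Hz to F) and S_p (F to 20 kHz) for a peak frequency F in Hz."""
    return {
        "A_d": dynamic_amplitude(freqs, power_db),
        "S'_p": regression_slope(freqs, power_db, 500.0, F),
        "S_p": regression_slope(freqs, power_db, F, 20000.0),
    }


# Usage: a synthetic spectrum rising 3 dB/kHz up to 5 kHz, then falling 2 dB/kHz.
freqs = np.arange(0.0, 20001.0, 50.0)
khz = freqs / 1000.0
power_db = np.where(khz <= 5.0, 3.0 * khz, 15.0 - 2.0 * (khz - 5.0))
params = spectral_parameters(freqs, power_db, F=5000.0)
print(params)
```

On this piecewise-linear test spectrum the fitted slopes recover the two segment slopes, and $A_d$ is the 15 dB difference between the 5 kHz peak and the 0 Hz minimum.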

The values of $F$ used to calculate $S_p$ and $S'_p$ were fixed for each place of articulation of the 6 fricatives and were the same for all speakers and corpora: $F$ = 5 kHz for /f,v/, 6 kHz for /s,z/ and 4 kHz for /ʃ,ʒ/. These frequencies are the averages of the manually calculated values of all sustained tokens (Corpus 1a) of each place (labiodental, alveolar and postalveolar).
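Because $F$ is fixed per place rather than estimated per token, every token of a given fricative is fitted over the same two frequency bands. The lookup below is a small sketch of that convention; the table name `F_BY_FRICATIVE_HZ` and the helper `slope_bands` are hypothetical, and only the frequency values come from the text.

```python
# Place-specific F values (Hz) from the text; voiced and voiceless members
# of each place share the same value.
F_BY_FRICATIVE_HZ = {
    "f": 5000.0, "v": 5000.0,  # labiodental
    "s": 6000.0, "z": 6000.0,  # alveolar
    "ʃ": 4000.0, "ʒ": 4000.0,  # postalveolar
}


def slope_bands(fricative):
    """Return the (low, high) frequency ranges in Hz used for S'_p and for S_p."""
    F = F_BY_FRICATIVE_HZ[fricative]
    return (500.0, F), (F, 20000.0)


print(slope_bands("s"))
```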

Given these definitions, we can make the following predictions. The parameter $F$ should be related to the place of the fricative, decreasing as place moves posteriorly. The parameter $A_d$ should be maximized for a localized source and for higher source strength, as in sibilants and unvoiced fricatives. The parameter $S_p$ should be related to the source strength. Although the resonance peaks will affect the line fit, they should affect it in the same way for within-fricative comparisons. Thus, for a given fricative, where the transfer function varies only slightly from token to token, $S_p$ should increase, i.e. become less negative, as flow velocity through the constriction increases.

Figure 6.3: Dynamic amplitude $A_d$, and regression lines used to calculate the low-frequency (500 Hz to $F$) slope $S'_p$ (dashed line) and the high-frequency ($F$ to 20 kHz) slope $S_p$ (solid line); amplitude in dB against frequency in kHz. Sustained fricative /ʃ/ (Corpus 1a) produced by Speaker ISSS.

Figure 6.4: Correlation between effort level and increased flow velocity (noise spectral envelopes for soft, medium and loud effort; amplitude in dB against frequency, 0 to 20 kHz).

Effort level (see Figure 6.4) and syllable stress should be correlated with increased flow velocity; the velocity should also be at a maximum mid-fricative, when the constriction area is smallest and the pressure drop across the constriction is highest. The parameter $S'_p$ should behave similarly to $A_d$: for a fricative with a localized source and posterior place, $S'_p$ will be the largest. Within a fricative, increased $S'_p$ should be correlated with either a more posterior place (due, for instance, to a more rounded vowel context) or greater source strength. See Table 6.1 for a summary of the predicted effects on parameters.
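One way to use such directional predictions is to encode them and check whether measured group means move the predicted way. The sketch below is hypothetical throughout: the dictionary `PREDICTED_SIGN`, the helper `agrees_with_prediction` and the sample values are illustrative, and the loud-versus-soft contrast is an assumption drawn from the stated link between effort level and flow velocity. Comparing means only checks direction; it is not a substitute for a statistical test.

```python
from statistics import mean

# Hypothetical encoding of a few predictions: each contrast pairs the condition
# expected to have the higher source strength with its comparison condition,
# giving the predicted sign of the mean difference for each parameter.
PREDICTED_SIGN = {
    ("unvoiced", "voiced"): {"A_d": +1, "S_p": +1, "S'_p": +1},
    ("loud", "soft"): {"A_d": +1, "S_p": +1, "S'_p": +1},
    ("stressed", "destressed"): {"A_d": +1, "S_p": +1, "S'_p": +1},
}


def agrees_with_prediction(values_a, values_b, predicted_sign):
    """True if mean(values_a) - mean(values_b) is nonzero and has the predicted sign."""
    diff = mean(values_a) - mean(values_b)
    return diff != 0 and (diff > 0) == (predicted_sign > 0)


# Usage with made-up S_p values: louder tokens have less negative slopes.
loud = [-1.2, -0.9, -1.0]
soft = [-2.1, -1.8, -2.4]
sign = PREDICTED_SIGN[("loud", "soft")]["S_p"]
print(agrees_with_prediction(loud, soft, sign))
```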

Table 6.1: Predicted effects on parameters.

| Phonetic class | Aeroacoustics | Predictions |
|---|---|---|
| Posterior place; sibilants /s,z,ʃ,ʒ/ | Longer front cavity; localized source; higher source strength | $F$ lower; $A_d$, $S_p$ and $S'_p$ higher |
| Forward place; nonsibilants /f,v/ | Distributed source; lower source strength | $F$ higher; $A_d$, $S_p$ and $S'_p$ lower |
| Unvoiced | Higher source strength | $A_d$, $S_p$ and $S'_p$ higher |
| Voiced-Devoiced | | |
| Voiced | Lower source strength | $A_d$, $S_p$ and $S'_p$ lower |
| Loud effort level | Higher source strength | $A_d$, $S_p$ and $S'_p$ higher |
| Medium effort level | | |
| Soft effort level | | |
| Beginning of fricative | | |
| Middle of fricative | Higher source strength | $A_d$, $S_p$ and $S'_p$ higher |
| End of fricative | | |
| Stressed syllable | Higher source strength | $A_d$, $S_p$ and $S'_p$ higher |
| Destressed syllable | | |
| Word position | | |
| Rounded | Longer front cavity; lower source strength† | $F$ lower; $A_d$ higher; ? $S'_p$ and $S_p$ lower |
| Unrounded | | |
| Subject | ? | No effect |

A higher source strength means higher airflow for the same constriction area $A_c$, or a constant flow for a smaller $A_c$.

† The lips form a second constriction, and so the one downstream in the vocal tract has lower strength (unpublished experimental results by Shadle and Badin).
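The note on source strength under Table 6.1 can be made concrete with the continuity relation for the mean particle velocity $u$ through the constriction. The volume flow symbol $U$ is introduced here for illustration and is not used elsewhere in this chapter:

```latex
u = \frac{U}{A_c}, \qquad
\frac{2U}{A_c} = 2u, \qquad
\frac{U}{A_c/2} = 2u
```

Doubling the flow at a fixed constriction area, or halving the area at a fixed flow, doubles the mean velocity through the constriction, which is why both situations count as a higher source strength.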
