
4.3.1 Recording of the speech material

The speech material of the Finnish DTT consists of the digits 0 to 9 combined into triplets (e.g., 2-4-6). In Finnish, the digits 0 to 6 have two syllables and the digits 7 to 9 have three syllables. The speech material was recorded at the House of Hearing in Oldenburg, Germany.

The speaker was a trained female native speaker of Finnish who works as a news anchor for YLE, Finland's national public service broadcasting company. She was also the speaker for the Finnish matrix sentence test (Dietz et al., 2014). For the Finnish DTT, she spoke the digits in standard Finnish with standard pronunciation. A limited set of digit triplets was formed and arranged into two lists in random order. Each digit occurred twice in each position within a triplet, and each list was recorded twice. During the recording, every triplet was preceded by the carrier phrase "Numerot:" ("The digits:").

The recording took place in a sound-insulated room. The set-up met the ISO 8253-3:2012 requirements for recording speech material for speech tests and was identical to the set-up used to record the speech material for the Finnish matrix sentence test (described in more detail by Dietz et al., 2014). The speaker was instructed to use a natural speech rate and speaking effort and to keep a constant distance from the microphone throughout the recordings.

4.3.2 Cutting the speech material and re-synthesizing the triplets

After recording, the material was high-pass filtered at 50 Hz to reduce low-frequency noise. The sound files were cut into individual triplets, and the root-mean-square (RMS) levels of the individual triplets were equalized to eliminate any long-term trend in speaking effort. Next, the recorded triplets were manually cut into individual digits with a pre- and post-word flanking of 5 ms. Digits starting with a plosive (digits 2, 3, 6, and 8) were cut 10 to 20 ms before the beginning of the plosive. All other digits were cut as close as possible to their beginnings and ends.
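The RMS equalization and the manual cutting described above can be sketched in Python with NumPy. This is a minimal illustration, not the software used for the Finnish DTT: the function names are hypothetical, the digit boundaries (`onset_s`, `offset_s`) are assumed to be hand-marked in seconds, the mean RMS across triplets is assumed as the default target level, and the 15 ms lead for plosive-initial digits is an arbitrary value within the 10 to 20 ms range given in the text. The 50 Hz high-pass filter is omitted.

```python
from typing import List, Optional

import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square value of a signal (linear units)."""
    return float(np.sqrt(np.mean(np.square(x))))


def equalize_rms(triplets: List[np.ndarray], target_rms: Optional[float] = None) -> List[np.ndarray]:
    """Scale each triplet recording to a common RMS level.

    Assumption: if no target is given, the mean RMS of all triplets is used.
    """
    levels = [rms(t) for t in triplets]
    if target_rms is None:
        target_rms = float(np.mean(levels))
    return [t * (target_rms / level) for t, level in zip(triplets, levels)]


def cut_digit(signal: np.ndarray, fs: int, onset_s: float, offset_s: float,
              flank_ms: float = 5.0, plosive: bool = False,
              plosive_lead_ms: float = 15.0) -> np.ndarray:
    """Cut one digit out of a triplet recording.

    onset_s/offset_s: hand-marked digit boundaries in seconds (hypothetical input).
    Non-plosive digits keep flank_ms before the onset; plosive-initial digits
    start plosive_lead_ms before the marked plosive onset. Both keep flank_ms
    after the offset.
    """
    lead_ms = plosive_lead_ms if plosive else flank_ms
    start = max(0, int(round((onset_s - lead_ms / 1000.0) * fs)))
    stop = min(len(signal), int(round((offset_s + flank_ms / 1000.0) * fs)))
    return signal[start:stop]


# Usage: equalize three synthetic "triplets", then cut a digit from one of them.
rng = np.random.default_rng(0)
triplets = [g * rng.standard_normal(48000) for g in (0.5, 1.0, 2.0)]
equalized = equalize_rms(triplets, target_rms=0.1)
digit = cut_digit(equalized[0], fs=48000, onset_s=0.5, offset_s=0.8, plosive=True)
```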

For each digit at each position in the triplet, the two most natural-sounding recordings (version 1 and version 2) were chosen for optimization. Test lists for the optimization measurements were created by combining the sound files containing single digits and their ramps into digit triplets in a way that preserved each digit's original position in the triplet. This method preserved the prosody when single digits were combined to form triplets. The same method was also used for the German DTT (Zokoll et al., 2012) and for the Finnish matrix sentence test (Dietz et al., 2014). Six test lists of 30 triplets were formed for each version of the digits, resulting in 12 test lists in total. In each test list, each digit occurred three times in each position. No triplet occurred more than once within the six test lists of either version.
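The re-synthesis step can be illustrated by a short Python sketch. It assumes, hypothetically, that the cut digit recordings are stored in a dictionary keyed by (digit, position, version), with positions 0 to 2, and it simply concatenates the three position-specific recordings without cross-fading, which is one plausible reading of the method rather than a description of the original software.

```python
from typing import Dict, Sequence, Tuple

import numpy as np

# Hypothetical storage: (digit, position, version) -> mono waveform
Recordings = Dict[Tuple[int, int, int], np.ndarray]


def build_triplet(digits: Sequence[int], recordings: Recordings, version: int) -> np.ndarray:
    """Concatenate single-digit recordings into one triplet.

    Each digit is taken from the recording made at the same position
    (first, second or third) in which it will be presented, so the
    position-dependent prosody of the original triplets is preserved.
    """
    if len(digits) != 3:
        raise ValueError("a triplet consists of exactly three digits")
    parts = [recordings[(d, pos, version)] for pos, d in enumerate(digits)]
    return np.concatenate(parts)


# Usage with dummy recordings whose samples encode digit and position.
recs = {(d, pos, 1): np.full(4, 10 * d + pos, dtype=float)
        for d in range(10) for pos in range(3)}
stimulus = build_triplet((2, 4, 6), recs, version=1)
```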

For the carrier phrase, only one version of the recordings was used. The RMS level of the carrier phrase was set 2 dB higher than that of the triplets, since it is beneficial, especially at poor signal-to-noise ratios (SNRs), for the carrier phrase to be slightly more audible than the triplets (Zokoll et al., 2012).
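Setting the carrier phrase 2 dB above the triplets amounts to a linear RMS gain of 10^(2/20), about 1.26. A minimal Python sketch, assuming both signals are available as NumPy arrays and that the reference is the RMS level of the equalized triplets; the function names are illustrative only:

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def scale_carrier(carrier: np.ndarray, triplet_rms: float, offset_db: float = 2.0) -> np.ndarray:
    """Scale the carrier phrase so that its RMS lies offset_db above triplet_rms."""
    target_rms = triplet_rms * 10.0 ** (offset_db / 20.0)
    return carrier * (target_rms / rms(carrier))


# Usage: a synthetic tone stands in for the recorded carrier phrase.
carrier = np.sin(np.linspace(0.0, 200.0 * np.pi, 16000))
scaled = scale_carrier(carrier, triplet_rms=0.05)
level_diff_db = 20.0 * np.log10(rms(scaled) / 0.05)
```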

4.3.3 Development of the masking noise

In accordance with current recommendations (Zokoll et al., 2012; Akeroyd et al., 2015), the masking noise for the Finnish DTT is a quasi-stationary noise whose long-term spectrum matches the long-term spectrum of the triplets, without fluctuations in level. It was generated by superimposing all individual digit files 30-fold, using variable, random delays before the start of and between the sound files. The recordings of the carrier phrase were not used to create the noise.
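The superposition procedure can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: each of the 30 layers contains all digit files in random order separated by random gaps of up to 50 ms, each layer starts at a random offset, and the layers are added circularly into a fixed-length buffer so that the noise can be looped seamlessly. The function name and its parameters are hypothetical.

```python
from typing import Optional, Sequence

import numpy as np


def make_masking_noise(digit_files: Sequence[np.ndarray], n_samples: int, fs: int,
                       n_layers: int = 30, max_gap_s: float = 0.05,
                       target_rms: Optional[float] = None,
                       seed: Optional[int] = None) -> np.ndarray:
    """Speech-shaped noise built from superimposed digit recordings.

    Each layer is all digit files in random order with random gaps (assumption),
    added at a random start offset and wrapped around the end of the buffer.
    """
    rng = np.random.default_rng(seed)
    noise = np.zeros(n_samples)
    max_gap = int(max_gap_s * fs)
    for _ in range(n_layers):
        pieces = []
        for i in rng.permutation(len(digit_files)):
            gap = int(rng.integers(0, max_gap + 1))  # random delay before each file
            pieces.append(np.zeros(gap))
            pieces.append(digit_files[i])
        layer = np.concatenate(pieces)
        start = int(rng.integers(0, n_samples))
        idx = (start + np.arange(len(layer))) % n_samples
        np.add.at(noise, idx, layer)  # circular overlap-add; repeated indices accumulate
    if target_rms is not None:
        noise *= target_rms / np.sqrt(np.mean(noise ** 2))
    return noise


# Usage with random signals standing in for the ten digit recordings.
rng = np.random.default_rng(1)
digits = [rng.standard_normal(4000) for _ in range(10)]
noise = make_masking_noise(digits, n_samples=48000, fs=16000, seed=2)
```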

4.3.4 Participants

Both the optimization and the evaluation measurements took place at Kuopio University Hospital. Sixteen native Finnish speakers aged 20 to 30 years (mean 23.1 years) participated in the optimization measurements, and 19 native Finnish speakers aged 18 to 34 years (mean 23.2 years), none of whom had taken part in the optimization measurements, participated in the evaluation measurements. All participants had normal hearing, confirmed by pure-tone audiometry at the beginning of the session (pure-tone thresholds ≤15 dB HL from 0.125 to 8 kHz). All measurements were performed monaurally on the better ear. Informed consent was obtained from all participants. The study was approved by the Research Ethics Committee of the Northern Savo Hospital District.


4.3.5 Optimization

The standard deviation of an SRT estimate can be modeled as inversely proportional to the slope of the intelligibility function (Brand and Kollmeier, 2002). As shown earlier, the slope of the intelligibility function of a test depends on the slopes of the individual test items and on the variation in their intelligibility: the more the SRTs of the items differ, the shallower the overall function (Kollmeier, 1990). Therefore, both of these variables were optimized to obtain homogeneous speech material and to increase the precision of the Finnish DTT.
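The effect of item-level variation on the overall slope can be illustrated numerically. The Python sketch below assumes, for illustration only, logistic item functions of the form I(L) = 1/(1 + exp(4s(L50 - L))), whose slope at L50 equals s, and item SRTs spread like a normal distribution around a common mean; it averages the item functions and measures the slope of the mean function at its 50% point. The function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def item_function(L, l50, s):
    """Logistic intelligibility function with slope s (per dB) at l50."""
    return 1.0 / (1.0 + np.exp(4.0 * s * (l50 - L)))


def overall_slope(item_slope: float, srt_sd: float, n_items: int = 1001, dL: float = 0.01) -> float:
    """Slope at 50% of the mean of n_items item functions whose L50s follow
    normal quantiles with standard deviation srt_sd (dB) around 0 dB."""
    z = np.array([NormalDist().inv_cdf((i + 0.5) / n_items) for i in range(n_items)])
    l50s = srt_sd * z  # symmetric spread, so the mean function crosses 50% at 0 dB

    def mean_intelligibility(L):
        return float(np.mean(item_function(L, l50s, item_slope)))

    # Central difference around the 50% point
    return (mean_intelligibility(dL) - mean_intelligibility(-dL)) / (2.0 * dL)


for sd in (0.0, 0.5, 1.0, 2.0):
    print(f"SRT spread {sd:.1f} dB -> overall slope {overall_slope(0.2, sd):.3f} per dB")
```

With no spread, the overall slope equals the item slope; any spread in item SRTs makes it shallower, which is why both quantities were targeted in the optimization.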

The measurements were carried out in a sound-attenuated booth using free-field-equalized Sennheiser HDA200 headphones (Sennheiser Electronics GmbH & Co. KG, Wedemark-Wennebostel, Germany). The equipment and set-up were the same as those used to optimize the speech material for the Finnish matrix sentence test (Dietz et al., 2014).

Subjects were told that all combinations of the digits 0 to 9 were possible and were instructed to repeat the three presented digits in the correct order. The experimenter entered the repeated triplet into the software. If a digit could not be heard, the subjects were allowed to say so; because they were not forced to guess, the guess rate was assumed to be 0. The subjects received no feedback on their responses during the test.

To randomize the presentation of the test lists, the subjects were divided into two groups. Both groups first completed two training lists at constant SNRs of 0 and -2 dB. After training, both groups completed eight test lists in random order at the following constant SNRs: -21.0, -18.5, -16.0, -13.5, -11.0, -8.5, -6.0, and -3.5 dB. The noise level was held constant at 65 dB SPL. Group 1 was tested with lists from recording version 1 and group 2 with lists from recording version 2. In this way, each digit at each position of the respective recording version was presented to each subject exactly three times at each SNR. For each digit at each position in the triplet, intelligibility scores were determined at every SNR measured. The psychometric function of each individual digit realization was obtained by a maximum-likelihood fit to the raw data using the equation

$$I(L) = \frac{1}{1 + e^{4s\,(L_{50} - L)}} \qquad (1)$$

where I is the intelligibility of the digit, L is the level (given here as the signal-to-noise ratio), s is the slope of the psychometric function at L50, and L50 is the SRT of the digit, i.e., the SNR at which 50% intelligibility is reached. The fitted parameters for each digit were s and L50.
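A maximum-likelihood fit of Eq. (1) can be sketched with NumPy alone. The sketch assumes the logistic form I(L) = 1/(1 + exp(4s(L50 - L))), treats each presentation of a digit as a Bernoulli trial with guess rate 0, summarizes the trials as correct counts per SNR (which gives the same likelihood maximum as the per-trial data), and maximizes the binomial log-likelihood by exhaustive grid search. The grid ranges and resolution, the function names, and the counts in the usage example are hypothetical; a gradient-based optimizer would serve equally well.

```python
import numpy as np


def logistic(L, l50, s):
    """Intelligibility I(L) with SRT l50 (dB SNR) and slope s (per dB) at l50."""
    return 1.0 / (1.0 + np.exp(4.0 * s * (l50 - L)))


def fit_ml(snr, n_correct, n_trials, l50_grid=None, s_grid=None):
    """Maximum-likelihood (L50, s) for one digit, by grid search.

    The binomial log-likelihood of the correct counts is evaluated on every
    (L50, s) grid point, and the best point is returned.
    """
    snr = np.asarray(snr, dtype=float)
    k = np.asarray(n_correct, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    if l50_grid is None:
        l50_grid = np.arange(-25.0, 0.0, 0.05)
    if s_grid is None:
        s_grid = np.arange(0.01, 0.6, 0.005)
    # Broadcast to shape (n_l50, n_s, n_snr)
    p = logistic(snr[None, None, :], l50_grid[:, None, None], s_grid[None, :, None])
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    loglik = np.sum(k * np.log(p) + (n - k) * np.log1p(-p), axis=-1)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return float(l50_grid[i]), float(s_grid[j])


# Usage with the eight SNRs of the optimization measurements (hypothetical counts).
snrs = [-21.0, -18.5, -16.0, -13.5, -11.0, -8.5, -6.0, -3.5]
correct = [0, 1, 4, 9, 17, 22, 24, 24]
trials = [24] * 8
l50, slope = fit_ml(snrs, correct, trials)
```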

The goal of the optimization was to obtain test material in which the individual digits have psychometric functions with steep slopes and similar intelligibility. Therefore, the level of each digit was adjusted to bring its L50 as close as possible to the average L50 of all digits. The required level correction was determined by comparing each individual digit's L50 with the average L50 of all digits. The corrections were within 2 dB for all but five digits, for which the calculated correction was 2.1 or 2.2 dB. However, it was decided to limit the level corrections to 2 dB to avoid unnatural-sounding level changes within a triplet. The slopes of the psychometric functions of the two recording versions (version 1 and version 2) were compared, and the version with the steeper slope was chosen for each digit separately.
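The level correction and the version choice can be expressed in a few lines of Python. Raising a digit's level by Δ dB at a fixed noise level lowers its L50 on the nominal SNR scale by Δ dB, so a digit whose L50 lies above the average needs a positive correction equal to its L50 minus the average, clipped here to ±2 dB as in the text. Which set of L50 values enters the average is not specified above and is left to the caller; the function names are hypothetical.

```python
import numpy as np


def level_corrections(l50s, max_db=2.0):
    """Level change in dB for each digit that moves its L50 toward the mean L50.

    Positive values make a digit louder (its L50 was above the mean, i.e. it
    was harder to understand). Corrections are limited to +/- max_db.
    """
    l50s = np.asarray(l50s, dtype=float)
    return np.clip(l50s - l50s.mean(), -max_db, max_db)


def choose_versions(slopes_v1, slopes_v2):
    """Per digit, return 1 or 2: the recording version with the steeper slope (ties -> 1)."""
    return np.where(np.asarray(slopes_v2) > np.asarray(slopes_v1), 2, 1)


corrections = level_corrections([-9.0, -12.0, -15.0])  # mean is -12 dB; raw corrections +/-3 dB
versions = choose_versions([0.20, 0.30], [0.25, 0.10])
```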

The optimized digits were arranged into triplets to create the final test lists. Altogether, six lists of 30 triplets each were formed. Within each test list, each digit occurs exactly three times at each position. A digit may occur twice in one triplet but never in adjacent positions, i.e., triplet 4-6-4 is allowed but triplet 2-2-3 is not. No triplet occurs more than once in the test lists.
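The construction rules for the final lists can be checked automatically. The following Python sketch, with hypothetical function names, reports violations of the rules stated above: 30 triplets per list, each digit exactly three times at each position, no identical digits in adjacent positions, and no triplet repeated across the lists.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Triplet = Tuple[int, int, int]


def list_problems(triplets: Sequence[Triplet]) -> List[str]:
    """Rule violations within one test list (an empty result means the list is valid)."""
    problems = []
    if len(triplets) != 30:
        problems.append(f"expected 30 triplets, found {len(triplets)}")
    for pos in range(3):
        counts = Counter(t[pos] for t in triplets)
        for digit in range(10):
            if counts[digit] != 3:
                problems.append(f"digit {digit} occurs {counts[digit]} times at position {pos + 1}")
    for t in triplets:
        if t[0] == t[1] or t[1] == t[2]:  # 4-6-4 is allowed, 2-2-3 is not
            problems.append(f"adjacent repeat in {t}")
    return problems


def repeated_triplets(lists: Sequence[Sequence[Triplet]]) -> List[Triplet]:
    """Triplets that occur more than once across all lists."""
    counts = Counter(t for lst in lists for t in lst)
    return [t for t, c in counts.items() if c > 1]


# A valid example list: shift each digit by fixed nonzero offsets (mod 10).
example = [(d, (d + a) % 10, (d + a + b) % 10)
           for a, b in ((1, 1), (2, 3), (3, 5)) for d in range(10)]
```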

The background noise file and the carrier phrase remained the same as in the optimization measurements.


4.3.6 Evaluation measurements

The goal of the evaluation measurements was to obtain a normative reference function for normal-hearing listeners, and to verify that there is no systematic difference in intelligibility between the final test lists.

For the evaluation measurements, we used the same apparatus and set-up as described for the optimization measurements. At the beginning of the session, each subject completed two training lists with an adaptive procedure. The measurements started at 0 dB SNR. Triplet scoring was used, i.e., a triplet was scored as correct only if all three digits were identified correctly and in the correct order. Depending on the response, the speech level of the next triplet was adjusted by an automatic adaptive up-down procedure with a step size of 2 dB.

The noise level was held constant at 65 dB SPL. The lists used were selected randomly from the six test lists. This same test protocol will be used for the final test application.
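The adaptive rule with triplet scoring can be sketched as follows. The text specifies only an up-down procedure with a 2 dB step starting at 0 dB SNR; the sketch assumes a simple one-up/one-down rule (SNR lowered by 2 dB after a correctly repeated triplet and raised by 2 dB otherwise), derives the speech level from the fixed 65 dB SPL noise, and does not estimate the SRT, since the estimation rule is not given here. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

NOISE_LEVEL_DB_SPL = 65.0


def triplet_correct(presented: Sequence[int], response: Sequence[int]) -> bool:
    """Triplet scoring: correct only if all three digits match in the right order."""
    return tuple(presented) == tuple(response)


def adaptive_track(presented: Sequence[Sequence[int]], responses: Sequence[Sequence[int]],
                   start_snr: float = 0.0, step_db: float = 2.0) -> List[Tuple[float, float]]:
    """Return (SNR, speech level in dB SPL) at which each triplet was presented.

    Assumed one-up/one-down rule: the SNR decreases by step_db after a correct
    triplet and increases by step_db after an incorrect one.
    """
    track = []
    snr = start_snr
    for triplet, answer in zip(presented, responses):
        track.append((snr, NOISE_LEVEL_DB_SPL + snr))
        snr += -step_db if triplet_correct(triplet, answer) else step_db
    return track


track = adaptive_track([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
                       [(1, 2, 3), (4, 5, 0), (7, 8, 9)])
```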

Subsequently, each subject was tested with all six test lists at three different constant SNRs (18 measurements per subject in total). The SNRs used were -14.0, -12.5, and -11.0 dB, since they were expected to yield intelligibility scores of approximately 20%, 50%, and 80%, respectively. The order of the test lists and SNRs was randomized to minimize training and fatigue effects on the results.
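Under a logistic reference function of the form I(L) = 1/(1 + exp(4s(L50 - L))), these three SNRs imply a particular slope: if -12.5 dB yields 50%, then 80% at -11.0 dB requires 4s · 1.5 dB = ln 4, i.e. s = ln(4)/6 ≈ 0.23 per dB, and by symmetry the same slope predicts 20% at -14.0 dB. A small Python check of this arithmetic, assuming this functional form; it only illustrates the stated expectations and is not a reported result. The function names are hypothetical.

```python
import math


def implied_slope(l50: float, level: float, p: float) -> float:
    """Slope s (per dB) of I(L) = 1 / (1 + exp(4 s (l50 - L))) passing through (level, p)."""
    return math.log(p / (1.0 - p)) / (4.0 * (level - l50))


def intelligibility(level: float, l50: float, s: float) -> float:
    """Logistic intelligibility at a given SNR for SRT l50 and slope s."""
    return 1.0 / (1.0 + math.exp(4.0 * s * (l50 - level)))


s = implied_slope(l50=-12.5, level=-11.0, p=0.8)
predicted_low = intelligibility(-14.0, l50=-12.5, s=s)
```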