Original Article Artigo Original
ISSN 2317-1782 (Online version)
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Accuracy of traditional and formant acoustic
measurements in the evaluation of
vocal quality
Acurácia das medidas acústicas tradicionais e
formânticas na avaliação da qualidade vocal
Leonardo Wanderley Lopes1Jônatas do Nascimento Alves2
Deyverson da Silva Evangelista2
Fernanda Pereira França2
Vinícius Jefferson Dias Vieira2
Maria Fabiana Bonfim de Lima-Silva1
Leandro de Araújo Pernambuco1
Keywords
Voice Accuracy Acoustic Analysis Vocal Quality Voice Disorders
Descritores
Voz Acurácia Análise Acústica Qualidade Vocal Distúrbios da Voz
Correspondence address:
Leonardo Wanderley Lopes Departamento de Fonoaudiologia, Centro de Ciências da Saúde, Universidade Federal da Paraíba – UFPB
Cidade Universitária, Campus I, Bairro Castelo Branco, João Pessoa (PB), Brasil, CEP: 58051-900. E-mail: lwlopes@hotmail.com
Received: January 08, 2018
Accepted: April 09, 2018
Study conducted at Departamento de Fonoaudiologia, Universidade Federal da Paraíba – UFPB - João Pessoa (PB), Brasil.
1 Departamento de Fonoaudiologia, Universidade Federal da Paraíba – UFPB - João Pessoa (PB), Brasil. 2 Programa de Pós-graduação em Linguística, Universidade Federal da Paraíba – UFPB - João Pessoa (PB), Brasil. Financial support: National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq). Process nº 480168/2013-0.
Conflict of interests: nothing to declare. ABSTRACT
Purpose: Investigate the accuracy of isolated and combined acoustic measurements in the discrimination of
voice deviation intensity (GD) and predominant voice quality (PVQ) in patients with dysphonia. Methods: A
total of 302 female patients with voice complaints participated in the study. The sustained /ɛ/ vowel was used to extract the following acoustic measures: mean and standard deviation (SD) of fundamental frequency (F0), jitter, shimmer, glottal to noise excitation (GNE) ratio and the mean of the first three formants (F1, F2, and F3).
Auditory-perceptual evaluation of GD and PVQ was conducted by three speech-language pathologists who were voice specialists. Results: In isolation, only GNE provided satisfactory performance when discriminating between
GD and PVQ. Improvement in the classification of GD and PVQ was observed when the acoustic measures
were combined. Mean F0, F2, and GNE (healthy × mild-to-moderate deviation), the SDs of F0, F1, and F3
(mild-to-moderate × moderate deviation), and mean jitter and GNE (moderate × intense deviation) were the best combinations for discriminating GD. The best combinations for discriminating PVQ were mean F0, shimmer,
and GNE (healthy × rough), F3 and GNE (healthy × breathy), mean F0, F3, and GNE (rough × tense), and mean
F0, F1, and GNE (breathy × tense). Conclusion: In isolation, GNE proved to be the only acoustic parameter capable of discriminating between GG and PVQ. There was a gain in classification performance for discrimination
of both GD and PVQ when traditional and formant acoustic measurements were combined.
RESUMO
Objetivo: Investigar a acurácia das medidas acústicas, isoladas e combinadas, na discriminação da intensidade
do desvio vocal (GG) e da qualidade vocal predominante (QVP) em pacientes disfônicos. Método: Participaram
302 pacientes do gênero feminino, com queixa vocal. A partir da vogal /ɛ/ sustentada, foram extraídas as medidas acústicas de média e desvio padrão (DP) da frequência fundamental (F0), o jitter, o shimmer e o Glottal to noise
excitation (GNE) e a média dos três primeiros formantes (F1, F2, F3). A avaliação perceptivo-auditiva do GG e QVP foi realizada por três fonoaudiólogos especialistas em voz. Resultados: Isoladamente, apenas o GNE obteve
desempenho satisfatório na discriminação do GG e da QVP. Houve uma melhora na classificação do GG e QVP
com a combinação das medidas acústicas. A média de F0, F2 e GNE (saudável × desvio leve a moderado), DP de F0, F1 e F3 (leve a moderado × desvio moderado), Jitter e GNE (moderado × desvio intenso) foram as melhores
combinações para discriminar o GG. As melhores combinações para discriminação da QVP foram média de F0,
Shimmer e GNE (saudável × rugosa), F3 e GNE (saudável × soprosa), média de F0, F3 e GNE (rugosa × tensa), média de F0, F1 e GNE (soprosa × tensa). Conclusão: De forma isolada, o GNE mostrou-se o único parâmetro acústico capaz de discriminar o GG e a QVP. Houve um ganho no desempenho da classificação com a combinação
INTRODUCTION
The voice is essentially a multidimensional phenomenon that includes physiological, perceptual, aerodynamic, acoustic and emotional aspects. Therefore, it is necessary that voice evaluations also follow this principle and that these dimensions are considered and integrated in the process to achieve an overall view of dysphonia(1).
The goal of voice evaluation is to analyze voice quality, identify
whether the voice is healthy or not, diagnose the presence of a perturbation, determine a prognosis, and monitor the patient’s progress during voice therapy(2). The process of voice evaluation generally includes procedures relating to a visual laryngeal examination, auditory-perceptual voice evaluation, acoustic analysis, aerodynamic evaluation and voice self-evaluation(1).
Auditory-perceptual analysis is considered the primary reference standard used by the speech therapist when performing voice evaluations(2). It is considered a subjective method, as it depends on the evaluator’s judgment and has an exclusively impressionistic nature(2,3). This type of evaluation provides information about the characterization of voice deviation
intensity, as well as the predominant voice quality(4).
Acoustic analysis is a more objective procedure. It is noninvasive and is becoming increasingly used in the voice clinic. In traditional acoustic analysis, two types of measure are used, perturbation measures (jitter and shimmer) and noise measures.
Jitter indicates the variability of the fundamental frequency in
the short term and is measured between neighboring glottal cycles. Shimmer corresponds to variability in the sound wave amplitude over the short term. Glottal-to-noise excitation (GNE) measures the additional noise in the sound signal, irrespective of the noise modulated by the glottal mechanism, indicating the source of the voice signal and whether it comes from vocal fold
vibration or from turbulent airflow generated in the vocal tract.
Measures of the perturbation and noise are therefore focused on the glottal source(3-5).
In addition to these measures, some measures are related to the resonance of the sound wave in the vocal tract, which
changes according to the different configurations of the vocal
tract structure positioning and volume of the resonance cavities during voice production. Such measures are called formants and correspond to energy concentrations along the vocal tract(3-6).
The vocal tract has a three-dimensional configuration and the sound that is produced in the glottis is modified by the positioning
of structures such as the larynx, soft palate, tongue, lips and
jaw. The frequencies of the glottal signal that are reinforced
by the supraglottic vocal tract are called formants, and their analysis provides information about adjustments being made in the supraglottic vocal tract(6-10).
Adjustments in the positioning of the articulators and in the volume of the resonance cavities determine the values of formants(6-8,11). Thus, an increase in the first formant (F1), for example, is related to a downward jaw adjustment, anterior lowering of the tongue and pharyngeal narrowing. An anterior adjustment of the tongue which is then lowered generates an increase in the second formant (F2). The formation of a smaller
cavity immediately behind the incisors can raise the value of the third formant (F3)(6-8,10,11).
In this context, there is a strong interaction between the
source producing the sound (glottis) and the filter. The feedback
from pressure encountered by the sound wave in the vocal tract
modifies the glottal airflow and vocal fold vibration mode(12). Some studies(8-10,13-15) have observed that patients with a voice disorder make adjustments not just in the glottis but also in the supraglottis. These patients have lower formant values (F1, F2, F3) than individuals without a voice disorder(10,13,15).
Thus, these adjustments may be related to the development or maintenance of, or may cooccur with, voice disorders(11,13). Such adjustments are not necessarily evaluated by traditional acoustic measures, as they focus on the glottal source(16).
Notably, acoustic analysis does not replace auditory-perceptual analysis but rather integrates the auditory and physiological levels(6-8). A combination of acoustic and perceptual auditory measures increases the accuracy in determining the presence or absence of a voice disorder and the intensity of the deviation present(17,18).
For this reason, it is important to investigate whether a combination of measures relating to the source (perturbation
and noise) and filter (formantic measures) allows a better classification of voice signals in regard to deviation intensity and predominant voice quality.
This study therefore aims to investigate the accuracy of both isolated and combined traditional acoustic and formantic measures in the discrimination of the voice deviation intensity
and predominant voice quality in dysphonic patients. To carry
out this study, we start from the hypothesis that a combination of traditional acoustic and formantic measures will improve the discrimination of voice deviation intensity and that a combination of traditional acoustic and formantic measures can improve the
discrimination between different predominant voice qualities.
METHODS
Study design
This was a descriptive, cross-sectional, observational study, evaluated and approved by the Ethics Committee of the Health Sciences Center, Federal University of Paraíba (UFPB), under
protocol number 52492/12. All participants signed a free and
informed consent form authorizing the study.
Sample
Patients treated at the Department of Speech Therapy’s Voice Laboratory (UFPB) in the period between April 2012 and July 2015 participated in this study. The following eligibility criteria were considered for participation:
• Being female, given the relationship between this variable and the mean F0 measure, which is associated with the anatomical
characteristics of the vocal folds, which are unequal between
• Being over 18 and below 65 years of age, thus avoiding the
periods of voice change and presbyphonia, respectively; • Presenting a voice complaint, answering positively to the
following question: “Do you consider that you have a voice
problem now or have had one during the past six months?”; • Having undergone a laryngeal visual examination and having
an otorhinolaryngological report.
Of the total of 530 patients evaluated in the laboratory,
96 were male, 75 were under 18 or over 65 years of age and
57 individuals had no voice complaints. Thus, 228 individuals were excluded because they did not meet the eligibility criteria,
leaving a final sample of 302 patients with a mean age of 39.25(±12.63) years. No patient had neurological or cognitive
impairments that prevented voice recording.
All sample patients presented a laryngeal report at the time of data collection, as described below: 78 (25.85%) patients had
vocal nodules, 63 (20.86%) had no structural or functional changes in the larynx, 41 (13.57%) had vocal cysts, 35 (11.60%) had hyperemia secondary to laryngopharyngeal reflux, 24 (7.94%) had a middle-posterior triangular gap, 24 (7.94%) had vocal fold polyps, 18 (5.96%) had unilateral vocal fold paralysis, 11 (3.64%) had a vocal sulcus and 8 (2.64) had Reinke’s edema.
Procedures
All data collection for this study was conducted in the Department of Speech Therapy’s Voice Laboratory (UFPB) during the initial voice evaluation session. During this session, the patients were evaluated by means of a form containing
questions relating to personal information and voice complaints. They completed voice self-evaluation questionnaires and
underwent the recording of speech tasks.
Only the personal identification, voice complaint and sustained
vowel sample data were used for this study, as described later. The voices were collected in a recording booth with
soundproofing and a noise level below 50 dB SPL, with a 44000-Hz sampling rate at 16 bits per sample and a 10-cm
distance between the microphone and the patient’s mouth.
Fonoview software, version 4.5, CTS Informática was used on
a Dell all-in-one desktop, with a Senheiser E-835 unidirectional cardioid microphone located on a pedestal and coupled to a
U-Phoria UMC 204 Behringer preamplifier.
For the voice recording collection, the patient remained standing facing the pedestal at the recommended distance between the mouth and microphone. The patient received instructions about the voice collection, and recording began soon after. During the
recording, the patient was asked to emit the sustained /Ɛ/ vowel at a frequency and intensity self-reported as comfortable and normal. The /Ɛ/ vowel was selected for this study because it is
an oral, open vowel, is not round and is considered to be the vowel with the most average position in Brazilian Portuguese, which facilitates a more neutral and intermediate position of the vocal tract. In addition, it is the most commonly used vowel
for evaluating voice quality in Brazil(20).
Subsequently, the voices were edited using SoundForge software, version 10.0. The first and final two seconds of the
vowel emission were removed due to the greater irregularity in these sections, with a minimum time of three seconds being retained for each emission. The signals were normalized for the
auditory-perceptual evaluation, using SoundForge’s “normalize”
control in peak level mode, to standardize the audio output at
between -6 and 6 dB.
The acoustic measures of the fundamental frequency (mean
and standard deviation), jitter, shimmer and glottal-to-noise excitation (GNE) were extracted manually using the voice
quality analysis module of VoxMetria software, version 4.7h
(CTS Informática, Pato Branco, Paraná, Brazil). The reference values in that software for the jitter, shimmer and GNE parameters
are 0.6, 6.5 and 0.5%, respectively. Values greater than those cited
for the jitter and shimmer are considered deviated, while values lower than that cited for the GNE may be considered deviated.
Praat software, version 5.3.77h, was used to extract the formantic measures from the vowel’s representation in a broadband
spectrogram containing the first three formants (F1, F2, and F3).
Due to the large number of estimations involved, a script was used (a tool that automatically extracts, in a standardized manner, the parametric measures investigated), which facilitated the optimization of processing time and avoided possible handling errors during the estimation procedures. The means and standard
deviations of the formant frequencies were extracted for each
sample. All values were then checked, and no outliers were
identified.
The auditory-perceptual evaluation was performed independently by three speech therapists who were voice specialists with over 10 years of experience in this type of analysis. A visual analogue scale (VAS) ranging from 0 to 100 mm was used(21) to evaluate the voice deviation intensity (general grade [GG]), of the sustained vowel. A score closer to 0 represents a lower voice deviation, and one closer to 100 a greater voice deviation.
Before the auditory-perceptual evaluation, eight
sustained /Ɛ/ vowel anchor stimuli were used for the training
of the judges. These contained two samples of individuals
with normal voice quality variability (NVQV), two samples
of individuals with mild to moderate voice deviations, two samples of individuals with moderate voice deviations and two samples of individuals with intense voice deviations. All the
files presented contained female voices. The judges were asked
to listen to the anchor stimuli immediately prior to analyzing the voices for this study. All samples selected for this training were previously analyzed by speech therapists with experience in voice analysis and were routinely used for perceptual auditory training and as anchor stimuli in the laboratory where this study was conducted.
The perceptual evaluation session took place in a silent environment. First, each judge was told that the voices should be considered as having NVQV when they were socially
acceptable, produced naturally, and without effort, noise or
unstable conditions during emission. They were also instructed that roughness would correspond to the presence of vibratory irregularities, breathiness would be related to the audible escape of air during the emission and tension would correspond to the
The auditory-perceptual parameters of roughness, breathiness and tension were chosen to characterize the signals in this study
because they are universally used to characterize voice quality
deviation(2) and because they have known correlates on the physiological and acoustic planes.
For the evaluation, each sustained vowel emission was presented three times through a speaker at a comfortable intensity
as self-reported by the evaluators. The judges then identified
the presence or absence of voice deviation, the predominant
voice quality in the deviated voices (rough, breathy or tense) and, finally, made a judgment as to the voice deviation intensity.
The VAS was subsequently converted into a numerical scale, with values from 1 to 4, wherein grade 1 represented individuals
with NVQV (0-35.5 mm), grade 2 represented subjects with mild
to moderate deviation (35.6 to 50.5 mm), grade 3 represented a moderate deviation (50.6 to 90.5 mm) and grade 4 represented
an intense deviation (> 90.5 mm)(22).
At the end of the auditory-perceptual evaluation, 10% of the samples were randomly repeated to evaluate the reliability
of the judges’ analysis using Cohen’s kappa coefficient.
The auditory-perceptual analysis results of the judge with the
greatest reliability (kappa coefficient of 0.79) were selected
for use in this study. The other two judges had kappa values of <0.70.
The patients were categorized into two groups according to the auditory-perceptual analysis results as follows: 33 patients
with NVQV (GG≤35.5 mm) and 269 patients with voice quality deviations (GG≥35.6 mm). Of the patients with voice quality deviations, 150 were classified as mild to moderate (35.6≤GG≤50.5 mm), 112 as moderate (50.6≤GG≤90.5 mm)
and 7 as having an intense deviation (GG> 90.5 mm). Of
the 269 patients with voice quality deviations, 135 (50.18%) had a predominantly rough voice quality, 95 (35.31%) had a predominantly breathy voice quality and 39 (14.49%) had a predominantly tense voice quality.
The otorhinolaryngological reports of the 33 NVQV patients showed voice complaints and a lack of structural and
functional laryngeal changes. Of the 269 patients with voice quality deviations, all had voice complaints; 30 had a medical
diagnosis of an absence of structural and functional laryngeal changes, and 239 were diagnosed with laryngeal changes, as described above.
This sample characterization is consistent with the literature, as there is no direct relationship between the presence of a
voice complaint, the presence of voice quality deviation and
the presence of laryngeal changes(5). Therefore, given that the purpose of this study was not to evaluate the acoustic parameters according to the presence or not of a speech disorder but to clarify the relationship between auditory-perceptual parameters and acoustic measures in evaluating the intensity and type of voice deviation, we decided not to exclude individuals with voice complaints but no laryngeal changes. These criteria strengthen the internal validity of the study and ensure that the independent variable (auditory-perceptual evaluation) is the
only or most likely explanation for the effect on the dependent
variable (acoustic parameters).
Data analysis
Descriptive statistical analyses were performed for all variables, including the mean and standard deviation values. Quadratic discriminant analysis (QDA) was performed to classify the
signals as a function of the GG and predominant voice quality,
with K-fold cross-validation used as an auxiliary method. QDA was selected for this study because it allows identifying individual and combined variables that best discriminate between
pre-established groups (GG and predominant voice quality).
Eight acoustic measures were analyzed in the combined measure
analysis and were combined 2 by 2, 3 by 3, 4 by 4, up to 8 by 8.
In the K-fold cross validation method, the classification was
performed ten times, varying the data set, which is used for training and testing without repetition, so that more accurate results can be obtained(22). Thus, signals with different GGs
and predominant voice qualities were randomly divided into
subsets, with a minimum of 10 signals in each subset, as this minimum number of signals facilitates the best error estimates. Signals with strong deviations were excluded from the analysis because they did not satisfy the condition of having a minimum of 10 signals.
These subsets were compared by the means of the cross-validation procedure, and for each iteration between subsets, performance
measures (accuracy, sensitivity and specificity) were obtained for the classifier when discriminating the GG or predominant voice quality. At the end of all subset iterations, the mean and
standard deviation values of the formed subsets were extracted
and used to interpret the final classifier data.
Accuracy, sensitivity and specificity measures were
used to evaluate the classifier’s performance. In general, the interpretation of the sensitivity and specificity measures is most
evident when the groups being compared belong to a healthy (no changes) or pathologic (with changes) class(23). Therefore, when performing discriminant analysis between classes with
changes, such as performed in this study (when different deviation and predominant voice quality intensities were compared), it is necessary to determine in the classifier used the signal group that will have its correct classification measured by the sensitivity and the group that will have its correct classification measured by the specificity.
Therefore, a standard procedure was adopted in which the
first condition presented in each table would correspond to the signal that would be classified correctly by the specificity, while the second condition would be classified correctly by the
sensitivity (Chart 1).
The classification performance took into account signals with different GGs and different predominant voice qualities.
The individual power of each of the considered acoustic measures and possible combinations of these measures were also considered, identifying those that provided the best
classification rates between voice signals under the conditions
established in this study.
Considering that the accuracy can be classified as excellent (> 90%) good (80%-90%), acceptable (70%-80%), poor (60%-70%) or with no acceptable discrimination ability (<60%)(23), only
classifications with a performance of over 70% were analyzed. Discriminant analysis (accuracy, sensitivity and specificity) was
RESULTS
Tables 1 and 2 show the means and standard deviations of the acoustic measures as a function of GG and predominant
voice quality, respectively. These data will not be examined
separately but in conjunction with the performance of the
classifications used.
First, the accuracy of the isolated acoustic measures in discriminating the GG in the patients was tested. The GNE measure had the best performance (70.95%, SD = 3.05), achieving
a sensitivity of 86.67±5.44% and specificity of 55.83±5.13%
(Table 3).
When investigating the discriminatory power of the
combined acoustic measures in the classification of GG in
the investigated sample, the greatest accuracy was found in the following combinations: the means of F0, F2 and GNE
(75.24±4.86%) when distinguishing between NVQV and mild
to moderate deviations; and the SDs of F0, F1, F3, jitter and
GNE (74.02±3.26%) when discriminating between mild and
moderate deviations (Table 3).
Chart 1. Discrimination cases and their respective sensitivity and specificity measures
Discrimination case Sensitivity Specificity
NVQV × Mild to moderate Rate of correct classification of signals with mild to moderate deviation
Rate of correct classification of signals with NVQV
Mild × Moderate Rate of correct classification of signals with moderate deviation
Rate of correct classification of signals with mild deviation
NVQV × Breathy Rate of correct classification of signals with breathy voice quality
Rate of correct classification of signals with NVQV
NVQV × Rough Rate of correct classification of signals with rough voice quality
Rate of correct classification of signals with NVQV
Breathy × Tense Rate of correct classification of signals with tense voice quality
Rate of correct classification of signals with breathy voice quality
Rough × Breathy Rate of correct classification of signals with breathy voice quality
Rate of correct classification of signals with rough voice quality
Rough × Tense Rate of correct classification of signals with tense voice quality
Rate of correct classification of signals with rough voice quality
Caption: NVQV = normal voice quality variability
Table 1. Means and standard deviations of acoustic measures at different voice deviation intensities
Measure
VOICE DEVIATION INTENSITY
NVQV Mild to moderate Moderate
Mean SD Mean SD Mean SD
Mean F0 179.87 43.19 182.06 60.78 183.75 69.97
SD F0 7.28 15.82 10.80 21.58 24.78 37.83
F1 599.35 143.10 592.95 127.39 585.63 145.74
F2 2014.08 232.43 2018.87 213.42 2033.47 231.94
F3 2812.05 216.47 2843.75 245.48 2888.86 219.74
Jitter 0.25 0.50 0.50 1.22 1.87 2.86
Shimmer 3.91 3.09 5.32 4.29 9.11 9.11
GNE 0.90 0.119 0.83 0.19 0.68 0.24
Caption: F0 = fundamental frequency; SD = standard deviation; F1 = first formant; F2 = second formant; F3 = third formant; GNE = glottal to noise excitation; NVQV = normal voice quality variability
Table 2. Means and standard deviations of acoustic measures according to predominant voice quality
Measure
PREDOMINANT VOICE QUALITY
NVQV Breathy Rough Tense
Mean SD Mean SD Mean SD Mean SD
Mean F0 181.02 42.62 171.62 64.52 196.75 68.82 203.83 64.81
SD F0 7.36 15.95 18.86 33.70 13.99 22.00 19.63 35.42
F1 597.42 143.545 581.58 108.75 586.78 155.31 672.93 188.06
F2 2011.29 233.41 2005.97 229.23 2046.27 207.30 2063.18 210.52
F3 2808.08 216.124 2837.71 254.40 2911.63 202.524 2910.97 230.60
Jitter 0.25 0.509 1.44 2.46 1.02 2.75 1.42 3.27
Shimmer 3.91 3.09 8.71 7.45 6.33 5.76 8.827 11.46
GNE 0.90 0.11 0.76 0.22 0.69 0.27 0.83 0.20
The accuracy of the isolated measures in the discrimination
of predominant voice quality was analyzed next. GNE performed best in discriminating between NVQV and rough (73.57%±5.56),
between NVQV and breathy (82.38±3.73%) and between breathy
and tense (71.43%±4.76) (Table 4).
Finally, the performance of the combined acoustic measures in
the discrimination of the voice quality was tested. The means of
F0, shimmer and GNE (78.57±4.21%) were the best combination
when discriminating between NVQV and rough voice quality. The means of F3 and GNE (84.05±3.29%) were the best
combination for distinguishing between NVQV and breathy
voice quality. The means of F0, F3, and GNE (73.75%±3.75) were selected as the best combination for discriminating between rough and tense voices. The combination of the means of F0, F1 and GNE (75.71±6.41%) offered the best performance when discriminating between breathy and tense voices (Table 4).
Table 3. Accuracy, sensitivity and specificity of the best isolated acoustic measures and best acoustic measure combinations in the discrimination of voice deviation intensity
Voice deviation intensity Isolated measure Acc (%) Sens (%) Spec (%)
NVQV × Mild to moderate GNE 70.95±3.05 86.67±5.44 55.83±5.13
Best combination
NVQV × Mild to moderate Means of F0, F2, and GNE 75.24±4.86 84.17±5.34 67.50±7.90
Mild to moderate × Moderate SDs of F0, F1, F3, Jitter, and GNE 74.02±3.26 87.62±2.51 56.14±6.28
Caption: Acc = accuracy; Sens = sensitivity; Spec = specificity; F0 = fundamental frequency; SD = standard deviation; F1 = First formant; F3 = third formant; GNE = glottal to noise excitation; NVQV = normal voice quality variability
Table 4. Accuracy, sensitivity and specificity of the best isolated acoustic measures and best acoustic measure combinations in the discrimination of predominant voice quality
Predominant voice quality Isolated measure Acc (%) Sens (%) Spec (%)
NVQV × Rough GNE 73.57±5.56 88.33±4.84 59.17±11.00
NVQV × Breathy GNE 82.38±3.73 87.50±5.16 78.33±7.88
Breathy × Tense GNE 71.43±4.76 57.50±8.82 81.67±4.08
Best combination
NVQV × Rough Means of F0, Shimmer, and GNE 78.57±4.21 87.50±5.16 70.00±6.36
NVQV × Breathy Means of F3 and GNE 84.05±3.29 90.00±5.09 77.50±7.03
Rough × Tense Means of F0, F3, and GNE 73.75±3.75 60.83±6.34 84.17±5.75
Breathy ×. Tense Means of F0, F1, and GNE 75.71±6.41 71.67±7.05 78.33±8.16
Caption: Acc = accuracy; Sens = sensitivity; Spec = specificity; F0 = fundamental frequency; F1 = first formant; F3 = third formant; GNE = glottal to noise excitation; NVQV = normal voice quality variability
DISCUSSION
This study investigated the accuracy of both isolated and combined traditional acoustic and formantic measures in the
discrimination of GG and predominant voice quality in dysphonic
patients. Two hypotheses were raised as follows: 1) the combination of traditional acoustic and formantic measures improves the discrimination of GG in voices, and 2) the combination of traditional acoustic and formantic measures improves the
discrimination of different predominant voice qualities. Thus,
the discussion section was organized to elucidate the conclusions reached with regard to these hypotheses.
Traditional acoustic and formantic measures in the discrimination of voice deviation intensity
When analyzing the isolated acoustic measures, only GNE showed acceptable performance (70.95±3.05%) in the discrimination between NVQV voices and voices with mild to
moderate deviations, with higher sensitivity (86.67%±5.44) in the correct identification of signals with deviation.
The GNE measure appeared to be lower in patients with mild to moderate deviation than in individuals with NVQV. However, this measure did not produce values in either of the
two groups that were below the 0.5% cut-off point considered
for the presence of deviation in this parameter. In turn, in the comparative analysis, it could be inferred that patients with mild
to moderate voice deviation had more silent airflow between
the vocal folds than those with NVQV(5,11).
A study(4) conducted with 226 patients, 53 healthy controls and 173 patients with voice deviations demonstrated that GNE
showed excellent accuracy (95%) when differentiating between
healthy voices and those with deviations. Thus, it may be inferred that GNE is a good voice evaluation measure because it shows greater discrimination between healthy and deviated voices.
Based on the analysis of the combined acoustic measures, the hypothesis that a combination of traditional and formantic
measures would improve the performance of the classifier in the discrimination of GG was confirmed. In addition to increasing the accuracy and specificity values, the combination
and moderate deviations, which the isolated measures could not. The combination of measures relating to the means of F0, F2 and GNE obtained an accuracy of 75.24%± 4.86% when discriminating between signals with NVQV and those with mild to moderate deviations. Patients with mild to moderate deviations had lower GNE values and greater mean F0 and F2 values than did patients with NVQV.
Lower GNE values may indicate inefficient glottal closure,
more additive noise in the voice and a possible decrease in intensity(4,5,24). In turn, data in the present study in regard to GNE were analyzed comparatively between groups as no values were
below the cutoff in either group of signals.
The mean F0 values found were linked to the presence of longitudinal vocal fold tension, which causes a greater number of glottic cycles per second, resulting in a greater F0 elevation(25).
Increased F2 values are related to adjustments in the tongue anteriorization(6-8). Such adjustments promote the elevation of the laryngeal complex, and by means of a biomechanical action, there is a greater longitudinal tension in the vocal folds, with
a consequent rise in F0, increased vocal effort and decreased voice projection(14,25).
A study(26) analyzed the formantic measures of sustained vowels and found an increase in the values of these measures when the laryngeal complex was elevated. Furthermore, F0 values decreased when the vocal tract length increased (low larynx) and similarly increased when the vocal tract length decreased (high larynx).
It may be inferred from these findings that compared to individuals without voice quality deviation, patients with a mild
to moderate degree of deviation may implement supraglottic adjustments to compensate for dysfunctional glottic conditions
with the presence of increased silent airflow. These findings are
consistent with other studies(8-10,13-15) showing that dysphonic patients tend to make adjustments in the vocal tract to compensate for their voice problem.
Nonetheless, one can question whether the supraglottic
adjustment may be related to the source of the voice problem in these patients as the elevation of the larynx with increased longitudinal vocal fold tension reduces the convexity of the curvature of the free edge of the vocal folds, which is one of the mechanisms responsible for increased transglottic silent
airflow(27).
In general, the description and analysis of the formantic measures in the group with mild to moderate deviations seems to be interesting for understanding the supraglottic adjustments made by these patients, which may have implications for clinical evolution in voice therapy.
The measure combination of the SD of F0, F1, F3, jitter and
GNE also had an acceptable performance (74.02±3.26%) when
discriminating between signals with mild to moderate deviation and those with moderate deviation. The measures of the SD of F0, F1, F3 and jitter were higher in patients with moderate deviation, while the GNE values were lower in these patients than in individuals with mild to moderate deviation. In regard to the reference values for the GNE and jitter measures, only
the latter produced values above the cutoff point for being
considered deviated.
In physiological terms, the SD of F0 is directly related to the neuromuscular condition and vocal fold mucosa vibration regularity; thus, higher F0 SD values, as found in patients with moderate deviations, may indicate phonatory instability and greater vocal fold vibration irregularity, thereby causing deviations in voice production(24,25).
Jitter evaluates perturbations in the frequency of the
neighboring vibration cycles(11,18) and is the measure most correlated with GG(17) and sensitive to the presence of voice deviations. This explains its increase in individuals with moderate voice deviations in this study.
These data suggest that patients with moderate voice deviations have more irregular vocal fold vibrations and phonatory instability (increased SD of F0), greater silent airflow, more noise in the voice (decreased GNE) and a greater overall intensity of voice deviation (increased jitter) than do patients with mild to moderate deviations.
The increase in F1 values is related to the greater lowering of the oromandibular complex and to oropharyngeal narrowing(6-8,10,11). These cited supraglottic adjustments may occur as a compensation for dysfunctional glottic conditions, as a greater degree of jaw opening and pharyngeal narrowing may cause a decrease in auditorily perceived breathiness(27) and increased voice intensity(8-10,17). An increase in F1 is also
associated with the phonatory effort present in dysphonic patients
with muscular tension(14).
The hypothesis that a combination of traditional acoustic and formantic measures can improve discrimination in regard to GG
was confirmed. The information seems to have a complementary
nature, as formantic measures alone did not show acceptable discriminatory performance in the cases studied. Notably, in this study, an auditory-perceptual rating scale focused on the glottal source was used; therefore, one would expect a greater contribution from acoustic measures related to the glottal source.
However, more deviated voices seem to make greater supraglottic adjustments, as the higher values found in the combination of measures would be related to sensitivity, i.e., indicate the most deviated signals correctly.
Traditional acoustic and formantic measures in the discrimination of predominant voice quality
When analyzing acoustic measures alone, only GNE had an acceptable performance when discriminating between voices
in terms of predominant voice quality.
In regard to the discrimination between NVQV and rough
voices, an accuracy of 73.57±5.56% was found, with greater sensitivity (88.33±4.84%) for the correct identification of rough voices. Regarding the NVQV vs. breathy discrimination, an
accuracy of 82.38±3.73% was found, with greater sensitivity
(87.50±5.16%) in the correct identification of breathy voices. In the breathy vs. tense discrimination, an accuracy of 71.43±4.76% was found, with greater specificity (81.67±4.08%) in the correct identification of breathy voices.
Once again, in an isolated form, only the GNE measure showed
differentiating breathy voices from other voice types. This finding
is probably because GNE is directly related to the source of the voice signal, i.e., whether it comes from vocal fold vibration
or turbulent airflow generated in the vocal tract(4,5). This factor could explain the direct relationship with this parameter.
The hypothesis that a combination of traditional acoustic and formantic measures can improve the discrimination of
predominant voice quality was confirmed, as the combination of these measures improved the performance of the classifier
when discriminating between NVQV and rough, NVQV and breathy and breathy and tense voices. It also provided acceptable discrimination between rough and tense voices.
When discriminating between NVQV and rough voices, the best combination found was the measures of the means of F0, shimmer and GNE. This combination had an accuracy of
78.57±4.21% and greater sensitivity (87.50±5.16%) in the correct identification of rough voices. The means of F0 and shimmer
values were higher in patients with a rough voice quality, while
the GNE values were reduced in relation to VNQN voices. In general, it is expected that rough voices will have lower F0 values(18). However, the increase of this measure in this study may be explained by the fact that patients with rough voices possibly had tension associated with emission and that, therefore, there was an increase in F0(2,14,28) compared to patients with NVQV.
Shimmer is a measure related to the variation in amplitude between adjacent cycles and is thus related to vibratory irregularity and glottic resistance(4,29). On the auditory-perceptual plane, previous studies have shown that shimmer is related to roughness(17,18). The shimmer values in this study contributed
to the correct identification of rough voices. It should be noted
that although the shimmer values are most deviated in voices with roughness, these values are still within the normal range,
given the cutoff values adopted.
The objectives of one study(18) included an analysis of the discriminatory power of acoustic measures when classifying
deviation intensity and differentiating predominant voice types. A total of 186 dysphonic patients participated in the study. The measures used were the fundamental frequency (F0), jitter, shimmer and GNE. The results showed that the shimmer and GNE were useful in detecting rough and breathy voices, respectively.
Data from the aforementioned study(18) appear similar to those found in the present one as shimmer was correlated with the roughness parameter and GNE in this study. Although appearing in all combinations, shimmer seemed to be more sensitive in
relation to voices with a breathy quality.
The F3 and GNE measures were selected as the best combination when discriminating between NVQV and breathy
voices (84.05±3.29%) and had high sensitivity (90.00±5.09%) in the correct identification of breathy voices. Patients with
breathy voices had higher F3 values and lower GNE values .
The F3 frequency is related to the two cavities established
by the tongue position, that is, the cavity behind the tongue
constriction and the one in front of it. The F3 frequency can also be affected by adjustments to the lips, larynx and pharynx,
and it has a tendency to decrease with labiodentalization adjustment and lip rounding and to increase with constriction
around the pharynx(3,10,11,20). Thus, one can infer that patients
with a predominantly breathy voice quality have a greater
constriction around the pharynx and more stretched lips, probably as compensatory mechanisms to the increase voice intensity.
The findings of this study reinforce the fact that the GNE measure is strongly related to the breathy voice quality(4,5,18,28) and is the only isolated measure with acceptable accuracy when discriminating between NVQV and breathy signals.
When discriminating between rough and tense voices, the best combination found was the measures of the means of F0, F3 and GNE (73.75±3.75%), and this combination had
greater specificity (84.17±5.75%) when identifying rough
voices. The mean F0 was lower in patients with roughness than in those with tense voices, F3 had higher values in patients with roughness, and the GNE was higher in patients with a
tense voice quality.
The findings suggest that patients with a tense voice quality
may have greater longitudinal tension in the vocal folds due to the higher mean F0 values. Furthermore, it appears that patients with roughness have a smaller cavity in the vocal tract due to the increase in F3(11,13) and that patients with a tense voice
quality seem to have less noise in the voice(4,5) than patients with roughness, an aspect suggested by the fact that the GNE is less deviated in tense voices.
The rough vs. tense discrimination category appeared only when there was a combination of measures and there was no acceptable isolated value. This demonstrates the importance of
finding the best combination of formantic measures to identify voice quality(4,24)
.
The measures relating to the mean F0, F1 and GNE were selected for discrimination between breathy and tense voices,
with an accuracy of 75.71± 6.41% and with higher specificity (78.33±8.16%) in the correct identification of breathy voices.
The F0 and F1 values were greater in patients with tense voices, and the GNE was lower in patients with breathy voices.
Regarding the mean F0 and tense voice quality, it is important
to note that the fundamental frequency is determined, among
other factors, by vocal fold tension, which is controlled by the
intrinsic laryngeal muscles, specifically the cricothyroid(2,11,15). Thus, patients with vocal tension usually exhibit greater contraction of the extrinsic and intrinsic muscles, including greater longitudinal vocal fold tension, greater subglottic pressure and greater vocal tract constriction, generating a larger number of glottic cycles
per second and hence a greater fundamental frequency(25). The general grade and roughness seem to be parameters more related to F0(28,30). The mean F
0 values are higher both in general grade and in vocal tension, and the F0 standard deviation
values are also high in rough voices. This study’s findings seem
to agree in regard to the increase in F0 in patients with tense voices and the positive relationship between F0 and the general grade of voice deviation.
In relation to the increased F1 values, it would seem that
patients with a tense voice quality may make adjustments in
the vocal tract, having a larger vertical opening of the mouth and greater pharyngeal constriction(6-8,10,11) than patients with a
A study(14) conducted with 111 women with muscle tension dysphonia found similar results. The F1 and F2 formants were elevated in this population compared to those with healthy voices, suggesting adjustments in the supraglottis relating to a greater vertical opening of the mouth, greater pharyngeal constriction and a lower and more anterior tongue position. The adjustments found in that study are similar to those of the present study in regard to the greater vertical opening of the mouth and increased pharyngeal constriction as indicated by an
increase in F1 in patients with a tense voice quality.
Analysis of the combined acoustic measures in the
discrimination of the predominant voice quality again revealed
that the GNE measure appeared in all acceptable combinations found. The F0 measure was present in most of the combinations
when discriminating predominant voice quality, which attests
to the results found in previous studies(8,13,18,29,30) in which the
fundamental frequency appeared to be an interesting measure when discriminating voice quality. This is probably because it is
related, in physiological terms, to the neuromuscular condition and vocal fold mucosa vibration regularity, and in acoustic and perceptual terms, it is directly related to the sound signal periodicity(6,9,11,30).
In summary, a combination of perturbation/noise measures
and formantic measures promotes a slight improvement (75.24%) in the classification rate between voices with NVQV and those
with mild to moderate deviation in relation to the GNE measure alone (70.95%). This combination also facilitates discrimination between voices with mild to moderate and moderate deviations,
which was not observed with isolated measures. These findings offer evidence that the greater the voice deviation intensity, the
more complex the signal in terms of the aperiodicity and noise.
Such intensities therefore require a combination of measures to characterize them adequately.
Furthermore, a combined analysis of measures relating to
the glottal source (perturbation and noise) and filter (formantic
measures) contributes to a broadening of our understanding of
source-filter interaction mechanisms in deviated voices and may
be useful when measuring the results of treatment and monitoring during voice therapy. The fact that more formantic measures
(F1 and F3) were selected by the classifier for discriminating
more deviated voices shows that individuals with more intense deviations make more vocal tract adjustments, probably as a compensatory mechanism in response to the functional
inefficiency of the glottal source.
In regard to the predominant voice quality, the formantic
measures proved important when classifying between NVQV and breathy (F3), rough and tense (F3) and breathy and tense
(F1) voices. Specifically, the formantic measures seem to
provide a greater contribution to the discrimination of the auditory-perceptual parameter tension. Individuals with tense voices probably make more supraglottic adjustments, either for compensatory reasons or in cooccurrence with the alterations at the glottic level.
The presence of a voice disorder tends to change the voice
signal in different ways and may combine various types of
perturbation and noise in vocal emissions as well as possible supraglottic adjustments. The combined use of measures for the
evaluation, characterization and classification of the voice signal
may therefore better represent voice production characteristics and highlight manifestations that would not be detected with the use of isolated measures. Other studies(3,28,30) have shown that a combination of perturbation and noise measures improves the discrimination between signals with and without voice deviations. However, in terms of this study, it may be concluded that combining measures related to vocal tract adjustments with traditional perturbation and noise measures can improve
the classification of the voice deviation intensity and type and provide insights into the source-filter interaction in patients
with voice deviations.
CONCLUSION
The GNE acoustic measure was the only one able to discriminate voice deviation intensity and predominant voice
quality in isolation. There was a gain in the classification
performance when traditional acoustic and formantic measures were combined in the discrimination of both the voice deviation
intensity and predominant voice quality.
REFERENCES
1. Dejonckere PH, Bradley P, Clemente P, Cornut G, Crevier-Buchman L, Friedrich G, et al. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical)
treatments and evaluating assessment techniques. Eur Arch Otorhinolaryngol.
2001;258(2):77-82. http://dx.doi.org/10.1007/s004050000299. PMid:11307610. 2. Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, Hillman
RE. Consensus auditory-perceptual evaluation of voice: development of a
standardized clinical protocol. Am J Speech Lang Pathol.
2009;18(2):124-32. http://dx.doi.org/10.1044/1058-0360(2008/08-0017). PMid:18930908. 3. Brockmann-Bauser M, Drinnan MJ. Routine acoustic voice analysis: time
to think again? Curr Opin Otolaryngol Head Neck Surg.
2011;19(3):165-70. http://dx.doi.org/10.1097/MOO.0b013e32834575fe. PMid:21483265.
4. Godino-Llorente JI, Osma-Ruiz V, Sáenz-Lechón N, Gómez-Vilda P, Blanco-Velasco M, Cruz-Roldán F. effectiveness of the glottal to noise
excitation ratio for the screening of voice disorders. J Voice. 2010;24(1):47-56. http://dx.doi.org/10.1016/j.jvoice.2008.04.006. PMid:19135854. 5. Treole K, Trudeau MD. Changes in sustained production tasks among
women with bilateral vocal nodules before and after voice therapy. J Voice.
1997;11(4):462-9. http://dx.doi.org/10.1016/S0892-1997(97)80043-4.
PMid:9422281.
6. Fant G. Acoustic theory of speech production. 2nd ed. Paris: Mouton; 1970.
7. Ladefoged P. Elements of acoustic phonetics. Chicago: University of
Chicago Press; 1996.
8. Camargo ZA. Análise da qualidade vocal de um grupo de indivíduos
disfônicos: uma abordagem interpretativa e integrada de dados de natureza
acústica, perceptiva e eletroglotográfica [tese]. São Paulo: Pontifícia Universidade Católica de São Paulo; 2002. 283 p.
9. Silva MFBL, Madureira S, Rusilo LC, Camargo Z. Vocal quality assessment:
methodological approach for a perceptive data analysis. Rev CEFAC. 2017;19(6):831-41. http://dx.doi.org/10.1590/1982-021620171961417. 10. Magri A, Stamado T, Camargo ZA. Influência da largura de banda de
formantes na qualidade vocal. Rev CEFAC. 2009;11(2):296-304. http://
dx.doi.org/10.1590/S1516-18462009005000010.
11. Lee S-H, Yu J-F, Hsieh Y-H, Lee G-S. Relationships between formant
frequencies of sustained vowels and tongue contours measured by ultrasonography. Am J Speech Lang Pathol. 2015;24(4):739-49. http://
12. Titze I, Palaparthi A. Sensitivity of source-filter interaction to specific vocal
tract shapes. IEEE Trans Audio Speech Lang Process.
2016;24(12):2507-15. http://dx.doi.org/10.1109/TASLP.2016.2616543.
13. Camargo ZA, Vilarim GS, Cukier S. Parâmetros perceptivo-auditivos e
acústicos de longotermo da qualidade vocal de indivíduos disfônicos. Rev CEFAC. 2004;6(2):189-96.
14. Roy N, Nissen SL, Dromey C, Sapir S. Articulatory changes in muscle tension dysphonia: evidence of vowel space expansion following manual
circumlaryngeal therapy. J Commun Disord. 2009;42(2):124-35. http://
dx.doi.org/10.1016/j.jcomdis.2008.10.001. PMid:19054525.
15. Muhammad G, Mesallam TA, Malki KH, Farahat M, Alsulaiman M, Bukhari M. Formant analysis in dysphonic patients and automatic Arabic
digit speech recognition. Biomed Eng Online. 2011;10:41. PMid:21624137. 16. Schwartz SR, Cohen SM, Dailey SH, Rosenfeld RM, Deutsch ES, Gillespie MB, et al. Clinical practice guideline: hoarseness (dysphonia). Otolaryngol
Head Neck Surg. 2009;141(3, Supl 2):S1-31. http://dx.doi.org/10.1016/j. otohns.2009.06.744. PMid:19729111.
17. Ma EP, Yiu EM. Multiparametric evaluation of dysphonic severity. J
Voice. 2006;20(3):380-90. http://dx.doi.org/10.1016/j.jvoice.2005.04.007.
PMid:16185841.
18. Lopes LW, Cavalcante DP, Costa PO. Severity of voice disorders: integration of perceptual and acoustic data in dysphonic patients. CoDAS.
2014;26(5):382-8. http://dx.doi.org/10.1590/2317-1782/20142013033. PMid:25388071.
19. Cohen SM, Pitman MJ, Noordzij JP, Courey M. Management of dysphonic
patients by otolaryngologists. Otolaryngol Head Neck Surg. 2012;147(2):289-94. http://dx.doi.org/10.1177/0194599812440780. PMid:22368039. 20. Gonçalves MIR, Pontes PAL, Vieira VP, Pontes AAL, Curcio D, Biase NG.
Função de transferência das vogais orais do Português brasileiro: análise
acústica comparativa. Rev Bras Otorrinolaringol. 2009;75(5):680-4.
21. Ozkan H. A Comparison of classification methods for telediagnostics of
Parkinson’s disease. Entropy. 2016;18(115):1-14.
22. Yamasaki R, Madazio G, Leão SHS, Padovani M, Azevedo R, Behlau M. Auditory-perceptual evaluation of normal and dysphonic voices using the
voice deviation scale. J Voice. 2017;31(1):67-71. http://dx.doi.org/10.1016/j. jvoice.2016.01.004. PMid:26873420.
23. Hosmer DW, Lemeshow S. Applied logistic regression. New York: Willey; 2000. http://dx.doi.org/10.1002/0471722146.
24. González CMT, Hernandez JBA, Orozco-Arroyave JR, Casals JS, Gallego-Jutgla E. Automatic detection of laryngeal pathologies in running speech based on the HMM transformation of the nonlinear dynamics. Lect Notes
Comput Sci. 2013;1:136-43.
25. Van Houtte E, Van Lierde K, Claeys S. Pathophysiology and treatment of muscle tension dysphonia: a review of the current knowledge. J Voice. 2011;25(2):202-7. http://dx.doi.org/10.1016/j.jvoice.2009.10.009.
PMid:20400263.
26. Macari AT, Ziade G, Turfe Z, Chidiac A, Alam E, Hamdan AL. Correlation between the position of the hyoid bone on lateral cephalographs and formant
frequencies. J Voice. 2016;30(6):757.e21-6. http://dx.doi.org/10.1016/j.
jvoice.2015.08.020. PMid:26604010.
27. Samlan RA, Story BH, Bunton K. Relation of perceived breathiness to laryngeal kinematics and acoustic measures based on computacional
modeling. J Speech Lang Hear Res. 2013;56(4):1209-23. http://dx.doi.
org/10.1044/1092-4388(2012/12-0194). PMid:23785184.
28. Lopes LW, Costa SLNC, Costa WCA, Correia SEN, Vieira VJD. Acoustic assessment of the voices of children using nonlinear analysis: proposal
for assessment and vocal monitoring. J Voice. 2014;28(5):565-73. http://
dx.doi.org/10.1016/j.jvoice.2014.02.013. PMid:24836362.
29. Madazio G, Leão S, Behlau M. The phonatory deviation diagram: a novel objective measurement of vocal function. Folia Phoniatr Logop.
2011;63(6):305-11. http://dx.doi.org/10.1159/000327027. PMid:21625144. 30. Lopes LW, Simões LB, Silva JD, Evangelista DS, Ugulino ACN, Costa
Silva PL, et al. Accuracy of acoustic analysis measurements in the evaluation of patients with different laryngeal diagnoses. J Voice. 2017;31(3):382.
e15-26. http://dx.doi.org/10.1016/j.jvoice.2016.08.015. PMid:27742492.
Author contributions