Chapter 4 Generalized Mel-Cepstral Model
Similarly, (F.17) treats eE as the output of the filter W WԣФ¥ whose input is e. Thus, eE is represented by
eE =2{ Â1 QWÔ*+,EQ *+, 1 1 + *x+, *+,LI à xà (F.22)
Substituting (F.21) and (F.22) into (F.18) shows that (F.18) is equivalent to (F.12).
F.2 Mel-Cepstral Analysis
For the mel-cepstral case, applying the change of variables = in (D.35) gives
=̃X> =2{ Â1 Q*+,EΦ) ∗*+, I I I à xà > = 0,1, … ,2k (F.23)
Substituting (F.6) into (F.23) gives $\tilde r(m)$ in the form
=̃X> = 1 2{  Q*+,E 1 − E *+,+ *x+,+ I à xà > = 0 1 − E 1 2{  Q*+,E 1 1 + *+, *+,)I à xà , > = 1,2, … ,2k o (F.24)
eXç =2{ Â1 Q*+,E 1 1 + *+, *+,0I à xà ç = 0,1, … ,2k (F.25)
and $\tilde r(m)$ is obtained by
\[
\tilde r(m) =
\begin{cases}
e(0) - \alpha\, e(1), & m = 0 \\
(1 - \alpha^{2})\, e(m), & m = 1, 2, \ldots, 2M
\end{cases}
\tag{F.26}
\]
an expression whose consistency with (F.24) for $m = 0$ can be easily verified. To compute $e(k)$, one uses
Q*+, = exp á1 + *+, w : K LzW
*x+,Lã (F.27)
an expression in which the summation factor can be computed via an FFT.
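As an illustrative sketch only (not the author's implementation), the two computational steps mentioned above can be written in Python. It assumes that (F.26) reads $\tilde r(0) = e(0) - \alpha\, e(1)$ and $\tilde r(m) = (1 - \alpha^{2})\, e(m)$ for $m \geq 1$, with $\alpha$ the frequency-warping coefficient. The list `c` is a generic, hypothetical stand-in for the terms of the summation in (F.27), and both function names are hypothetical. A naive DFT is used so that the block needs only the standard library; in practice an FFT routine would replace it.

```python
import cmath
import math


def r_tilde_from_e(e, alpha):
    """Map e(0..2M) to r_tilde(0..2M), assuming the reading of (F.26) given above."""
    if len(e) < 2:
        raise ValueError("need at least e(0) and e(1)")
    first = e[0] - alpha * e[1]                       # case m = 0
    rest = [(1.0 - alpha ** 2) * ek for ek in e[1:]]  # case m = 1..2M
    return [first] + rest


def sum_on_uniform_grid(c, n_points):
    """Evaluate S(w_n) = sum_k c[k] * exp(-j * w_n * k) at w_n = 2*pi*n / n_points.

    For n_points >= len(c) this is the n_points-point DFT of c padded with
    zeros, which is why an FFT can compute all grid values at once.
    `c` is a hypothetical stand-in for the summation terms.
    """
    if n_points < len(c):
        raise ValueError("n_points must be at least len(c)")
    values = []
    for n in range(n_points):
        w_n = 2.0 * math.pi * n / n_points
        values.append(sum(ck * cmath.exp(-1j * w_n * k) for k, ck in enumerate(c)))
    return values
```

For example, `r_tilde_from_e([2.0, 1.0, 0.5], 0.5)` applies the two cases of (F.26) to a three-element sequence, and `sum_on_uniform_grid([1.0, 1.0], 4)` evaluates $1 + e^{-j\omega}$ at four equally spaced frequencies.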
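The spectral criterion $\varepsilon$ is given by $\tilde r(0)$ (as seen from (F.24) for $m = 0$, which is equivalent to (F.1)). As already discussed in Appendix D, the elements associated with the gradient vector of $\varepsilon$ are given by $\tilde r(m)$, $m = 1, 2, \ldots, M$, and those associated with the Hessian matrix $\mathbf{H}$ are given by expression (D.46),
\[
\begin{cases}
(1 - \alpha^{2})\, \tilde r(0) + 2\alpha\, \tilde r(1), & m = 0 \\
\tilde r(m) + \alpha\, \tilde r(m+1), & m = 1, 2, \ldots, M - 1
\end{cases}
\]
and expression (D.47),
\[
\tilde r(m) + \alpha\, \tilde r(m-1), \qquad m = 2, 3, \ldots, 2M
\]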
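A minimal Python sketch of how these elements might be assembled, under two labeled assumptions. First, (D.46) and (D.47) are assumed to read $t(0) = (1 - \alpha^{2})\tilde r(0) + 2\alpha\tilde r(1)$, $t(m) = \tilde r(m) + \alpha\tilde r(m+1)$ for $m = 1, \ldots, M-1$, and $h(m) = \tilde r(m) + \alpha\tilde r(m-1)$ for $m = 2, \ldots, 2M$. Second, the Hessian is assumed to have a Toeplitz-plus-Hankel layout whose entry $(i, j)$ equals $t(|i-j|) + h(i+j)$ for $i, j = 1, \ldots, M$, up to a constant factor. The names `t` and `h` and both function names are hypothetical and are not taken from the source.

```python
def hessian_elements(r_tilde, alpha, M):
    """Return (t, h) from r_tilde(0..2M).

    t[m], m = 0..M-1, follows the assumed reading of (D.46);
    h[m], m = 2..2M, follows the assumed reading of (D.47).
    """
    if len(r_tilde) < 2 * M + 1:
        raise ValueError("need r_tilde(0) through r_tilde(2M)")
    # m = 0 case of (D.46)
    t = [(1.0 - alpha ** 2) * r_tilde[0] + 2.0 * alpha * r_tilde[1]]
    # m = 1..M-1 case of (D.46)
    t += [r_tilde[m] + alpha * r_tilde[m + 1] for m in range(1, M)]
    # (D.47), stored in a dict so that keys match the index m = 2..2M
    h = {m: r_tilde[m] + alpha * r_tilde[m - 1] for m in range(2, 2 * M + 1)}
    return t, h


def assemble_hessian(t, h, M):
    """Build an M x M matrix with entry (i, j) = t(|i - j|) + h(i + j).

    ASSUMPTION: Toeplitz-plus-Hankel layout with 1-based indices i, j;
    any constant scale factor of the true Hessian is omitted.
    """
    return [
        [t[abs(i - j)] + h[i + j] for j in range(1, M + 1)]
        for i in range(1, M + 1)
    ]
```

With $\alpha = 0$ both expressions reduce to $\tilde r(m)$, so each entry becomes $\tilde r(|i-j|) + \tilde r(i+j)$.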