Mel-Cepstral Analysis - Generalized Mel-Cepstral Model

Chapter 4 Generalized Mel-Cepstral Model


Similarly, (F.17) treats $e_E$ as the output of the filter $W$ whose input is $e$. Hence $e_E$ is given by

eE =_{2{ Â}1 QWÔ_*+,EQ _*+, 1 1 + *x+,_*+,LI Ã xÃ (F.22)

Substituting (F.21) and (F.22) into (F.18) shows that (F.18) is equivalent to (F.12).

F.2 Mel-Cepstral Analysis

For the mel-cepstral case, performing the change of variables in (D.35) yields

$$\tilde r_\alpha(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\,\Phi_m^{*}(e^{j\omega})\,d\omega, \qquad m = 0,1,\ldots,2M \qquad \text{(F.23)}$$

Substituting (F.6) into (F.23) gives $\tilde r_\alpha(m)$ in the form

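$$\tilde r_\alpha(m) = \begin{cases} \dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{1-\alpha^2}{(1+\alpha e^{j\omega})(1+\alpha e^{-j\omega})}\,d\omega, & m = 0\\[2ex] (1-\alpha^2)\,\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{e^{j\omega m}}{1+\alpha e^{j\omega}}\,d\omega, & m = 1,2,\ldots,2M \end{cases} \qquad \text{(F.24)}$$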
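The $m = 0$ branch of (F.24) can be rewritten with a rational factor in $e^{j\omega}$ alone. This rewriting is what links it to the quantities $e_\alpha(k)$ of (F.25) and (F.26). The short derivation below assumes that $Q(e^{j\omega})$ is real and even in $\omega$, as a power spectrum is, and that $|\alpha| < 1$:

```latex
\begin{aligned}
\frac{1-\alpha e^{j\omega}}{1+\alpha e^{j\omega}}
  &= \frac{(1-\alpha e^{j\omega})(1+\alpha e^{-j\omega})}{(1+\alpha e^{j\omega})(1+\alpha e^{-j\omega})}
   = \frac{1-\alpha^2 - 2j\alpha\sin\omega}{1+2\alpha\cos\omega+\alpha^2},\\[1ex]
\frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{1-\alpha^2}{(1+\alpha e^{j\omega})(1+\alpha e^{-j\omega})}\,d\omega
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{1-\alpha e^{j\omega}}{1+\alpha e^{j\omega}}\,d\omega\\
  &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{Q(e^{j\omega})}{1+\alpha e^{j\omega}}\,d\omega
   - \alpha\,\frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{e^{j\omega}}{1+\alpha e^{j\omega}}\,d\omega .
\end{aligned}
```

The term $-2j\alpha\sin\omega$ is odd in $\omega$, while $Q$ and the denominator $1+2\alpha\cos\omega+\alpha^2$ are even, so its contribution to the integral vanishes. The last line is $e_\alpha(0) - \alpha\,e_\alpha(1)$ in the notation of (F.25), which is the $m = 0$ case of (F.26).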

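Defining

$$e_\alpha(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\,\frac{e^{j\omega k}}{1+\alpha e^{j\omega}}\,d\omega, \qquad k = 0,1,\ldots,2M \qquad \text{(F.25)}$$

$\tilde r_\alpha(m)$ is obtained as

$$\tilde r_\alpha(m) = \begin{cases} e_\alpha(0) - \alpha\,e_\alpha(1), & m = 0\\ (1-\alpha^2)\,e_\alpha(m), & m = 1,2,\ldots,2M \end{cases} \qquad \text{(F.26)}$$

It is easy to verify that this expression is consistent with (F.24) for $m = 0$. To compute $e_\alpha(k)$, the following expression is used: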

Q*+,_{= exp á1 + *}+,_w : K LzW

*x+,L_ã (F.27)

In this expression, the summation term can be computed with the FFT.
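Below is a minimal numerical sketch of (F.25) and (F.26), written in Python with only the standard library. It assumes that $Q(e^{j\omega})$ is available as N real, positive samples on the uniform grid $\omega_n = 2\pi n/N$ and that these samples are even in $\omega$. The synthetic spectrum and the values of N, M and $\alpha$ are illustrative assumptions, not values from the text. The helper names `e_alpha`, `r_tilde` and `r0_direct` are hypothetical.

Each integral is replaced by the rectangle rule, so each $e_\alpha(k)$ becomes a DFT-type sum over the grid. A direct sum is used for clarity; an FFT would normally compute these sums. The last helper evaluates the $m = 0$ branch of (F.24) on the same grid, so the two routes to $\tilde r_\alpha(0)$ can be compared.

```python
import cmath
import math


def e_alpha(q, alpha, kmax):
    """Approximate e_alpha(k), k = 0..kmax, as in (F.25).

    q holds samples Q(e^{j w_n}) at w_n = 2*pi*n/N, n = 0..N-1, assumed real,
    positive and even (q[n] == q[N - n]).  The integral (1/2pi) * int ... dw
    is replaced by the rectangle rule (1/N) * sum_n, which is very accurate
    for smooth periodic integrands.  Each sum over n is a DFT-type sum, so an
    FFT could produce all k at once; a direct sum is used here for clarity.
    """
    n_pts = len(q)
    out = []
    for k in range(kmax + 1):
        acc = 0j
        for n, qn in enumerate(q):
            z = cmath.exp(2j * math.pi * n / n_pts)  # e^{j w_n}
            acc += qn * z**k / (1 + alpha * z)
        # For even q the imaginary parts cancel in conjugate pairs.
        out.append((acc / n_pts).real)
    return out


def r_tilde(e, alpha):
    """(F.26): r(0) = e(0) - alpha*e(1); r(m) = (1 - alpha^2)*e(m), m >= 1."""
    return [e[0] - alpha * e[1]] + [(1 - alpha**2) * em for em in e[1:]]


def r0_direct(q, alpha):
    """m = 0 branch of (F.24), evaluated on the same uniform grid."""
    n_pts = len(q)
    total = 0.0
    for n, qn in enumerate(q):
        z = cmath.exp(2j * math.pi * n / n_pts)
        total += qn * (1 - alpha**2) / abs(1 + alpha * z) ** 2
    return total / n_pts


if __name__ == "__main__":
    N, M, alpha = 256, 4, 0.42  # illustrative values only
    # Synthetic stand-in for Q(e^{jw}): smooth, positive and even in w.
    q = [math.exp(0.5 * math.cos(2 * math.pi * n / N)
                  + 0.2 * math.cos(4 * math.pi * n / N)) for n in range(N)]
    r = r_tilde(e_alpha(q, alpha, 2 * M), alpha)
    print("r_tilde(0) via (F.26):", r[0])
    print("r_tilde(0) via (F.24):", r0_direct(q, alpha))
```

The two printed values agree to rounding error. This is because, for $|z| = 1$, the real part of $(1-\alpha z)/(1+\alpha z)$ equals $(1-\alpha^2)/|1+\alpha z|^2$ at every grid point.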

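The spectral criterion $\varepsilon$ is given by $\tilde r_\alpha(0)$. This follows from (F.24) for $m = 0$, an expression equivalent to (F.1). As already discussed in Appendix D, the elements associated with the gradient vector $\nabla\varepsilon$ are given by

$$\tilde r_\alpha(m), \qquad m = 1,2,\ldots,M,$$

and those associated with the Hessian matrix by

$$\begin{cases} (1-\alpha^2)\,\tilde r_\alpha(0) + 2\alpha\,\tilde r_\alpha(1), & m = 0\\ \tilde r_\alpha(m) + \alpha\,\tilde r_\alpha(m+1), & m = 1,2,\ldots,M-1 \end{cases} \qquad \text{(expression (D.46))}$$

and

$$\tilde r_\alpha(m) + \alpha\,\tilde r_\alpha(m-1), \qquad m = 2,3,\ldots,2M \qquad \text{(expression (D.47))}$$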
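The index ranges of (D.46) and (D.47) suggest the usual Toeplitz-plus-Hankel arrangement. For $k, l = 1, \ldots, M$, the difference $|k-l|$ runs over $0, \ldots, M-1$ and the sum $k+l$ runs over $2, \ldots, 2M$. The sketch below assumes exactly that arrangement. It omits the constant factors of Appendix D, which are not reproduced here, and the function names are hypothetical.

Under (F.25) and (F.26), the Toeplitz elements reduce to $(1-\alpha^2)\,\frac{1}{2\pi}\int_{-\pi}^{\pi} Q(e^{j\omega})\cos(m\omega)\,d\omega$. The separate $m = 0$ formula compensates for the different definition of $\tilde r_\alpha(0)$ in (F.26).

```python
def hessian_elements(r, alpha, M):
    """Build the Hessian elements from r = [r_tilde(0), ..., r_tilde(2M)].

    Returns (t, h):
      t[m], m = 0..M-1, following (D.46);
      h[m], m = 2..2M, following (D.47), stored in a dict keyed by m.
    """
    if len(r) < 2 * M + 1:
        raise ValueError("need r_tilde(0) .. r_tilde(2M)")
    t = [(1 - alpha**2) * r[0] + 2 * alpha * r[1]]
    t += [r[m] + alpha * r[m + 1] for m in range(1, M)]
    h = {m: r[m] + alpha * r[m - 1] for m in range(2, 2 * M + 1)}
    return t, h


def newton_system(r, alpha, M):
    """Gradient entries and M x M matrix for a Newton-Raphson step.

    Assumed arrangement (not stated explicitly in the text): entry (k, l),
    k, l = 1..M, equals t[|k - l|] + h[k + l], i.e. Toeplitz plus Hankel.
    Constant factors from Appendix D are omitted.
    """
    t, h = hessian_elements(r, alpha, M)
    grad = r[1:M + 1]  # r_tilde(1) .. r_tilde(M)
    mat = [[t[abs(k - l)] + h[k + l] for l in range(1, M + 1)]
           for k in range(1, M + 1)]
    return grad, mat


if __name__ == "__main__":
    # Illustrative values only (M = 2, alpha = 0.5).
    r = [1.0, 0.5, 0.25, 0.125, 0.0625]
    print(newton_system(r, 0.5, 2))
    # ([0.5, 0.25], [[1.75, 0.875], [0.875, 1.375]])
```

Because the Toeplitz and Hankel parts are both symmetric, the assembled matrix is symmetric. A structured Toeplitz-plus-Hankel solver can therefore replace generic Gaussian elimination when M is large.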


In the document Modelo mel-cepstral generalizado para envoltória espectral de fala (pages 191-196)