A very low bit rate speech coder using HMM-based speech recognition synthesis techniques

被引:0
|
作者
Tokuda, K [1 ]
Masuko, T [1 ]
Hiroi, J [1 ]
Kobayashi, T [1 ]
Kitamura, T [1 ]
机构
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 466, Japan
关键词
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper presents a very low bit rate speech coder based on HMM (Hidden Markov Model). The encoder carries out phoneme recognition, and transmits phoneme indexes, state durations and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM by using an ML-based speech parameter generation technique. Finally we obtain synthetic speech by exciting the MLSA (Mel Log Spectrum Approximation) filter, whose coefficients are given by mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for the test data including 26% silence region) is comparable to a VQ-based vocoder at 400 bit/s (= 8 bit/frame x 50 frame/s) without pitch quantization for both coders.
引用
收藏
页码:609 / 612
页数:4
相关论文
共 50 条
  • [21] A Very Low-Bit-Rate Analysis-by-Synthesis Speech Coder Using Zinc Function Excitation
    Seo, Sang Won
    Kim, Jong Hak
    Lee, Chang Hwan
    Jeong, Gyu-Hyeok
    Lee, In Sung
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2006, 25 (06): : 282 - 290
  • [22] Diphone-like units for very low bit rate speech coder
    Motlicek, P
    Cernocky, J
    Baudoin, G
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 4023 - 4023
  • [23] Speech parameter generation algorithms for HMM-based speech synthesis
    Tokuda, K
    Yoshimura, T
    Masuko, T
    Kobayashi, T
    Kitamura, T
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1315 - 1318
  • [24] HMM-Based Speech Synthesis for the Greek Language
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 349 - 356
  • [25] A BAYESIAN APPROACH TO HMM-BASED SPEECH SYNTHESIS
    Hashimoto, Kei
    Zen, Heiga
    Nankaku, Yoshihiko
    Masuko, Takashi
    Tokuda, Keiichi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4029 - +
  • [26] An HMM-based Vietnamese Speech Synthesis System
    Vu, Thang Tat
    Luong, Mai Chi
    Nakamura, Satoshi
    ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 116 - +
  • [27] Techniques of very low bit-rate speech coding
    Cui, HJ
    Tang, K
    Zhao, M
    Zhang, X
    CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (01): : 63 - 65
  • [28] An HMM-based Cantonese Speech Synthesis System
    Wang, Xin
    Wu, Zhiyong
    2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
  • [29] Unsupervised adaptation for HMM-based speech synthesis
    King, Simon
    Tokuda, Keiichi
    Zen, Heiga
    Yamagishi, Junichi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1869 - +
  • [30] An acoustic model adaptation using hmm-based speech synthesis
    Tanaka, K
    Kuroiwa, S
    Tsuge, S
    Ren, F
    2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003, : 368 - 373