A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques

Authors
Tokuda, K [1]
Masuko, T [1]
Hiroi, J [1]
Kobayashi, T [1]
Kitamura, T [1]
Affiliation
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 466, Japan
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a very low bit rate speech coder based on HMM (Hidden Markov Model). The encoder carries out phoneme recognition, and transmits phoneme indexes, state durations and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM by using an ML-based speech parameter generation technique. Finally, we obtain synthetic speech by exciting the MLSA (Mel Log Spectrum Approximation) filter, whose coefficients are given by the mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for test data including 26% silence regions) is comparable to that of a VQ-based vocoder at 400 bit/s (= 8 bit/frame × 50 frame/s), without pitch quantization for either coder.
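The bit-rate comparison in the abstract can be checked with a short calculation. The sketch below is only an illustration of that arithmetic: the function name `frame_based_bit_rate` is hypothetical, the 150 bit/s figure is the approximate value quoted in the abstract, and nothing here reproduces the coder itself.

```python
def frame_based_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate of a coder that sends a fixed number of bits per frame."""
    return bits_per_frame * frames_per_second


# VQ-based baseline from the abstract: 8 bit/frame at 50 frame/s.
vq_bit_rate = frame_based_bit_rate(bits_per_frame=8, frames_per_second=50)

# Approximate rate of the proposed HMM-based coder, as quoted in the abstract.
proposed_bit_rate = 150

# How many times fewer bits per second the proposed coder uses.
reduction_factor = vq_bit_rate / proposed_bit_rate

# Frame period implied by 50 frame/s, in milliseconds.
frame_period_ms = 1000 / 50

print(vq_bit_rate)                 # 400
print(round(reduction_factor, 2))  # 2.67
print(frame_period_ms)             # 20.0
```

Under these figures, the proposed coder needs roughly 2.7 times fewer bits per second than the baseline, excluding pitch information, which the abstract says was left unquantized in both coders.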
Pages: 609-612
Number of pages: 4