Speech recognition using voice-characteristic-dependent acoustic models

Cited by: 0
Authors
Suzuki, H [1 ]
Zen, H [1 ]
Nankaku, Y [1 ]
Miyajima, C [1 ]
Tokuda, K [1 ]
Kitamura, T [1 ]
Affiliation
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan
Keywords
DOI
None available
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
This paper proposes a speech recognition technique based on acoustic models that account for variations in voice characteristics. Context-dependent acoustic models, typically triphone HMMs, are widely used in continuous speech recognition systems. This work hypothesizes that the speaker voice characteristics humans can perceive by listening are likewise a factor of acoustic variation, and it therefore applies a tree-based clustering technique to speaker voice characteristics to construct voice-characteristic-dependent acoustic models. In speech recognition with triphone models, the neighboring phonetic context is given in advance from linguistic-phonetic knowledge; in contrast, the voice characteristics of the input speech are unknown when recognizing with voice-characteristic-dependent acoustic models. This paper proposes a method for recognizing speech even when the voice characteristics of the input speech are unknown. The results of a gender-dependent speech recognition experiment show that the proposed method achieves higher recognition performance than conventional methods.
Pages: 740 - 743
Page count: 4
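
The abstract describes recognition with voice-characteristic-dependent acoustic models when the characteristic of the input speech is not known in advance. Below is a minimal, hypothetical Python sketch of that decoding idea only: the voice characteristic is treated as a hidden factor, and the recognizer maximizes over it jointly with the word hypothesis, much like decoding against parallel gender-dependent model sets. The model store, Gaussian whole-word scoring, and all names here are illustrative stand-ins and are not the authors' implementation; the paper itself builds tree-clustered HMM-based acoustic models.

```python
# Toy sketch (not the authors' system): recognize speech when the voice
# characteristic of the input is unknown by scoring every
# (word, characteristic) model pair and keeping the best hypothesis.
import math

# Hypothetical model store: per-(word, characteristic) diagonal Gaussians.
# In the paper these would be tree-clustered HMM state distributions.
MODELS = {
    ("hello", "male"):   ([1.0, 2.0], [0.5, 0.5]),
    ("hello", "female"): ([1.5, 2.5], [0.5, 0.5]),
    ("world", "male"):   ([-1.0, 0.0], [0.5, 0.5]),
    ("world", "female"): ([-0.5, 0.5], [0.5, 0.5]),
}

def log_gaussian(x, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of one feature vector."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def recognize(features):
    """Return the best word over all voice-characteristic-dependent models.

    The voice characteristic is a hidden factor: unlike the phonetic
    context of a triphone, it is not given in advance, so the decoder
    maximizes over it together with the word hypothesis.
    """
    best_word, best_score = None, float("-inf")
    for (word, _characteristic), (mean, var) in MODELS.items():
        score = sum(log_gaussian(x, mean, var) for x in features)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

if __name__ == "__main__":
    # A fake two-frame utterance; in practice these would be MFCC vectors.
    utterance = [[1.4, 2.4], [1.6, 2.6]]
    word, score = recognize(utterance)
    print(word, round(score, 2))  # best score comes from the female-cluster "hello" model
```

The design point the sketch illustrates is that no separate voice-characteristic classifier is required at test time; the search itself resolves the unknown characteristic by comparing likelihoods across the characteristic-dependent model sets.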
Related Papers
50 records in total
  • [1] Context-dependent acoustic models for Chinese speech recognition
    Ma, B
    Huang, TY
    Xu, B
    Zhang, XJ
    Qu, F
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 455 - 458
  • [2] CONTEXT DEPENDENT STATE TYING FOR SPEECH RECOGNITION USING DEEP NEURAL NETWORK ACOUSTIC MODELS
    Bacchiani, Michiel
    Rybach, David
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Continuous speech recognition based on general factor dependent acoustic models
    Suzuki, H
    Zen, H
    Nankaku, Y
    Miyajima, C
    Tokuda, K
    Kitamura, T
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (03): : 410 - 417
  • [4] Interpolation of Acoustic Models for Speech Recognition
    Fraga-Silva, Thiago
    Gauvain, Jean-Luc
    Lamel, Lori
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3346 - 3350
  • [5] Boosting Thai Syllable Speech Recognition Using Acoustic Models Combination
    Tangwongsan, Supachai
    Phoophuangpairoj, Rong
    ICCEE 2008: PROCEEDINGS OF THE 2008 INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, 2008, : 568 - 572
  • [6] Emotional Speech Recognition Using Acoustic Models of Decomposed Component Words
    Kaveeta, Vivatchai
    Patanukhom, Karn
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 115 - 119
  • [7] Multilingual acoustic models for speech recognition and synthesis
    Kunzmann, S
    Fischer, V
    Gonzalez, J
    Emam, O
    Günther, C
    Janke, E
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 745 - 748
  • [8] Dynamically configurable acoustic models for speech recognition
    Hwang, MY
    Huang, XD
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 669 - 672
  • [9] Compact Acoustic Models for Embedded Speech Recognition
    Levy, Christophe
    Linares, Georges
    Bonastre, Jean-Francois
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,
  • [10] Acoustic-to-Phrase Models for Speech Recognition
    Gaur, Yashesh
    Li, Jinyu
    Meng, Zhong
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 2240 - 2244