Speech recognition using voice-characteristic-dependent acoustic models

Cited by: 0
Authors
Suzuki, H [1]
Zen, H [1]
Nankaku, Y [1]
Miyajima, C [1]
Tokuda, K [1]
Kitamura, T [1]
Affiliation
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes a speech recognition technique based on acoustic models that account for variations in voice characteristics. Context-dependent acoustic models, typically triphone HMMs, are often used in continuous speech recognition systems. This work hypothesizes that the speaker voice characteristics that humans can perceive by listening are also factors in acoustic variation that should be considered when constructing acoustic models, and it applies a tree-based clustering technique to speaker voice characteristics as well to construct voice-characteristic-dependent acoustic models. In speech recognition using triphone models, the neighboring phonetic context is given in advance from linguistic-phonetic knowledge; in contrast, in recognition using voice-characteristic-dependent acoustic models, the voice characteristics of the input speech are unknown. This paper proposes a method of recognizing speech even when the voice characteristics of the input speech are unknown. The results of a gender-dependent speech recognition experiment show that the proposed method achieves higher recognition performance than conventional methods.
Pages: 740-743
Number of pages: 4