Speech recognition using voice-characteristic-dependent acoustic models

Cited by: 0

Authors:

Suzuki, H [1]
Zen, H [1]
Nankaku, Y [1]
Miyajima, C [1]
Tokuda, K [1]
Kitamura, T [1]

Affiliations:

[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi 4668555, Japan

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I | 2003

Keywords:

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper proposes a speech recognition technique based on acoustic models considering voice characteristic variations. Context-dependent acoustic models, which are typically triphone HMMs, are often used in continuous speech recognition systems. This work hypothesizes that the speaker voice characteristics that humans can perceive by listening are also factors in acoustic variation for construction of acoustic models, and a tree-based clustering technique is also applied to speaker voice characteristics to construct voice-characteristic-dependent acoustic models. In speech recognition using triphone models, the neighboring phonetic context is given from the linguistic-phonetic knowledge in advance; in contrast, the voice characteristics of input speech are unknown in recognition using voice-characteristic-dependent acoustic models. This paper proposes a method of recognizing speech even under conditions where the voice characteristics of the input speech are unknown. The result of a gender-dependent speech recognition experiment shows that the proposed method achieves higher recognition performance in comparison to conventional methods.

Pages: 740-743

Number of pages: 4

50 items in total

[21] Speech enhancement using voice source models
Yasmin, A
Fieguth, P
Deng, L
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 797-800
[22] Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition
Kanthak, S
Ney, H
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 845-848
[23] Detecting fatigue from voice using speech recognition
Greeley, H. P.
Friets, E.
Wilson, J. P.
Raghavan, S.
Picone, J.
Berg, J.
2006 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2006: 567-571
[24] EMOTION RECOGNITION USING VOICE CHARACTERISTICS IN SPEECH RECORDINGS
Randrianavony, Koloina
Marechal, Catherine
Conteville, Laurie
Bougueroua, Lamine
13TH INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND MANAGEMENT, ICICM 2023, 2023: 26-32
[25] Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models
Jorge, Javier
Gimenez, Adria
Silvestre-Cerda, Joan Albert
Civera, Jorge
Sanchis, Albert
Juan, Alfons
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 148-161
[26] Optimizing acoustic models for commercial speech recognition using foreground scores and data weighting
Boies, D
Strope, B
Weintraub, M
Wu, SL
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004: 817-820
[27] Acoustic model adaptation using in-domain background models for dysarthric speech recognition
Sharma, Harsh Vardhan
Hasegawa-Johnson, Mark
COMPUTER SPEECH AND LANGUAGE, 2013, 27(06): 1147-1162
[28] Improving Acoustic Models for Dysarthric Speech Recognition using Time Delay Neural Networks
Misbullah, Alim
Lin, Hai-Hsing
Chang, Chia-Yuan
Yeh, Hsiu-Wei
Weng, Ko-Cheng
2020 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICELTICS 2020), 2020: 118-121
[29] Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data
Kosaka, Tetsuo
Saeki, Kazuya
Aizawa, Yoshitaka
Kato, Masaharu
Nose, Takashi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D(03): 363-373
[30] Speaker-dependent speech recognition based on phone-like units models - Application to voice dialing
Fontaine, V
Bourlard, H
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997: 1527-1530