Speaking Rate Normalization with Lattice-based Context-dependent Phoneme Duration Modeling for Personalized Speech Recognizers on Mobile Devices

Citations: 0
Authors
Yeh, Ching-Feng [1 ]
Lee, Hung-Yi [2 ]
Lee, Lin-Shan [1 ]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
Keywords
speaking rate; mobile; speech recognition
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Voice access to cloud applications, including social networks, from mobile devices has become attractive, and personalized speech recognizers on mobile devices are feasible because most such devices have only a single user. Speaking rate variation is known to be an important source of performance degradation in spontaneous speech recognition. Speaking rate is speaker dependent and changes from time to time for every speaker; furthermore, the pattern of speaking rate variation is unique to each speaker. Continuous frame rate normalization (CFRN) [1] was recently proposed to address speaking rate variation. In this paper, we propose an extended version of CFRN for personalized speech recognizers on mobile platforms. In this approach, context-dependent phoneme duration models adapted to each speaker are used to estimate the speaking rate utterance by utterance, based on lattices obtained from a first-pass recognizer. The proposed approach was evaluated on both read and spontaneous recordings from mobile platforms, and significant improvements were observed in the experimental results.
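The following is a minimal Python sketch of how the pipeline described in the abstract might be organized: the speaking rate of one utterance is estimated from the phoneme arcs of a first-pass lattice using speaker-adapted, context-dependent duration models, and the result is mapped to a CFRN-style frame shift for a second feature-extraction pass. The data structures, the duration-model dictionary, and the 7.5-12.5 ms clipping range are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of lattice-based speaking
# rate estimation followed by CFRN-style frame-shift selection.
# All names, structures, and constants below are illustrative assumptions.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PhoneArc:
    """One phoneme arc taken from a first-pass lattice (hypothetical structure)."""
    triphone: str         # context-dependent unit, e.g. "a-b+c"
    duration_frames: int  # observed duration in frames
    posterior: float      # arc posterior probability in the lattice


def estimate_speaking_rate(arcs: List[PhoneArc],
                           duration_model: Dict[str, float]) -> float:
    """Posterior-weighted ratio of model-expected to observed phoneme duration.

    duration_model maps each context-dependent phoneme to its expected duration
    (in frames) after adaptation to the device's single user; a result > 1.0
    means the utterance is faster than that speaker's average.
    """
    num = den = 0.0
    for arc in arcs:
        expected = duration_model.get(arc.triphone)
        if expected is None or arc.duration_frames <= 0:
            continue  # skip units the duration model does not cover
        num += arc.posterior * (expected / arc.duration_frames)
        den += arc.posterior
    return num / den if den > 0.0 else 1.0


def cfrn_frame_shift(rate: float, base_shift_ms: float = 10.0,
                     lo_ms: float = 7.5, hi_ms: float = 12.5) -> float:
    """CFRN-style rule of thumb: shorten the frame shift for fast speech and
    lengthen it for slow speech, clipped to an assumed plausible range."""
    return max(lo_ms, min(hi_ms, base_shift_ms / rate))


# Toy example: a fast utterance (observed durations shorter than expected).
arcs = [PhoneArc("sil-a+b", 6, 0.9), PhoneArc("a-b+c", 4, 0.8)]
model = {"sil-a+b": 9.0, "a-b+c": 6.0}
rate = estimate_speaking_rate(arcs, model)   # ~1.5 (faster than average)
shift = cfrn_frame_shift(rate)               # ~7.5 ms after clipping
print(f"speaking rate {rate:.2f}, frame shift {shift:.1f} ms")
```

In a full system the features re-extracted with the chosen frame shift would then be decoded in a second pass; the sketch covers only the per-utterance rate estimate and the frame-shift choice.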
Pages: 1740-1744
Number of pages: 5
Related Papers
7 items in total
  • [1] Context-dependent phoneme duration modeling with tree-based state tying
    Park, SJ
    Koo, MW
    Jhon, CS
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (03): 662-666
  • [2] Performance of connected digit recognizers with context-dependent word duration modeling
    Kwon, OW
    Un, CK
    APCCAS '96 - IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS '96, 1996: 243-246
  • [3] MDL-based context-dependent subword modeling for speech recognition
    Shinoda, Koichi
    Watanabe, Takao
    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 2000, 21 (02): 79-86
  • [4] A frame-based context-dependent acoustic modeling for speech recognition
    Terashima R.
    Zen H.
    Nankaku Y.
    Tokuda K.
    IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (10): 1856-1864+24
  • [5] Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition
    Triefenbach, Fabian
    Jalalvand, Azarakhsh
    Demuynck, Kris
    Martens, Jean-Pierre
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 3341-3345
  • [6] Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition
    Wang, Guangsen
    Sim, Khe Chai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (11): 1660-1669
  • [7] Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    Bahmaninezhad, Fahimeh
    King, Simon
    Drugman, Thomas
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014