Phone set generation based on acoustic and contextual analysis for multilingual speech recognition

Cited by: 0

Authors:

Huang, Chien-Lin [1]

Wu, Chung-Hsien [1]

Affiliations:

[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

multilingual speech recognition; confusion matrix; acoustic likelihood; hyperspace analog to language model

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This study presents a novel approach to generating phone units for multilingual speech recognition. Acoustic and contextual analysis is performed to characterize multilingual phonetic units for phone set generation. A confusion matrix combining the acoustic and contextual similarities between every pair of phonetic units is constructed for phonetic unit clustering. Acoustic likelihood and the hyperspace analog to language (HAL) model are adopted to estimate the acoustic similarity and the contextual similarity of phone models, respectively. Experiments show that the generated phone set is compact and robust, and that it accounts for both acoustic and contextual information in multilingual speech recognition.
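
The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two similarities are combined or which clustering algorithm is used, so the linear interpolation weight, the average-linkage merging rule, the stopping threshold, the phone labels, and all scores below are assumptions made purely for illustration, not the authors' method.

```python
from itertools import combinations


def combine_similarities(acoustic, contextual, weight=0.5):
    """Linearly interpolate two pairwise similarity tables (assumed to lie in [0, 1])."""
    return {
        pair: weight * acoustic[pair] + (1.0 - weight) * contextual[pair]
        for pair in acoustic
    }


def cluster_phones(phones, similarity, threshold):
    """Greedily merge the most similar clusters until none reaches the threshold."""
    clusters = [frozenset([p]) for p in phones]

    def linkage(a, b):
        # Average similarity over all cross-cluster phone pairs (average linkage).
        scores = [similarity[frozenset((x, y))] for x in a for y in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        if linkage(a, b) < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


def table(scores):
    """Key each unordered phone pair by a frozenset so lookups are symmetric."""
    return {frozenset(pair): value for pair, value in scores.items()}


# Hypothetical phones from two languages with made-up similarity scores.
phones = ["a_zh", "a_en", "s_en", "sh_zh"]
acoustic = table({
    ("a_zh", "a_en"): 0.9, ("a_zh", "s_en"): 0.1, ("a_zh", "sh_zh"): 0.1,
    ("a_en", "s_en"): 0.2, ("a_en", "sh_zh"): 0.1, ("s_en", "sh_zh"): 0.6,
})
contextual = table({
    ("a_zh", "a_en"): 0.8, ("a_zh", "s_en"): 0.2, ("a_zh", "sh_zh"): 0.1,
    ("a_en", "s_en"): 0.2, ("a_en", "sh_zh"): 0.3, ("s_en", "sh_zh"): 0.4,
})

combined = combine_similarities(acoustic, contextual, weight=0.5)
phone_set = cluster_phones(phones, combined, threshold=0.6)
print(sorted(sorted(c) for c in phone_set))
```

With these made-up scores, only the two vowel units are similar enough in both views to be merged, so four language-specific units reduce to three shared ones. Lowering the threshold would merge more units and give a more compact but less discriminative phone set.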

Pages: 1017 / +

Page count: 2

