Phone set generation based on acoustic and contextual analysis for multilingual speech recognition

Cited by: 0
Authors
Huang, Chien-Lin [1 ]
Wu, Chung-Hsien [1 ]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
Keywords
multilingual speech recognition; confusion matrix; acoustic likelihood; hyperspace analog to language model
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This study presents a novel approach to generating phone units for multilingual speech recognition. Acoustic and contextual analysis is performed to characterize multilingual phonetic units for phone set generation. A confusion matrix that combines the acoustic and contextual similarities between every pair of phonetic units is constructed for phonetic unit clustering. Acoustic likelihood is adopted to estimate the acoustic similarity of phone models, and the hyperspace analog to language (HAL) model is adopted to estimate their contextual similarity. Experiments show that the generated phone set is compact and robust, and that it accounts for both acoustic and contextual information in multilingual speech recognition.
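The clustering idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear weight `alpha`, the average-linkage merge rule, the stopping `threshold`, the function names, and the toy similarity values. In the paper the acoustic similarities come from acoustic likelihoods and the contextual similarities from a HAL model; here both are simply hand-written matrices.

```python
from itertools import combinations


def combine_similarity(acoustic, contextual, alpha=0.5):
    """Blend two symmetric similarity matrices with a linear weight (assumed form)."""
    n = len(acoustic)
    return [
        [alpha * acoustic[i][j] + (1 - alpha) * contextual[i][j] for j in range(n)]
        for i in range(n)
    ]


def cluster_phones(phones, similarity, threshold):
    """Average-linkage agglomerative clustering (an assumed merge rule).

    Repeatedly merges the most similar pair of clusters and stops when the
    best remaining pair falls below `threshold`.
    """
    clusters = [[i] for i in range(len(phones))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(phones[i] for i in c) for c in clusters]


# Toy data: hypothetical language-tagged phone labels and made-up similarities.
phones = ["en_s", "zh_s", "en_a", "zh_a"]
acoustic = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
contextual = [
    [1.0, 0.7, 0.3, 0.2],
    [0.7, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
]

combined = combine_similarity(acoustic, contextual, alpha=0.5)
phone_set = cluster_phones(phones, combined, threshold=0.5)
print(phone_set)  # [['en_s', 'zh_s'], ['en_a', 'zh_a']]
```

With these toy values the two cross-language pairs merge and the four language-dependent phones collapse into two shared units, which is the kind of compaction the abstract describes. Raising `threshold` keeps more language-specific units; lowering it merges more aggressively.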
Pages: 1017 / +
Page count: 2