Efficient Acoustic Modeling Method for Unsupervised Speech Recognition using Multi-Task Deep Neural Network

Cited by: 0
Authors
Yao Haitao [1]
An Maobo [2]
Xu Ji [1]
Liu Jian [1]
Affiliations
[1] Chinese Acad Sci, Institute Acoust, 21 North 4th Ring Rd, Beijing 100190, Peoples R China
[2] Coordinat Ctr China, Natl Comp network Emergency Response Tech Team, Beijing 100029, Peoples R China
Source
PROCEEDINGS OF THE 2015 4TH NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (NCEECE 2015) | 2016 / Vol. 47
Keywords
Speech Recognition; Acoustic Modeling; Unsupervised Training; Multi-Lingual; Multi-Task Deep Neural Network;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes an acoustic modeling method for speech recognition in zero-resourced languages under mismatched conditions. For such languages, very little or no transcribed speech is available for traditional monolingual speech recognition. Conventional methods such as IPA-based universal acoustic modeling have proved effective under matched acoustic conditions (similar speaking styles, adjacent languages, etc.) but usually perform poorly when a mismatch occurs. Because mismatches between languages arise often, this paper proposes unsupervised acoustic modeling via cross-lingual knowledge sharing. First, initial acoustic models (AMs) for a target zero-resourced language are trained using Multi-Task Deep Neural Networks (MDNN): speech from different languages mapped to the phonemes of the target language (mapped data) is trained jointly with the same data transcribed in a language-specific way for each language (specific data). Then, automatically transcribed target-language data is used in an iterative process to train new AMs with various auxiliary tasks. An experiment on 100 hours of untranscribed Japanese speech achieved a character error rate (CER) of 57.21%, a 19.32% absolute improvement over the baseline (IPA-based universal acoustic modeling).
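The joint training of mapped and specific data described in the abstract can be pictured as one shared hidden representation feeding a separate softmax output per task. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `MultiTaskNet`, the helper names, the layer sizes, the ReLU activation, the random initial weights, and the task weights are all hypothetical and chosen only for illustration. It shows a forward pass and a weighted joint cross-entropy loss for a single feature frame.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Affine layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical multi-task network: shared hidden layer, one head per task."""

    def __init__(self, n_in, n_hidden, head_sizes, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_in, n_hidden, rng)
        self.heads = {name: init_layer(n_hidden, size, rng)
                      for name, size in head_sizes.items()}

    def forward(self, x):
        # The hidden layer is shared by all tasks (ReLU is an assumption).
        hidden = [max(0.0, v) for v in linear(x, *self.shared)]
        # Each task has its own output layer and its own posterior distribution.
        return {name: softmax(linear(hidden, *head))
                for name, head in self.heads.items()}


def joint_loss(posteriors, labels, task_weights):
    """Weighted sum of per-task cross-entropy losses for one frame."""
    return sum(task_weights[task] * -math.log(posteriors[task][label])
               for task, label in labels.items())


# "mapped": labels in the target language's phoneme set.
# "specific": labels in the source language's own phoneme set.
net = MultiTaskNet(n_in=4, n_hidden=8, head_sizes={"mapped": 5, "specific": 7})
frame = [0.2, -0.1, 0.4, 0.0]  # made-up acoustic feature vector
post = net.forward(frame)
loss = joint_loss(post,
                  labels={"mapped": 2, "specific": 6},
                  task_weights={"mapped": 1.0, "specific": 0.5})
```

In a setup like this, gradients from both heads would update the shared layer, which is one way cross-lingual knowledge could be shared; the iterative retraining on automatically transcribed target-language data is not shown here.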
Pages: 365-370
Page count: 6
Related papers
10 of 50 shown
  • [1] Multi-Lingual Unsupervised Acoustic Modeling Using Multi-Task Deep Neural Network under Mismatch Conditions
    Yao Haitao
    Xu Ji
    Liu Jian
    PROCEEDINGS OF 2016 8TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN 2016), 2016, : 139 - 144
  • [2] Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network
    Duc Le
    Aldeneh, Zakaria
    Provost, Emily Mower
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1108 - 1112
  • [3] IMPROVING SPEECH RECOGNITION IN REVERBERATION USING A ROOM-AWARE DEEP NEURAL NETWORK AND MULTI-TASK LEARNING
    Giri, Ritwik
    Seltzer, Michael L.
    Droppo, Jasha
    Yu, Dong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5014 - 5018
  • [4] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 704 - 707
  • [5] JOINT ACOUSTIC MODELING OF TRIPHONES AND TRIGRAPHEMES BY MULTI-TASK LEARNING DEEP NEURAL NETWORKS FOR LOW-RESOURCE SPEECH RECOGNITION
    Chen, Dongpeng
    Mak, Brian
    Leung, Cheung-Chi
    Sivadas, Sunil
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] MULTI-TASK DEEP NEURAL NETWORK ACOUSTIC MODELS WITH MODEL ADAPTATION USING DISCRIMINATIVE SPEAKER IDENTITY FOR WHISPER RECOGNITION
    Li, Jingjie
    McLoughlin, Ian
    Liu, Cong
    Xue, Shaofei
    Wei, Si
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4969 - 4973
  • [7] Adversarial Multi-task Learning of Deep Neural Networks for Robust Speech Recognition
    Shinohara, Yusuke
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2369 - 2372
  • [8] MULTI-TASK JOINT-LEARNING OF DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Yin, Maofan
    You, Yongbin
    Yu, Kai
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 310 - 316
  • [9] Deep Convolutional Neural Network with Multi-Task Learning Scheme for Modulations Recognition
    Mossad, Omar S.
    ElNainay, Mustafa
    Torki, Marwan
    2019 15TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2019, : 1644 - 1649
  • [10] Traffic Sign Recognition Using a Multi-Task Convolutional Neural Network
    Luo, Hengliang
    Yang, Yi
    Tong, Bei
    Wu, Fuchao
    Fan, Bin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2018, 19 (04) : 1100 - 1111