Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

Cited by: 3
Authors
Fan, Peng [1]
Hua, Xiyao [2]
Lin, Yi [2]
Yang, Bo [2]
Zhang, Jianwei [2]
Ge, Wenyi [3]
Guo, Dongyue [1]
Affiliations
[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Peoples R China
[2] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[3] Chengdu Univ Informat Technol, Coll Comp Sci, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
  automatic speech recognition; feature learning; air traffic control; multilingual; end-to-end training
DOI
10.1587/transinf.2022EDP7151
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead of the handcrafted features, a learning block is designed to extract informative features from raw waveforms for acoustic modeling. Both the SincNet and 1D convolution blocks are applied to process the raw waveforms, whose outputs are concatenated to the RNN layers for the temporal modeling. Thanks to the ability to learn representations from raw waveforms, the proposed model can be optimized in a complete end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is also considered to achieve the ASR task by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that the proposed approach outperforms other baselines, achieving a 6.9% character error rate.
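The character error rate quoted in the abstract is conventionally the edit distance between the reference and the recognized transcript, divided by the reference length. The sketch below illustrates that conventional definition over a mixed sequence of Chinese characters and English letters. It is not the authors' evaluation code: the function names (`edit_distance`, `tokenize`, `cer`) are hypothetical, and dropping whitespace and lowercasing English letters are assumptions made here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tokenize(text):
    """Treat each Chinese character or English letter as one token.

    Assumption: whitespace is ignored and English letters are lowercased.
    """
    return [ch for ch in text.lower() if not ch.isspace()]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref_tokens = tokenize(reference)
    hyp_tokens = tokenize(hypothesis)
    if not ref_tokens:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

Because Chinese characters and English letters are both single tokens under this scheme, one metric covers both languages in a multilingual transcript.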
Pages: 538-544
Page count: 7