Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

Cited by: 3
Authors
Fan, Peng [1]
Hua, Xiyao [2]
Lin, Yi [2]
Yang, Bo [2]
Zhang, Jianwei [2]
Ge, Wenyi [3]
Guo, Dongyue [1]
Affiliations
[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Peoples R China
[2] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[3] Chengdu Univ Informat Technol, Coll Comp Sci, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
  automatic speech recognition; feature learning; air traffic control; multilingual; end-to-end training
DOI
10.1587/transinf.2022EDP7151
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC), based on feature learning and an end-to-end training procedure. The proposed model integrates a feature learning block, a recurrent neural network (RNN), and the connectionist temporal classification (CTC) loss into a single end-to-end ASR model. To cope with the complex environments of ATC speech, a learning block replaces handcrafted features and extracts informative features directly from raw waveforms for acoustic modeling. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and fed to the RNN layers for temporal modeling. Because it learns representations from raw waveforms, the proposed model can be optimized in a fully end-to-end manner, i.e., from waveform to text. Finally, the multilingual nature of ATC speech is handled by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results show that it outperforms the other baselines, achieving a 6.9% character error rate.
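The character error rate quoted in the abstract is conventionally the character-level edit distance between reference and hypothesis, divided by the number of reference characters. The sketch below illustrates that standard definition on a mixed Chinese/English transcript. It is not the authors' scoring script: the function names, the lowercasing, and the choice to drop whitespace before scoring are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tokenize(text):
    """Split text into single characters (Chinese characters and English
    letters alike). Lowercasing and dropping whitespace are assumptions."""
    return [ch for ch in text.lower() if not ch.isspace()]


def character_error_rate(pairs):
    """Corpus-level CER over (reference, hypothesis) string pairs."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in pairs)
    total = sum(len(tokenize(r)) for r, _ in pairs)
    if total == 0:
        raise ValueError("references contain no characters")
    return errors / total


if __name__ == "__main__":
    # Hypothetical utterance: one wrong letter out of 13 reference characters.
    pairs = [("国航 one two three", "国航 one too three")]
    print(f"CER = {character_error_rate(pairs):.4f}")
```

Scoring over a single character inventory mirrors the combined vocabulary described in the abstract, since Chinese characters and English letters are counted on the same footing.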
Pages: 538-544
Page count: 7
Related papers
50 in total
  • [1] Towards multilingual end-to-end speech recognition for air traffic control
    Lin, Yi
    Yang, Bo
    Guo, Dongyue
    Fan, Peng
    IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (09) : 1203 - 1214
  • [2] ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems
    Lin, Yi
    Yang, Bo
    Li, Linchao
    Guo, Dongyue
    Zhang, Jianwei
    Chen, Hu
    Zhang, Yi
    APPLIED SOFT COMPUTING, 2021, 112
  • [3] End-to-End Speech Recognition Sequence Training With Reinforcement Learning
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE ACCESS, 2019, 7 : 79758 - 79769
  • [4] Improved CTC-Attention Based End-to-End Speech Recognition on Air Traffic Control
    Zhou, Kai
    Yang, Qun
    Sun, XiuSong
    Liu, ShaoHan
    Lu, JinJun
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING, PT II, 2019, 11936 : 187 - 196
  • [5] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019 : 562 - 569
  • [6] SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION
    Kahn, Jacob
    Lee, Ann
    Hannun, Awni
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 7084 - 7088
  • [7] INCREMENTAL LEARNING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Fu, Li
    Li, Xiaoxiao
    Zi, Libo
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021 : 320 - 327
  • [8] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5819 - 5823
  • [9] Towards end-to-end speech recognition with transfer learning
    Qin, Chu-Xiong
    Qu, Dan
    Zhang, Lian-Hai
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018