Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

Cited by: 3

Authors:

Fan, Peng [1]

Hua, Xiyao [2]

Lin, Yi [2]

Yang, Bo [2]

Zhang, Jianwei [2]

Ge, Wenyi [3]

Guo, Dongyue [1]

Affiliations:

[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Peoples R China

[2] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China

[3] Chengdu Univ Informat Technol, Coll Comp Sci, Chengdu, Peoples R China

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2023, Vol. E106D, No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

  automatic speech recognition; feature learning; air traffic control; multilingual; end-to-end training;

DOI:

10.1587/transinf.2022EDP7151

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC) based on feature learning and an end-to-end training procedure. The proposed model integrates a feature learning block, a recurrent neural network (RNN), and the connectionist temporal classification loss into an end-to-end ASR model. To cope with the complex environments of ATC speech, a learning block is designed to extract informative features from raw waveforms for acoustic modeling, replacing handcrafted features. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and fed to the RNN layers for temporal modeling. Because it learns representations directly from raw waveforms, the proposed model can be optimized in a completely end-to-end manner, i.e., from waveform to text. Finally, the multilingual issue in the ATC domain is addressed by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results demonstrate that it outperforms other baselines, achieving a 6.9% character error rate.
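
The combined vocabulary and the character error rate (CER) mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_vocab`, `edit_distance`, and `cer` are hypothetical, the blank symbol at index 0 is a common CTC convention assumed here, and the sketch assumes that CER is the character-level Levenshtein distance divided by the reference length, with whitespace ignored.

```python
def build_vocab(transcripts):
    """Collect every distinct non-space character (Chinese characters and
    English letters alike) into one vocabulary.
    Assumption: index 0 is reserved for the CTC blank symbol."""
    chars = sorted({ch for text in transcripts for ch in text if not ch.isspace()})
    vocab = {"<blank>": 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length, ignoring whitespace."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under these assumptions, a single output layer over such a vocabulary can emit either a Chinese character or an English letter at each step, so one model serves both languages, and the same `cer` function scores transcripts in either language.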


Pages: 538-544

Page count: 7

50 items in total

[1] Towards multilingual end-to-end speech recognition for air traffic control
Lin, Yi
Yang, Bo
Guo, Dongyue
Fan, Peng
IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (09) : 1203 - 1214
[2] ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems
Lin, Yi
Yang, Bo
Li, Linchao
Guo, Dongyue
Zhang, Jianwei
Chen, Hu
Zhang, Yi
APPLIED SOFT COMPUTING, 2021, 112
[3] End-to-End Speech Recognition Sequence Training With Reinforcement Learning
Tjandra, Andros
Sakti, Sakriani
Nakamura, Satoshi
IEEE ACCESS, 2019, 7 : 79758 - 79769
[4] Improved CTC-Attention Based End-to-End Speech Recognition on Air Traffic Control
Zhou, Kai
Yang, Qun
Sun, XiuSong
Liu, ShaoHan
Lu, JinJun
INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING, PT II, 2019, 11936 : 187 - 196
[5] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
Kim, Chanwoo
Kim, Sungsoo
Kim, Kwangyoun
Kumar, Mehul
Kim, Jiyeon
Lee, Kyungmin
Han, Changwoo
Garg, Abhinav
Kim, Eunhyang
Shin, Minkyoo
Singh, Shatrughan
Heck, Larry
Gowda, Dhananjaya
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569
[6] SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION
Kahn, Jacob
Lee, Ann
Hannun, Awni
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7084 - 7088
[7] INCREMENTAL LEARNING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
Fu, Li
Li, Xiaoxiao
Zi, Libo
Zhang, Zhengchen
Wu, Youzheng
He, Xiaodong
Zhou, Bowen
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 320 - 327
[8] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
Zhou, Yingbo
Xiong, Caiming
Socher, Richard
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823
[9] Towards end-to-end speech recognition with transfer learning
Qin, Chu-Xiong
Qu, Dan
Zhang, Lian-Hai
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018
