Speech Recognition for Air Traffic Control via Feature Learning and End-to-End Training

Cited by: 3

Authors:

Fan, Peng [1]

Hua, Xiyao [2]

Lin, Yi [2]

Yang, Bo [2]

Zhang, Jianwei [2]

Ge, Wenyi [3]

Guo, Dongyue [1]

Affiliations:

[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Peoples R China

[2] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China

[3] Chengdu Univ Informat Technol, Coll Comp Sci, Chengdu, Peoples R China

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2023 / Vol. E106D / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

  automatic speech recognition; feature learning; air traffic control; multilingual; end-to-end training;

DOI:

10.1587/transinf.2022EDP7151

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In this work, we propose a new automatic speech recognition (ASR) system for air traffic control (ATC), based on feature learning and an end-to-end training procedure. The proposed model integrates a feature learning block, a recurrent neural network (RNN), and the connectionist temporal classification (CTC) loss into an end-to-end ASR model. To cope with the complex environments of ATC speech, a learning block is designed to extract informative features from raw waveforms for acoustic modeling, replacing handcrafted features. Both SincNet and 1D convolution blocks process the raw waveforms, and their outputs are concatenated and fed to the RNN layers for temporal modeling. Because it learns representations from raw waveforms, the proposed model can be optimized in a completely end-to-end manner, i.e., from waveform to text. Finally, the multilingual nature of ATC speech is addressed by constructing a combined vocabulary of Chinese characters and English letters. The proposed approach is validated on a multilingual real-world corpus (ATCSpeech), and the experimental results show that it outperforms the other baselines, achieving a 6.9% character error rate.
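The abstract mentions two ideas that are easy to illustrate: a single vocabulary covering Chinese characters and English letters, and the character error rate used as the evaluation metric. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`build_vocab`, `edit_distance`, `cer`) are hypothetical, reserving index 0 for a blank symbol is an assumption borrowed from common connectionist temporal classification setups, and the sample transcripts in the usage note are invented for illustration rather than taken from ATCSpeech.

```python
def build_vocab(transcripts):
    """Map every distinct character to an integer index.

    Chinese characters, English letters and spaces all become entries of one
    shared table. Index 0 is left free for a blank symbol (an assumption).
    """
    chars = sorted({ch for text in transcripts for ch in text})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate: total edit distance / total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_errors = sum(edit_distance(r, h)
                       for r, h in zip(references, hypotheses))
    return total_errors / total_chars
```

For example, `cer(["abc"], ["abd"])` counts one substitution over three reference characters, and a mixed transcript such as `"上升 climb"` is scored character by character in the same way, which is why a character-level metric suits a combined Chinese and English vocabulary.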


Pages: 538-544

Number of pages: 7
