Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Cited by: 0
Authors
Lin, Yist Y. [1 ]
Han, Tao [1 ]
Xu, Haihua [1 ]
Van Tung Pham [1 ]
Khassanov, Yerbolat [1 ]
Chong, Tze Yuang [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Affiliations
[1] ByteDance, Beijing, People's Republic of China
Source
INTERSPEECH 2023
Keywords
random utterance concatenation; data augmentation; short video; end-to-end; speech recognition
DOI
10.21437/Interspeech.2023-1272
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when train-test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate the train-test utterance length mismatch for the short-video ASR task. Specifically, we are motivated by the observation that our human-transcribed training utterances of short-video spontaneous speech tend to be much shorter (roughly 3 seconds on average), while our test utterances, produced by a voice activity detection front-end, are much longer (roughly 10 seconds on average). Such a mismatch can lead to suboptimal performance. Empirically, the proposed RUC method significantly improves recognition of long utterances without degrading performance on short ones. Overall, it achieves a 5.72% average word error rate reduction across 15 languages and improved robustness to varying utterance lengths.
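The core idea described in the abstract — randomly stitching several short training utterances (and their transcripts) into one longer utterance on the fly — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact recipe: the function name and the `max_concat`, `concat_prob`, and `seed` parameters are assumptions introduced here for illustration.

```python
import random

def random_utterance_concatenation(dataset, max_concat=3, concat_prob=0.5, seed=None):
    """Sketch of on-the-fly RUC data augmentation (illustrative, not the
    paper's exact method).

    `dataset` is a list of (audio, text) pairs, where `audio` is a list of
    samples/frames and `text` is the transcript. With probability
    `concat_prob`, each training example is replaced by a concatenation of
    itself and up to `max_concat - 1` other randomly drawn utterances,
    yielding longer utterances that better match test-time lengths.
    """
    rng = random.Random(seed)
    augmented = []
    for audio, text in dataset:
        if rng.random() < concat_prob and len(dataset) > 1:
            # Draw how many utterances to join, then sample the extras.
            k = rng.randint(2, max_concat)
            for extra_audio, extra_text in rng.sample(dataset, k - 1):
                audio = audio + extra_audio       # concatenate audio
                text = text + " " + extra_text    # concatenate transcripts
        augmented.append((audio, text))
    return augmented
```

In a real training pipeline this would typically run inside the data loader each epoch (hence "on-the-fly"), so the model sees different random concatenations over time rather than a fixed augmented corpus.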
Pages: 904-908
Page count: 5