THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Cited by: 0

Authors:

Tian, Jingguang [1]
Ye, Shuaishuai [1]
Chen, Shunfei [1]
Xiang, Yang [1]
Yin, Zhaohui [1]
Hu, Xinhui [1]
Xu, Xinkang [1]

Affiliations:

[1] Hithink RoyalFlush AI Research Institute, Hangzhou, Zhejiang, People's Republic of China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024

Keywords:

ICMC-ASR; ASDR; TS-VAD; speaker diarization; speech recognition

DOI:

10.1109/ICASSPW62465.2024.10626136

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.
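
The two recognition metrics named in the abstract can be illustrated with a minimal sketch. It assumes CER is the character-level edit distance divided by the reference length, and that cpCER concatenates each speaker's utterances and takes the minimum error over all speaker permutations. The function names (`edit_distance`, `cer`, `cp_cer`) are hypothetical, and the challenge's official scoring tool may apply text normalization that is not reproduced here.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one hypothesis against one non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(ref_by_speaker, hyp_by_speaker) -> float:
    """Concatenated minimum-permutation CER.

    Each argument is a list with one entry per speaker; every entry is that
    speaker's utterances in time order (a list of strings).
    """
    refs = ["".join(utts) for utts in ref_by_speaker]
    hyps = ["".join(utts) for utts in hyp_by_speaker]
    # Pad with empty speakers so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [""] * (n - len(refs))
    hyps += [""] * (n - len(hyps))
    total_ref_chars = sum(len(r) for r in refs)
    # Try every assignment of hypothesis speakers to reference speakers.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars
```

Because the permutation search is factorial in the number of speakers, this brute-force form is only practical for the handful of speakers found in a single session.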
