THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Authors
Tian, Jingguang [1]
Ye, Shuaishuai [1]
Chen, Shunfei [1]
Xiang, Yang [1]
Yin, Zhaohui [1]
Hu, Xinhui [1]
Xu, Xinkang [1]
Affiliations
[1] Hithink RoyalFlush AI Res Inst, Hangzhou, Zhejiang, Peoples R China
Keywords
ICMC-ASR; ASDR; TS-VAD; speaker diarization; speech recognition;
DOI
10.1109/ICASSPW62465.2024.10626136
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.
Pages: 1 / 2
Page count: 2