THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Cited by: 0
Authors
Tian, Jingguang [1]
Ye, Shuaishuai [1]
Chen, Shunfei [1]
Xiang, Yang [1]
Yin, Zhaohui [1]
Hu, Xinhui [1]
Xu, Xinkang [1]
Affiliations
[1] Hithink RoyalFlush AI Research Institute, Hangzhou, Zhejiang, People's Republic of China
Keywords
ICMC-ASR; ASDR; TS-VAD; speaker diarization; speech recognition
DOI
10.1109/ICASSPW62465.2024.10626136
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.
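The two recognition metrics named in the abstract can be illustrated with a minimal sketch. It assumes CER is the character-level edit distance divided by the reference length, and that cpCER concatenates each speaker's utterances, scores every speaker permutation, and keeps the lowest error. The function names `edit_distance`, `cer`, and `cp_cer` are hypothetical, and the challenge's official scoring script may apply text normalization that is not shown here.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(refs: list[str], hyps: list[str]) -> float:
    """Concatenated minimum-permutation CER (illustrative sketch).

    refs[i] and hyps[j] each hold one speaker's utterances already
    concatenated in time order. Missing speakers are padded with "".
    """
    n = max(len(refs), len(hyps))
    refs = refs + [""] * (n - len(refs))
    hyps = hyps + [""] * (n - len(hyps))
    total_ref_chars = sum(len(r) for r in refs)
    # Try every assignment of hypothesis speakers to reference speakers
    # and keep the one with the fewest total errors.
    best_errors = min(
        sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_chars
```

For example, `cp_cer(["abcd", "efgh"], ["efgh", "abxd"])` scores the swapped speaker assignment, because the hypothesis speaker labels are arbitrary. The exhaustive permutation search is practical only for small speaker counts, such as the handful of occupants in a car.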
Pages: 1-2
Number of pages: 2
Related papers
50 in total
  • [1] THE FOSAFER SYSTEM FOR THE ICASSP2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
    Huang, Shangkun
    Du, Yuxuan
    Wang, Yankai
    Deng, Jing
    Zheng, Rong
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 5 - 6
  • [2] ICMC-ASR: THE ICASSP 2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
    Wang, He
    Guo, Pengcheng
    Li, Yue
    Zhang, Ao
    Sun, Jiayao
    Xie, Lei
    Chen, Wei
    Zhou, Pan
    Bu, Hui
    Xu, Xin
    Zhang, Binbin
    Chen, Zhuo
    Wu, Jian
    Wang, Longbiao
    Chng, Eng Siong
    Li, Sun
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 63 - 64
  • [3] SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION
    Menne, Tobias
    Schlueter, Ralf
    Ney, Hermann
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 535 - 541
  • [4] The segmentation of multi-channel meeting recordings for automatic speech recognition
    Dines, John
    Vepa, Jithendra
    Hain, Thomas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
  • [5] MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION USING DEEP COMPLEX UNET
    Kong, Yuxiang
    Wu, Jian
    Wang, Quandong
    Gao, Peng
    Zhuang, Weiji
    Wang, Yujun
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 104 - 110
  • [6] PERFORMANCE MONITORING FOR AUTOMATIC SPEECH RECOGNITION IN NOISY MULTI-CHANNEL ENVIRONMENTS
    Meyer, Bernd T.
    Mallidi, Sri Harish
    Martinez, Angel Mario Castro
    Paya-Vaya, Guillermo
    Kayser, Hendrik
    Hermansky, Hynek
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 50 - 56
  • [7] Automatic Speech Recognition System Channel Modeling
    Tan, Qun Feng
    Audhkhasi, Kartik
    Georgiou, Panayiotis G.
    Ettelaie, Emil
    Narayanan, Shrikanth
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2442 - 2445
  • [8] Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition
    Moritz, Niko
    Adiloglu, Kamil
    Anemueller, Joern
    Goetze, Stefan
    Kollmeier, Birger
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 558 - 573
  • [9] Automatic Speech Recognition System for Malay Speaking Children
    Rahman, Feisal Dani
    Mohamed, Noraini
    Mustafa, Mumtaz Begum
    Salim, Siti Salwah
    2014 THIRD ICT INTERNATIONAL STUDENT PROJECT CONFERENCE (ICT-ISPC), 2014, : 79 - 82
  • [10] AUTOMATIC SPEECH RECOGNITION SYSTEM
    RUSKE, G
    UMSCHAU IN WISSENSCHAFT UND TECHNIK, 1979, 79 (18) : 566 - 572