THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Cited by: 0
Authors
Tian, Jingguang [1]
Ye, Shuaishuai [1]
Chen, Shunfei [1]
Xiang, Yang [1]
Yin, Zhaohui [1]
Hu, Xinhui [1]
Xu, Xinkang [1]
Affiliations
[1] Hithink RoyalFlush AI Res Inst, Hangzhou, Zhejiang, Peoples R China
Keywords
ICMC-ASR; ASDR; TS-VAD; speaker diarization; speech recognition;
DOI
10.1109/ICASSPW62465.2024.10626136
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.
Pages: 1 - 2
Number of pages: 2
Related papers
50 in total
  • [21] AUTOMATIC RECOGNITION OF SPEECH
    MARILL, T
    IRE TRANSACTIONS ON HUMAN FACTORS IN ELECTRONICS, 1961, HFE2 (01): : 34 - +
  • [22] AUTOMATIC SPEECH RECOGNITION
    RAO, PVS
    PALIWAL, KK
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1986, 9 : 85 - 120
  • [23] The AhoSR Automatic Speech Recognition System
    Odriozola, Igor
    Serrano, Luis
    Hernaez, Inma
    Navas, Eva
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 279 - 288
  • [24] AN AUTOMATIC SPEECH RECOGNITION SYSTEM TABARCA
    BENEDI, JM
    CASACUBERTA, F
    VIDAL, E
    REVISTA DE INFORMATICA Y AUTOMATICA, 1990, 23 (01): : 15 - 24
  • [25] TETRA CHANNEL SIMULATION FOR AUTOMATIC SPEECH RECOGNITION
    Stein, Daniel
    Winkler, Thomas
    Schwenninger, Jochen
    Bardeli, Rolf
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1653 - 1657
  • [26] A COMPARATIVE STUDY OF MULTI-CHANNEL PROCESSING METHODS FOR NOISY AUTOMATIC SPEECH RECOGNITION IN URBAN ENVIRONMENTS
    Tran Huy Dat
    Dennis, Jonathan
    Ren, Leng Yi
    Terence, Ng Wen Zheng
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6465 - 6469
  • [27] Robust automatic speech recognition using a multi-channel signal separation front-end
    Yen, KC
    Zhao, YX
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1337 - 1340
  • [28] Method for adaptive on-line data fusion in Multi-Channel automatic speech recognition systems
    Ivanov, R
    2002 FIRST INTERNATIONAL IEEE SYMPOSIUM INTELLIGENT SYSTEMS, VOL 1, PROCEEDINGS, 2002, : 350 - 353
  • [29] Speech production and automatic speech recognition
    Acoustics Bulletin, 2000, 25 (02):
  • [30] AUTOMATIC SPEECH RECOGNITION OF IMPAIRED SPEECH
    CARLSON, GS
    BERNSTEIN, J
    INTERNATIONAL JOURNAL OF REHABILITATION RESEARCH, 1988, 11 (04) : 396 - 398