DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

Cited by: 1
Authors
Landini, Federico [1 ]
Diez, Mireia [1 ]
Stafylakis, Themos [2 ,3 ]
Burget, Lukas [1 ]
Affiliations
[1] Brno Univ Technol, Brno 61266, Czech Republic
[2] Omilia, Maroussi 15126, Greece
[3] Athens Univ Econ & Business, Athina 10434, Greece
Funding
EU Horizon 2020; US National Science Foundation;
Keywords
Decoding; Long short term memory; Biological system modeling; Vectors; Oral communication; Data models; Training; Attractor; DiaPer; end-to-end neural diarization; perceiver; speaker diarization;
DOI
10.1109/TASLP.2024.3422818
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA: better performance on the widely studied Callhome dataset, more accurate estimation of the number of speakers in a conversation, and faster inference. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. In addition, we compare it with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
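The abstract describes replacing EEND-EDA's LSTM-based encoder-decoder attractors with a Perceiver-based module, in which a fixed set of learned latent queries cross-attends over the frame embeddings to produce per-speaker attractors. The sketch below is a minimal, hypothetical NumPy illustration of that single cross-attention step, not the paper's actual implementation; the function name, dimensions, and the use of a single unprojected attention head are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_attractors(frames, latents):
    """One cross-attention step (illustrative): each learned latent query
    attends over all frame embeddings and pools them into a candidate
    attractor, i.e. a speaker-representative vector.

    frames:  (T, d) frame embeddings from the encoder
    latents: (L, d) learned latent queries (L = max speakers handled)
    returns: (L, d) candidate attractors
    """
    d = frames.shape[-1]
    scores = latents @ frames.T / np.sqrt(d)      # (L, T) scaled dot-product
    weights = softmax(scores, axis=-1)            # attention over frames
    return weights @ frames                       # weighted pooling -> (L, d)

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 64))   # T=200 frames, embedding dim 64
latents = rng.standard_normal((4, 64))    # 4 latent queries
attractors = perceiver_attractors(frames, latents)
print(attractors.shape)  # (4, 64)
```

In the real model such blocks are stacked with projections, feed-forward layers, and normalization; the key contrast with EDA is that the latents attend to the whole sequence in parallel rather than decoding attractors autoregressively with an LSTM, which is consistent with the faster inference the abstract reports.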
Pages: 3450-3465 (16 pages)
Related Papers (showing 10 of 50)
  • [1] Encoder-Decoder Based Attractors for End-to-End Neural Diarization
    Horiguchi, Shota
    Fujita, Yusuke
    Watanabe, Shinji
    Xue, Yawen
    Garcia, Paola
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1493 - 1507
  • [2] End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Thebaud, Thomas
    Dehak, Najim
    Kowalczyk, Konrad
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3960 - 3973
  • [3] End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors
    Rybicka, Magdalena
    Villalba, Jesus
    Dehak, Najim
    Kowalczyk, Konrad
    INTERSPEECH 2022, 2022, : 5090 - 5094
  • [4] End-to-end Neural Diarization: From Transformer to Conformer
    Liu, Yi Chieh
    Han, Eunjung
    Lee, Chul
    Stolcke, Andreas
    INTERSPEECH 2021, 2021, : 3081 - 3085
  • [5] ASR-AWARE END-TO-END NEURAL DIARIZATION
    Khare, Aparna
    Han, Eunjung
    Yang, Yuguang
    Stolcke, Andreas
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8092 - 8096
  • [6] End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
    Horiguchi, Shota
    Fujita, Yusuke
    Watanabe, Shinji
    Xue, Yawen
    Nagamatsu, Kenji
    INTERSPEECH 2020, 2020, : 269 - 273
  • [7] End-to-End Neural Speaker Diarization with Absolute Speaker Loss
    Wang, Chao
    Li, Jie
    Fang, Xiang
    Kang, Jian
    Li, Yongxiang
    INTERSPEECH 2023, 2023, : 3577 - 3581
  • [8] END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [9] End-to-End Audio-Visual Neural Speaker Diarization
    He, Mao-kui
    Du, Jun
    Lee, Chin-Hui
    INTERSPEECH 2022, 2022, : 1461 - 1465
  • [10] Robust End-to-end Speaker Diarization with Generic Neural Clustering
    Yang, Chenyu
    Wang, Yu
    INTERSPEECH 2022, 2022, : 1471 - 1475