DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

Cited by: 1
Authors
Landini, Federico [1]
Diez, Mireia [1]
Stafylakis, Themos [2, 3]
Burget, Lukas [1]
Affiliations
[1] Brno Univ Technol, Brno 61266, Czech Republic
[2] Omilia, Maroussi 15126, Greece
[3] Athens Univ Econ & Business, Athina 10434, Greece
Funding
European Union Horizon 2020; US National Science Foundation;
Keywords
Decoding; Long short term memory; Biological system modeling; Vectors; Oral communication; Data models; Training; Attractor; DiaPer; end-to-end neural diarization; perceiver; speaker diarization;
DOI
10.1109/TASLP.2024.3422818
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
Pages: 3450 - 3465
Page count: 16
Related papers
50 in total
  • [41] End-to-end neural network based optimal quadcopter control
    Ferede, Robin
    de Croon, Guido
    De Wagter, Christophe
    Izzo, Dario
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 172
  • [42] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Jiang, Feng
    Tao, Wen
    Liu, Shaohui
    Ren, Jie
    Guo, Xun
    Zhao, Debin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3007 - 3018
  • [43] End-to-End Speech Emotion Recognition Based on Neural Network
    Zhu, Bing
    Zhou, Wenkai
    Wang, Yutian
    Wang, Hui
    Cai, Juan Juan
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017, : 1634 - 1638
  • [44] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Tao, Wen
    Jiang, Feng
    Zhang, Shengping
    Ren, Jie
    Shi, Wuzhen
    Zuo, Wangmeng
    Guo, Xun
    Zhao, Debin
    2017 DATA COMPRESSION CONFERENCE (DCC), 2017, : 463 - 463
  • [45] End-to-End Neural Transformer Based Spoken Language Understanding
    Radfar, Martin
    Mouchtaris, Athanasios
    Kunzmann, Siegfried
    INTERSPEECH 2020, 2020, : 866 - 870
  • [46] END-TO-END NEURAL NETWORK BASED AUTOMATED SPEECH SCORING
    Chen, Lei
    Tao, Jidong
    Ghaffarzadegan, Shabnam
    Qian, Yao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6234 - 6238
  • [47] OVERLAP-AWARE LOW-LATENCY ONLINE SPEAKER DIARIZATION BASED ON END-TO-END LOCAL SEGMENTATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 1139 - 1146
  • [48] End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms
    Kothalkar, Prasanna V.
    Irvin, Dwight
    Buzhardt, Jay
    Hansen, John H.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [49] A study on end-to-end speaker diarization system using single-label classification
    Jung, Jaehee
    Kim, Wooil
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2023, 42 (06): : 536 - 543
  • [50] TRANSCRIBE-TO-DIARIZE: NEURAL SPEAKER DIARIZATION FOR UNLIMITED NUMBER OF SPEAKERS USING END-TO-END SPEAKER-ATTRIBUTED ASR
    Kanda, Naoyuki
    Xiao, Xiong
    Gaur, Yashesh
    Wang, Xiaofei
    Meng, Zhong
    Chen, Zhuo
    Yoshioka, Takuya
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8082 - 8086