ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Times cited: 0
Authors
Mariotte, Theo [1, 2]
Larcher, Anthony [2]
Montresori, Silvio [1]
Thomas, Jean-Hugh [1]
Affiliations
[1] Le Mans Univ, Inst Claude Chappe, LIUM, Le Mans, France
[2] Le Mans Univ, LAUM IA GS UMR CNRS 6613, Le Mans, France
Source
INTERSPEECH 2024 | 2024
Keywords
speaker diarization; distant speech; multimicrophone; explainable AI
DOI
10.21437/Interspeech.2024-917
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually capture the audio signal. Beamforming, i.e., spatial filtering, is a common practice to process multi-microphone audio data. However, it often requires an explicit localization of the active source to steer the filter. This paper proposes a self-attention-based algorithm to select the output of a bank of fixed spatial filters. This method serves as a feature extractor for joint Voice Activity Detection (VAD) and Overlapped Speech Detection (OSD). The speaker diarization is then inferred from the detected segments. The approach shows convincing distant VAD, OSD, and SD performance, e.g., a diarization error rate (DER) of 14.5% on the AISHELL-4 dataset. The analysis of the self-attention weights demonstrates their explainability, as they correlate with the speakers' angular locations.
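The idea of attention-based selection over a bank of fixed beamformer outputs can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the function name, the tensor shapes, the averaging of attention rows into one weight per beam, and the random projection matrices (which stand in for learned parameters) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_beam_selection(beam_feats, w_q, w_k):
    """Hypothetical attention-weighted combination of beamformer outputs.

    beam_feats: (T, B, F) features from B fixed beamformers over T frames.
    w_q, w_k:   (F, D) query/key projections (random here, learned in practice).
    Returns the combined features (T, F) and per-frame beam weights (T, B).
    """
    q = beam_feats @ w_q                                # (T, B, D)
    k = beam_feats @ w_k                                # (T, B, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T, B, B)
    attn = softmax(scores, axis=-1)                     # each row sums to 1
    # Assumption: average over query beams to get one weight per beam.
    weights = attn.mean(axis=1)                         # (T, B)
    combined = np.einsum("tb,tbf->tf", weights, beam_feats)
    return combined, weights


# Toy usage: 4 frames, 8 beams, 16-dimensional features, 8-dimensional keys.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16))
w_q = rng.standard_normal((16, 8))
w_k = rng.standard_normal((16, 8))
combined, weights = attentive_beam_selection(feats, w_q, w_k)
```

In such a sketch, the per-frame `weights` play the role that the abstract attributes to the self-attention weights: since each beam points toward a fixed direction, inspecting which beams receive the most weight gives a rough indication of where the active speaker is.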
Pages: 1620-1624
Page count: 5