ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

被引:0
|
作者
Mariotte, Theo [1 ,2 ]
Larcher, Anthony [2 ]
Montresori, Silvio [1 ]
Thomas, Jean-Hugh [1 ]
机构
[1] Le Mans Univ, Inst Claude Chappe, LIUM, Le Mans, France
[2] Le Mans Univ, LAUM IA GS UMR CNRS 6613, Le Mans, France
来源
关键词
speaker diarization; distant speech; multimicrophone; explainable AI;
D O I
10.21437/Interspeech.2024-917
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually capture the audio signal. Beamforming, i.e., spatial filtering, is a common practice to process multi-microphone audio data. However, it often requires an explicit localization of the active source to steer the filter. This paper proposes a self-attention-based algorithm to select the output of a bank of fixed spatial filters. This method serves as a feature extractor for joint Voice Activity (VAD) and Overlapped Speech Detection (OSD). The speaker diarization is then inferred from the detected segments. The approach shows convincing distant VAD, OSD, and SD performance, e.g. 14.5% DER on the AISHELL-4 dataset. The analysis of the self-attention weights demonstrates their explainability, as they correlate with the speaker's angular locations.
引用
收藏
页码:1620 / 1624
页数:5
相关论文
共 50 条
  • [1] MUTUAL INFORMATION BASED CHANNEL SELECTION FOR SPEAKER DIARIZATION OF MEETINGS DATA
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4065 - 4068
  • [2] Speaker diarization for multiple-distant-microphone meetings using several sources of information
    Pardo, Jose M.
    Anguera, Xavier
    Wooters, Charles
    IEEE TRANSACTIONS ON COMPUTERS, 2007, 56 (09) : 1212 - 1224
  • [3] Acoustic beamforming for speaker diarization of meetings
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): : 2011 - 2022
  • [4] IMPROVED SPEAKER DIARIZATION SYSTEM FOR MEETINGS
    El-Khoury, Elie
    Senac, Christine
    Pinquier, Julien
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4097 - 4100
  • [5] Purity algorithms for speaker diarization of meetings data
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1025 - 1028
  • [6] Improving Speaker Diarization for CHIL Lecture Meetings
    Huang, Jing
    Marcheret, Etienne
    Visweswariah, Karthik
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2628 - 2631
  • [7] The SAIL Speaker Diarization System for Analysis of Spontaneous Meetings
    Han, Kyu J.
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth S.
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 970 - 975
  • [8] Self-Attentive Similarity Measurement Strategies in Speaker Diarization
    Lin, Qingjian
    Hou, Yu
    Li, Ming
    INTERSPEECH 2020, 2020, : 284 - 288
  • [9] A DOA based speaker diarization system for real meetings
    Araki, Shoko
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    Sawada, Hiroshi
    Makino, Shoji
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 30 - 33
  • [10] Agglomerative Information Bottleneck for speaker diarization of meetings data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 250 - 255