ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Times cited: 0
Authors
Mariotte, Theo [1, 2]
Larcher, Anthony [2]
Montresori, Silvio [1]
Thomas, Jean-Hugh [1]
Affiliations
[1] Le Mans Univ, Inst Claude Chappe, LIUM, Le Mans, France
[2] Le Mans Univ, LAUM IA GS UMR CNRS 6613, Le Mans, France
Source
Interspeech 2024
Keywords
speaker diarization; distant speech; multimicrophone; explainable AI
DOI
10.21437/Interspeech.2024-917
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually capture the audio signal. Beamforming, i.e., spatial filtering, is a common practice to process multi-microphone audio data. However, it often requires an explicit localization of the active source to steer the filter. This paper proposes a self-attention-based algorithm to select the output of a bank of fixed spatial filters. This method serves as a feature extractor for joint Voice Activity (VAD) and Overlapped Speech Detection (OSD). The speaker diarization is then inferred from the detected segments. The approach shows convincing distant VAD, OSD, and SD performance, e.g. 14.5% DER on the AISHELL-4 dataset. The analysis of the self-attention weights demonstrates their explainability, as they correlate with the speaker's angular locations.
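The selection idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout, the projection matrices `w_q` and `w_k`, the averaging of attention rows into one weight per beam, and the random input features are all assumptions made for illustration. It only shows how self-attention across the outputs of a fixed beamformer bank can yield per-frame beam weights that sum to one and a single combined feature stream.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_beam_selection(feats, w_q, w_k):
    """Combine beamformer outputs with self-attention weights (illustrative only).

    feats: (T, B, F) features for T frames, B fixed beams, F feature dims.
    w_q, w_k: (F, D) hypothetical query/key projections.
    Returns the combined features (T, F) and the beam weights (T, B).
    """
    d = w_q.shape[1]
    q = feats @ w_q                                   # (T, B, D)
    k = feats @ w_k                                   # (T, B, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)    # (T, B, B)
    attn = softmax(scores, axis=-1)                   # each row sums to 1 over beams
    weights = attn.mean(axis=1)                       # (T, B), still sums to 1 per frame
    combined = np.einsum("tb,tbf->tf", weights, feats)
    return combined, weights


# Toy example with random features standing in for real beamformer outputs.
rng = np.random.default_rng(0)
T, B, F, D = 4, 8, 16, 8
feats = rng.standard_normal((T, B, F))
w_q = rng.standard_normal((F, D))
w_k = rng.standard_normal((F, D))
combined, weights = attentive_beam_selection(feats, w_q, w_k)
print(combined.shape, weights.shape)
```

In a sketch like this, the `weights` array is the quantity one would inspect against the steering angles of the beams, in the spirit of the explainability analysis the abstract mentions.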
Pages: 1620-1624
Page count: 5