Acoustic beamforming for speaker diarization of meetings

Cited by: 277
Authors
Anguera, Xavier [1]
Wooters, Chuck
Hernando, Javier
Affiliations
[1] Telefon ID, Madrid 28043, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
Keywords
acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering
DOI
10.1109/TASL.2007.902460
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
Pages: 2011 - 2022
Number of pages: 12
Related papers
50 in total
  • [21] Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features And Inter-Channel Time Differences
    Pardo, Jose M.
    Anguera, Xavier
    Wooters, Chuck
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 2194 - 2197
  • [22] MULTI-CHANNEL SPEAKER DIARIZATION USING SPATIAL FEATURES FOR MEETINGS
    Zheng, Naijun
    Li, Na
    Yu, JianWei
    Weng, Chao
    Su, Dan
    Liu, XunYing
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 7337 - 7341
  • [23] Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings
    Dawalatabad, Nauman
    Madikeri, Srikanth
    Sekhar, C. Chandra
    Murthy, Hema A.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 14 - 27
  • [24] Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    SPEECH COMMUNICATION, 2012, 54 (01) : 55 - 67
  • [25] MUTUAL INFORMATION BASED CHANNEL SELECTION FOR SPEAKER DIARIZATION OF MEETINGS DATA
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009 : 4065 - 4068
  • [26] Investigating the Effect of Varying Window Sizes in Speaker Diarization for Meetings Domain
    Naik, Nirali
    Mankad, Sapan H.
    Thakkar, Priyank
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS (ICTIS 2017) - VOL 2, 2018, 84 : 361 - 369
  • [27] Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings
    Yella, Sree Harsha
    Valente, Fabio
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011 : 960 - 963
  • [28] Estimating Dominance in Multi-Party Meetings Using Speaker Diarization
    Hung, Hayley
    Huang, Yan
    Friedland, Gerald
    Gatica-Perez, Daniel
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 847 - 860
  • [29] Robust speaker diarization for meetings: ICSI RT06S meetings evaluation system
    Anguera, Xavier
    Wooters, Chuck
    Pardo, Jose M.
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 346 - +
  • [30] ON THE EFFECT OF SNR AND SUPERDIRECTIVE BEAMFORMING IN SPEAKER DIARISATION IN MEETINGS
    Zwyssig, Erich
    Renals, Steve
    Lincoln, Mike
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012 : 4177 - 4180