System output combination for improved speaker diarization

Authors
Bozonnet, Simon [1 ]
Evans, Nicholas [1 ]
Anguera, Xavier [2 ]
Vinyals, Oriol
Friedland, Gerald [3 ]
Fredouille, Corinne [4 ]
Affiliations
[1] EURECOM, Sophia Antipolis, France
[2] Telefon Res, Barcelona, Spain
[3] Univ Calif, ICSI, Berkeley, CA USA
[4] Univ Avignon, LIA, Avignon, France
Keywords
speaker diarization; system combination; fusion; FEATURES;
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
System combination or fusion is a popular, successful and sometimes straightforward means of improving performance in many fields of statistical pattern classification, including speech and speaker recognition. Whilst there is significant work in the literature which aims to improve speaker diarization performance by combining multiple feature streams, there is little work which aims to combine the outputs of multiple systems. This paper reports our first attempts to combine the outputs of two state-of-the-art speaker diarization systems, namely ICSI's bottom-up and LIA-EURECOM's top-down systems. We show that a cluster matching procedure reliably identifies corresponding speaker clusters in the two system outputs and that, when they are used in a new realignment and resegmentation stage, the combination leads to relative improvements of 13% and 7% DER on independent development and evaluation sets.
Pages: 2650 / +
Page count: 2