SPEAKER EMBEDDINGS INCORPORATING ACOUSTIC CONDITIONS FOR DIARIZATION

Cited by: 0
Authors
Higuchi, Yosuke [1, 2]
Suzuki, Masayuki [1]
Kurata, Gakuto [1]
Affiliations
[1] IBM Research AI, Tokyo, Japan
[2] Waseda University, Department of Communications and Computer Engineering, Tokyo, Japan
Keywords
speaker embedding; speaker diarization; representation learning; neural network
DOI
10.1109/icassp40776.2020.9054273
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present our work on training speaker embeddings that are especially effective for speaker diarization. For various speaker recognition tasks, extracting speaker embeddings with Deep Neural Networks (DNNs) has become a major approach. These embeddings are generally trained to discriminate between speakers and to be robust to different acoustic conditions. In speaker diarization, however, acoustic conditions can serve as consistent information for discriminating speakers. Such information can include each speaker's distance to the microphone in a meeting, or the channel of each speaker in a telephone conversation recorded as monaural audio. Hence, the proposed speaker-embedding network leverages differences in acoustic conditions to train speaker embeddings that are effective for speaker diarization. The acoustic-condition information can be anything that helps distinguish between recording environments; as one example, we explore using i-vectors. Experiments conducted on a practical diarization system demonstrated that the proposed embeddings significantly improve performance over embeddings trained without acoustic-condition information.
Pages: 7129–7133
Page count: 5
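The abstract does not say how the acoustic-condition information enters the embedding network, so the sketch below does not reproduce the proposed method. As a hedged toy illustration of why condition information can help at the clustering stage of diarization, it uses a much simpler, hypothetical post-hoc fusion: each segment's unit-norm speaker embedding is concatenated with a weighted, unit-norm condition vector (standing in for, e.g., an i-vector), and segments are grouped by single-linkage clustering on cosine similarity with a fixed threshold. The functions `fuse` and `threshold_cluster`, the weight, the threshold, and all vectors are illustrative assumptions, not values from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse(speaker_emb, condition_vec, weight=1.0):
    """Hypothetical post-hoc fusion (not the paper's network): concatenate the
    unit-norm speaker embedding with the unit-norm condition vector scaled by
    `weight`. The fused cosine similarity then equals
    (cos_speaker + weight**2 * cos_condition) / (1 + weight**2)."""
    cond = l2_normalize(condition_vec)
    return l2_normalize(speaker_emb) + [weight * x for x in cond]


def threshold_cluster(vectors, threshold):
    """Single-linkage clustering: link every pair whose cosine similarity is
    >= threshold and return one integer label per segment, numbering the
    connected components 0, 1, ... in order of first appearance."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(vectors))]


# Toy, hand-picked vectors (NOT data from the paper): two similar-sounding
# speakers, each on their own channel of a mono telephone recording.
SPK_A, SPK_B = [1.0, 0.0], [0.8, 0.6]      # speaker cosine similarity 0.8
CHAN_1, CHAN_2 = [1.0, 0.0], [0.0, 1.0]    # condition cosine similarity 0.0
segments = [(SPK_A, CHAN_1), (SPK_B, CHAN_2), (SPK_A, CHAN_1), (SPK_B, CHAN_2)]

speaker_only = [l2_normalize(s) for s, _ in segments]
fused = [fuse(s, c, weight=1.0) for s, c in segments]

print(threshold_cluster(speaker_only, 0.7))  # [0, 0, 0, 0]
print(threshold_cluster(fused, 0.7))         # [0, 1, 0, 1]
```

Because both parts are unit-norm, the fused cosine similarity is (cos_s + w^2 * cos_c) / (1 + w^2), a weighted average of speaker and condition similarity; `weight=0` recovers speaker-only scoring. In this toy case the two similar-sounding speakers are merged when only speaker similarity is used but separated once channel information is included. A real system would need to tune the weight and threshold on development data; the paper instead uses the condition information when training the embedding network itself.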