SPEAKER EMBEDDINGS INCORPORATING ACOUSTIC CONDITIONS FOR DIARIZATION

Citations: 0
Authors
Higuchi, Yosuke [1, 2]
Suzuki, Masayuki [1]
Kurata, Gakuto [1]
Affiliations
[1] IBM Res AI, Tokyo, Japan
[2] Waseda Univ, Dept Commun & Comp Engn, Tokyo, Japan
Keywords
speaker embedding; speaker diarization; representation learning; neural network;
DOI
10.1109/icassp40776.2020.9054273
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We present our work on training speaker embeddings that are especially effective for speaker diarization. For various speaker recognition tasks, extracting speaker embeddings with Deep Neural Networks (DNNs) has become a major approach. These embeddings are generally trained to discriminate between speakers and to be robust to different acoustic conditions. In speaker diarization, however, the acoustic conditions can serve as consistent information for discriminating speakers. Such information can include each speaker's distance to a microphone in a meeting, or each speaker's channel in a telephone conversation recorded in mono. Hence, the proposed speaker-embedding network leverages differences in acoustic conditions to train effective speaker embeddings for speaker diarization. The information on acoustic conditions can be anything that helps distinguish between recording environments; for example, we explore using i-vectors. Experiments conducted on a practical diarization system demonstrated that the proposed embeddings significantly improve performance over embeddings without information on acoustic conditions.
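The intuition in the abstract, that acoustic conditions can help separate speakers within a single recording, can be pictured with a toy sketch. This is not the authors' network or training procedure: it simply appends a hypothetical acoustic-condition vector to a hypothetical speaker vector and compares cosine similarities. All names and numbers below are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def with_condition(speaker_vec, condition_vec):
    """Append an acoustic-condition descriptor to a speaker vector.

    Plain concatenation is an assumption made for illustration only;
    the abstract does not describe how the information is combined.
    """
    return list(speaker_vec) + list(condition_vec)

# Toy data (hypothetical): two segments from different speakers whose
# voices are similar but who were recorded over different channels.
seg_a_voice = [0.9, 0.1, 0.3]
seg_b_voice = [0.8, 0.2, 0.3]
seg_a_cond = [1.0, 0.0]  # e.g. channel A
seg_b_cond = [0.0, 1.0]  # e.g. channel B

sim_voice_only = cosine(seg_a_voice, seg_b_voice)
sim_with_cond = cosine(
    with_condition(seg_a_voice, seg_a_cond),
    with_condition(seg_b_voice, seg_b_cond),
)

print(f"voice only:     {sim_voice_only:.3f}")
print(f"with condition: {sim_with_cond:.3f}")
```

In this toy setup the two segments look nearly identical when only the voice vectors are compared, and clearly less similar once the channel descriptor is included, which is the kind of separation a clustering step in diarization could exploit.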
Pages: 7129-7133
Number of pages: 5