Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization

Cited by: 6
Authors
Wang, Weiqing [1]
Lin, Qingjian [2]
Cai, Danwei [1]
Li, Ming [1, 2]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] Duke Kunshan Univ, Data Sci Res Ctr, Suzhou 215316, Peoples R China
Keywords
Feature extraction; Voice activity detection; Acoustics; Task analysis; Data mining; Training; Aggregates; Speaker diarization; speaker verification; target-speaker voice activity detection; SPEECH;
DOI
10.1109/TASLP.2022.3196178
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we propose a neural-network-based similarity measurement method that learns the similarity between any two speaker embeddings while taking both previous and future contexts into account. Moreover, we propose a segmental pooling strategy and jointly train the speaker embedding network with the similarity measurement model. This joint training framework is then extended to target-speaker voice activity detection (TS-VAD) with only a slight modification to the network architecture. Experimental results on the DIHARD II, DIHARD III and VoxConverse datasets show that our clustering-based system with the neural similarity measurement outperforms recent approaches on all three datasets. In addition, the segment-level TS-VAD method further improves on the clustering-based results and achieves diarization error rates (DER) of 16.48%, 11.62% and 4.39% on the DIHARD II, DIHARD III and VoxConverse datasets, respectively.
Pages: 2645-2658
Page count: 14
Related papers
50 in total
  • [1] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015: 2082-2086
  • [2] Speaker-Corrupted Embeddings for Online Speaker Diarization
    Ghahabi, Omid
    Fischer, Volker
    INTERSPEECH 2019, 2019: 386-390
  • [3] Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II
    Novoselov, Sergey
    Gusev, Aleksei
    Ivanov, Artem
    Pekhovsky, Timur
    Shulipa, Andrey
    Avdeeva, Anastasia
    Gorlanov, Artem
    Kozlov, Alexandr
    INTERSPEECH 2019, 2019: 1003-1007
  • [4] INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION
    Rouvier, Mickael
    Favre, Benoit
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5585-5589
  • [5] Comparison of low-dimension speech segment embeddings: Application to speaker diarization
    Chetupalli, Srikanth Raj
    Thippur, Sreenivas, V
    Gopalakrishnan, Anand
    2019 25TH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2019
  • [6] LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
    Lin, Qingjian
    Yin, Ruiqing
    Li, Ming
    Bredin, Herve
    Barras, Claude
    INTERSPEECH 2019, 2019: 366-370
  • [7] Self-Attentive Similarity Measurement Strategies in Speaker Diarization
    Lin, Qingjian
    Hou, Yu
    Li, Ming
    INTERSPEECH 2020, 2020: 284-288
  • [8] ECAPA-TDNN Embeddings for Speaker Diarization
    Dawalatabad, Nauman
    Ravanelli, Mirco
    Grondin, Francois
    Thienpondt, Jenthe
    Desplanques, Brecht
    Na, Hwidong
    INTERSPEECH 2021, 2021: 3560-3564
  • [9] SPEAKER EMBEDDINGS INCORPORATING ACOUSTIC CONDITIONS FOR DIARIZATION
    Higuchi, Yosuke
    Suzuki, Masayuki
    Kurata, Gakuto
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7129-7133
  • [10] SiamTDNN: Enhancing Discriminative Embeddings for Speaker Diarization
    Zhang, Runqing
    Lu, Huijun
    Cai, Dunbo
    Huang, Zhiguo
    Du, Yujian
    Qian, Ling
    Zhang, Yijun
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (03)