SPEAKER EMBEDDINGS INCORPORATING ACOUSTIC CONDITIONS FOR DIARIZATION

Cited by: 0
Authors
Higuchi, Yosuke [1 ,2 ]
Suzuki, Masayuki [1 ]
Kurata, Gakuto [1 ]
Affiliations
[1] IBM Res AI, Tokyo, Japan
[2] Waseda Univ, Dept Commun & Comp Engn, Tokyo, Japan
Keywords
speaker embedding; speaker diarization; representation learning; neural network;
DOI
10.1109/icassp40776.2020.9054273
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We present our work on training speaker embeddings that are especially effective for speaker diarization. For various speaker recognition tasks, extracting speaker embeddings with Deep Neural Networks (DNNs) has become a major approach. These embeddings are generally trained to discriminate between speakers and to be robust to different acoustic conditions. In speaker diarization, however, the acoustic conditions can serve as consistent information for discriminating speakers. Such information can include the speakers' distances to a microphone in a meeting, or the channel of each speaker in a telephone conversation recorded in mono. Hence, the proposed speaker-embedding network leverages differences in acoustic conditions to train effective speaker embeddings for speaker diarization. The information on acoustic conditions can be anything that helps distinguish between recording environments; for example, we explore using i-vectors. Experiments conducted on a practical diarization system demonstrated that the proposed embeddings significantly improve performance over embeddings without information on acoustic conditions.
Pages: 7129-7133
Number of pages: 5
Related papers
50 in total
  • [11] SPEAKER DIARIZATION USING DEEP NEURAL NETWORK EMBEDDINGS
    Garcia-Romero, Daniel
    Snyder, David
    Sell, Gregory
    Povey, Daniel
    McCree, Alan
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4930 - 4934
  • [12] SPEAKER EMBEDDINGS FOR DIARIZATION OF BROADCAST DATA IN THE ALLIES CHALLENGE
    Larcher, Anthony
    Mehrish, Ambuj
    Tahon, Marie
    Meignier, Sylvain
    Carrive, Jean
    Doukhan, David
    Galibert, Olivier
    Evans, Nicholas
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5799 - 5803
  • [13] Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
    Cyrta, Pawel
    Trzcinski, Tomasz
    Stokowiec, Wojciech
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 107 - 117
  • [14] INCORPORATING PRIOR KNOWLEDGE INTO SPEAKER DIARIZATION AND LINKING FOR IDENTIFYING COMMON SPEAKER
    Leung, Tsun-Yat
    Samarakoon, Lahiru
    Lam, Albert Y. S.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 697 - 703
  • [15] Speaker Diarization Using a priori Acoustic Information
    Aronowitz, Hagai
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 944 - 947
  • [16] Enhancing Speaker Diarization with Deep Neural Network Embeddings and Spectral Clustering
    Yanshan University, China
  • [17] FEATURE MAPPING FOR SPEAKER DIARIZATION IN NOISY CONDITIONS
    Zhu, Weixin
    Guo, Wu
    Hu, Guoping
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5445 - 5449
  • [18] Comparison of low-dimension speech segment embeddings: Application to speaker diarization
    Chetupalli, Srikanth Raj
    Thippur, Sreenivas, V
    Gopalakrishnan, Anand
    2019 25TH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2019,
  • [19] TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
    Boeddeker, Christoph
    Subramanian, Aswin Shanmugam
    Wichern, Gordon
    Haeb-Umbach, Reinhold
    Le Roux, Jonathan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1185 - 1197
  • [20] Robust acoustic domain identification with its application to speaker diarization
    Kumar A.K.
    Waldekar S.
    Sahidullah M.
    Saha G.
    International Journal of Speech Technology, 2022, 25 (04) : 933 - 945