END-TO-END DIARIZATION FOR VARIABLE NUMBER OF SPEAKERS WITH LOCAL-GLOBAL NETWORKS AND DISCRIMINATIVE SPEAKER EMBEDDINGS

Cited by: 13
Authors
Maiti, Soumi [1, 4]
Erdogan, Hakan [2]
Wilson, Kevin [2]
Wisdom, Scott [2]
Watanabe, Shinji [3]
Hershey, John R. [2]
Affiliations
[1] CUNY, Grad Ctr, New York, NY 10010 USA
[2] Google Res, Mountain View, CA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
[4] Google, Mountain View, CA 94043 USA
Keywords
Diarization; attention; deep learning
DOI
10.1109/ICASSP39728.2021.9414841
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multi-task transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.
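The abstract names "variable-number permutation-invariant cross-entropy based loss functions" without giving their form. The following is a minimal sketch of the general idea only, not the authors' formulation: it scores every assignment of output tracks to reference speakers with binary cross-entropy and keeps the cheapest one. The function names `bce` and `pit_bce` are hypothetical, and padding missing speakers with all-silent reference tracks is an assumption made here to cope with a variable number of speakers.

```python
import itertools
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equal-length activity tracks."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def pit_bce(preds, targets):
    """Permutation-invariant BCE over speaker activity tracks (illustrative).

    preds:   list of S output tracks, each a list of T frame probabilities.
    targets: list of at most S reference tracks of 0/1 activity labels.
    Missing references are padded with all-silent tracks (an assumption,
    not something stated in the abstract).
    Returns (best_loss, best_perm), where output track o is matched to
    padded reference best_perm[o].
    """
    num_out = len(preds)
    num_frames = len(preds[0])
    if len(targets) > num_out:
        raise ValueError("more reference speakers than output tracks")
    silent = [0.0] * num_frames
    padded = list(targets) + [silent] * (num_out - len(targets))

    best_loss, best_perm = float("inf"), None
    # Exhaustive search over assignments; fine for a handful of speakers.
    for perm in itertools.permutations(range(num_out)):
        loss = sum(bce(preds[o], padded[r]) for o, r in enumerate(perm)) / num_out
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

With three output tracks and two reference speakers, the unused output is matched to the silent padding track, so the loss does not depend on the order in which the network emits speakers. The exhaustive search grows factorially with the number of tracks, so a real system would likely need a cheaper assignment step.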
Pages: 7183-7187
Page count: 5
Related papers
50 in total
  • [22] End-to-end recurrent denoising autoencoder embeddings for speaker identification
    Rituerto-Gonzalez, Esther
    Pelaez-Moreno, Carmen
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (21): : 14429 - 14439
  • [23] CONTINUAL SELF-SUPERVISED DOMAIN ADAPTATION FOR END-TO-END SPEAKER DIARIZATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 626 - 632
  • [24] OVERLAP-AWARE LOW-LATENCY ONLINE SPEAKER DIARIZATION BASED ON END-TO-END LOCAL SEGMENTATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 1139 - 1146
  • [25] Shortcut Connections based Deep Speaker Embeddings for End-to-End Speaker Verification System
    Seo, Soonshin
    Rim, Daniel Jun
    Lim, Minkyu
    Lee, Donghyun
    Park, Hosung
    Oh, Junseok
    Kim, Changmin
    Kim, Ji-Hwan
    INTERSPEECH 2019, 2019, : 2928 - 2932
  • [26] DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION
    Snyder, David
    Ghahremani, Pegah
    Povey, Daniel
    Garcia-Romero, Daniel
    Carmiel, Yishay
    Khudanpur, Sanjeev
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 165 - 170
  • [27] Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization
    Fujita, Yusuke
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    IEEE ACCESS, 2023, 11 : 140069 - 140076
  • [28] A study on end-to-end speaker diarization system using single-label classification
    Jung, Jaehee
    Kim, Wooil
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2023, 42 (06): : 536 - 543
  • [29] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Ngoc Thang Vu
    INTERSPEECH 2019, 2019, : 4425 - 4429
  • [30] Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
    Xue, Yawen
    Horiguchi, Shota
    Fujita, Yusuke
    Takashima, Yuki
    Watanabe, Shinji
    Garcia, Paola
    Nagamatsu, Kenji
    INTERSPEECH 2021, 2021, : 3116 - 3120