Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization

Cited by: 1

Authors:

Jiao, Xiaolin [1]

Chen, Yaqi [2]

Qu, Dan [2]

Yang, Xukui [2]

Affiliations:

[1] Zhengzhou Univ, Sch Cyber Sci & Engn, Zhengzhou 450001, Peoples R China

[2] Informat Engn Univ, Sch Informat Syst Engn, Zhengzhou 450001, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 19

Funding:

National Natural Science Foundation of China;

Keywords:

end-to-end neural diarization (EEND); blueprint separable convolution (BSConv); multi-scale feature aggregation (MFA); SPEAKER DIARIZATION; SEPARATION;

DOI:

10.3390/electronics12194118

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

At present, a prevalent approach to speaker diarization is clustering based on speaker embeddings. However, this method encounters two primary issues. Firstly, it cannot directly minimize the diarization error during training; secondly, the majority of clustering-based methods struggle to handle speaker overlap in audio. A viable approach for addressing these issues is end-to-end neural diarization (EEND). Nevertheless, training an EEND system generally requires lengthy audio inputs, which must be downsampled to allow efficient model processing. In this study, we develop a novel downsampling layer using blueprint separable convolution (BSConv) instead of depthwise separable convolution (DSC) as the foundational convolutional unit, which effectively preserves information from the original audio. Furthermore, we incorporate multi-scale feature aggregation (MFA) into the encoder structure to aggregate the features extracted by each conformer block at the output layer, thereby enhancing the expressiveness of the model's feature extraction. Lastly, we employ the conformer as the backbone network to incorporate the proposed enhancements, resulting in an EEND system named BSAC-EEND. We assess the proposed method on both simulated and real datasets. The experiments indicate that our proposed EEND system reduces diarization error rate (DER) by an average of 17.3% for two-speaker datasets and 12.8% for three-speaker datasets compared to the baseline.
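
The abstract names two building blocks without giving implementation details, so the following is only a minimal pure-Python sketch of the general idea, not the authors' code. It assumes the unconstrained BSConv ordering (a pointwise 1x1 convolution followed by a strided depthwise convolution), which reverses the depthwise-then-pointwise order of DSC, and it assumes that MFA can be approximated by concatenating the per-frame outputs of every encoder block. The function names, kernel size, stride, and weights are all hypothetical and chosen for illustration.

```python
def pointwise(x, w):
    """1x1 convolution. x: [C_in][T], w: [C_out][C_in] -> [C_out][T]."""
    frames = len(x[0])
    return [[sum(w_o[c] * x[c][t] for c in range(len(x))) for t in range(frames)]
            for w_o in w]


def depthwise(x, k, stride=1):
    """Per-channel 1-D convolution without padding. x: [C][T], k: [C][K]."""
    width = len(k[0])
    frames_out = (len(x[0]) - width) // stride + 1
    return [[sum(k[c][j] * x[c][t * stride + j] for j in range(width))
             for t in range(frames_out)]
            for c in range(len(x))]


def bsconv_downsample(x, w_point, k_depth, stride):
    """Assumed BSConv ordering: pointwise first, then the strided depthwise step."""
    return depthwise(pointwise(x, w_point), k_depth, stride)


def dsc_downsample(x, k_depth, w_point, stride):
    """DSC ordering for comparison: strided depthwise first, then pointwise."""
    return pointwise(depthwise(x, k_depth, stride), w_point)


def aggregate(block_outputs):
    """Assumed MFA step: concatenate each block's features frame by frame.

    block_outputs: list of [T][D_i] sequences -> [T][sum of D_i].
    """
    frames = len(block_outputs[0])
    return [[v for block in block_outputs for v in block[t]] for t in range(frames)]


# Toy input: 2 feature channels, 10 frames.
x = [[float(t) for t in range(10)], [1.0] * 10]
w_point = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # expands 2 channels to 3
k_depth = [[1.0, 1.0, 1.0]] * 3                  # kernel size 3 per channel

# With kernel size 3 and stride 2, 10 frames are reduced to 4.
y = bsconv_downsample(x, w_point, k_depth, stride=2)

# Two hypothetical encoder blocks, each emitting 2 features for 2 frames.
fused = aggregate([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
```

In this sketch the two orderings differ in which channel space the strided filtering runs in: the BSConv variant mixes channels before reducing the frame rate, whereas the DSC variant filters the raw input channels first. The aggregation step widens the output from one block's feature dimension to the sum over all blocks, which a real system would typically project back down.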


Pages: 14
