FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Cited by: 9

Authors:

Mun, Sung Hwan [1]

Jung, Jee-Weon [2]

Han, Min Hyun [1]

Kim, Nam Soo [1]

Affiliations:

[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea

[2] Naver Corp, Seongnam Si, South Korea

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

speaker verification; selective kernel attention; multi-scale module

DOI:

10.1109/SLT54892.2023.10023305

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism which exploits both the frequency and channel domains. We first apply an existing SKA module to our baseline. Then we propose two SKA variants, where the first variant is applied in front of the ECAPA-TDNN model and the other is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve the performance and are complementary when tested on three different evaluation protocols.
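To make the idea of adaptive kernel-size selection concrete, the following is a minimal pure-Python sketch of a generic selective-kernel step on a single 1-D signal. It is not the authors' SKA module: the paper's attention operates over the frequency and channel domains inside ECAPA-TDNN and Res2Net blocks, whereas this sketch uses one channel, fixed averaging kernels of sizes 3 and 5, and a hypothetical `score_weights` list that stands in for the learned layers mapping the pooled descriptor to per-branch scores. All function names here are illustrative assumptions.

```python
import math


def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel(x, kernels, score_weights):
    """Blend branches with different kernel sizes using input-dependent weights.

    score_weights is a hypothetical stand-in (one scalar per branch) for the
    learned projection that a trained model would use.
    """
    # Split: one convolution branch per kernel size.
    branches = [conv1d_same(x, k) for k in kernels]
    # Fuse: sum the branches, then pool to a single global descriptor.
    fused = [sum(vals) for vals in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # Select: softmax across branches gives the attention weights.
    attn = softmax([w * descriptor for w in score_weights])
    out = [
        sum(a * branch[i] for a, branch in zip(attn, branches))
        for i in range(len(x))
    ]
    return out, attn


# Example: kernel sizes 3 and 5, as mentioned in the abstract.
x = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0]
kernels = [[1.0 / 3.0] * 3, [1.0 / 5.0] * 5]
out, attn = selective_kernel(x, kernels, score_weights=[1.0, -1.0])
```

Because the attention weights depend on the pooled descriptor of the input, different inputs yield different mixtures of the two kernel sizes, which is the data-driven selection the abstract refers to. In a trained model the scalar `score_weights` would be replaced by learned parameters.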


Pages: 548-554

Number of pages: 7

50 items in total

[1] MFA: TDNN WITH MULTI-SCALE FREQUENCY-CHANNEL ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION WITH SHORT UTTERANCES
Liu, Tianchi
Das, Rohan Kumar
Lee, Kong Aik
Li, Haizhou
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7517 - 7521
[2] HybridTrajNet: Enhancing Pedestrian Trajectory Prediction with Multi-Scale Attention and Large Selective Kernel
Wang, Xiaoke
Li, Wenzao
Tang, Ran
Zhang, Xiaoming
2024 3RD INTERNATIONAL JOINT CONFERENCE ON INFORMATION AND COMMUNICATION ENGINEERING, JCICE 2024, 2024, : 129 - 133
[3] MULTI-SCALE SPEAKER EMBEDDING-BASED GRAPH ATTENTION NETWORKS FOR SPEAKER DIARISATION
Kwon, Youngki
Heo, Hee-Soo
Jung, Jee-Weon
Kim, You Jin
Lee, Bong-Jin
Chung, Joon Son
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8367 - 8371
[4] Speaker verification using attentive multi-scale convolutional recurrent network
Li, Yanxiong
Jiang, Zhongjie
Cao, Wenchang
Huang, Qisheng
APPLIED SOFT COMPUTING, 2022, 126
[5] Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification
Liu, Xiaohu
Chen, Defu
Wang, Xianbao
Xiang, Sheng
Zhou, Xuwen
COMPUTER SPEECH AND LANGUAGE, 2024, 85
[6] Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech
Xu, Chenglin
Rao, Wei
Wu, Jibin
Li, Haizhou
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2696 - 2709
[7] Using Segmentation With Multi-Scale Selective Kernel for Visual Object Tracking
Bao, Feng
Cao, Yifei
Zhang, Shunli
Lin, Beibei
Zhao, Sicong
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 553 - 557
[8] TMS: Temporal multi-scale in time-delay neural network for speaker verification
Zhang, Ruiteng
Wei, Jianguo
Lu, Xugang
Lu, Wenhuan
Jin, Di
Zhang, Lin
Xu, Junhai
Dang, Jianwu
APPLIED INTELLIGENCE, 2023, 53 (22) : 26497 - 26517
[9] NEXT-TDNN: MODERNIZING MULTI-SCALE TEMPORAL CONVOLUTION BACKBONE FOR SPEAKER VERIFICATION
He, Hyun-Jun
Shin, Ui-Hyeop
Lee, Ran
Cheon, YoungJu
Park, Hyung-Min
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 11186 - 11190
