FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Cited by: 9

Authors:

Mun, Sung Hwan [1]

Jung, Jee-Weon [2]

Han, Min Hyun [1]

Kim, Nam Soo [1]

Affiliations:

[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea

[2] Naver Corp, Seongnam Si, South Korea

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

speaker verification; selective kernel attention; multi-scale module

DOI:

10.1109/SLT54892.2023.10023305

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism which exploits both the frequency and channel domains. We first apply an existing SKA module to our baseline. Then we propose two SKA variants, where the first variant is applied in front of the ECAPA-TDNN model and the other is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve performance and are complementary when tested on three different evaluation protocols.
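
The general idea of selecting a kernel size by attention can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the generic split / fuse / select pattern of selective kernel attention on 1D sequences, attends over channels only (the frequency-wise attention mentioned in the abstract is not modelled), and replaces the learned gating layers with a fixed per-channel scale. The names `conv1d_same`, `selective_kernel`, and `gate_weights` are hypothetical and exist only for illustration.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length (odd kernels)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel(channels, kernels, gate_weights):
    """Blend branches with different kernel sizes using per-channel attention.

    channels:     C sequences of equal length T
    kernels:      B odd-length kernels, one per branch (shared across channels here)
    gate_weights: B x C scales standing in for learned gating layers (assumption)
    """
    num_branches = len(kernels)
    num_channels = len(channels)
    length = len(channels[0])

    # Split: run every channel through one branch per kernel size.
    branches = [[conv1d_same(ch, k) for ch in channels] for k in kernels]

    # Fuse: sum the branches and average over time to get one value per channel.
    descriptor = [
        sum(sum(branch[c]) for branch in branches) / length
        for c in range(num_channels)
    ]

    # Select: softmax across branches, computed separately for each channel.
    attention = [
        softmax([gate_weights[b][c] * descriptor[c] for b in range(num_branches)])
        for c in range(num_channels)
    ]

    # Output: attention-weighted sum of the branch outputs.
    output = [
        [
            sum(attention[c][b] * branches[b][c][t] for b in range(num_branches))
            for t in range(length)
        ]
        for c in range(num_channels)
    ]
    return output, attention


# Two channels, two branches (moving averages of width 3 and 5).
channels = [[0.0, 1.0, 0.0, 2.0, 0.0, 1.0], [1.0] * 6]
kernels = [[1 / 3] * 3, [1 / 5] * 5]
gate_weights = [[1.0, 0.0], [-1.0, 0.0]]
output, attention = selective_kernel(channels, kernels, gate_weights)
```

In this toy setup the attention weights for each channel sum to one across the branches, so the output is a convex combination of the width-3 and width-5 responses. In a trained model the gating parameters would be learned, which is what makes the kernel-size choice data-driven.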

Citation

Pages: 548-554

Page count: 7

