FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Cited by: 9
Authors
Mun, Sung Hwan [1]
Jung, Jee-Weon [2]
Han, Min Hyun [1]
Kim, Nam Soo [1]
Affiliations
[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea
[2] Naver Corp, Seongnam Si, South Korea
Keywords
speaker verification; selective kernel attention; multi-scale module
DOI
10.1109/SLT54892.2023.10023305
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism that exploits both the frequency and channel domains. We first apply an existing SKA module to our baseline. We then propose two SKA variants: the first is applied in front of the ECAPA-TDNN model, and the second is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve performance and are complementary when tested on three different evaluation protocols.
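The kernel-selection idea in the abstract can be illustrated with a minimal sketch of a generic selective-kernel scheme: parallel branches with different kernel sizes are fused, pooled into a per-channel descriptor, and reweighted by a softmax across branches. This is an assumption-laden illustration, not the paper's modules: it pools over time only and omits the frequency-domain attention, and all names (`conv1d_same`, `selective_kernel_attention`, `w_reduce`, `w_select`) are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation of one channel (odd kernel assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(signal))
    ]


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel_attention(x, kernels, w_reduce, w_select):
    """Hypothetical selective-kernel layer on C x T features.

    kernels[k][c] is the filter of branch k for channel c,
    w_reduce is a D x C matrix, w_select[k] is a C x D matrix.
    Returns (output C x T, weights[k][c]).
    """
    n_ch, n_br = len(x), len(kernels)
    # 1) Run every branch (one kernel size each) on every channel.
    branches = [
        [conv1d_same(x[c], kernels[k][c]) for c in range(n_ch)]
        for k in range(n_br)
    ]
    # 2) Fuse branches by summation, then average over time per channel.
    pooled = [
        sum(sum(branch[c]) for branch in branches) / len(x[c])
        for c in range(n_ch)
    ]
    # 3) Compress the pooled descriptor with a small ReLU layer.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, pooled))) for row in w_reduce
    ]
    # 4) One logit per (branch, channel), then softmax across branches.
    logits = [
        [sum(w * h for w, h in zip(w_select[k][c], hidden)) for c in range(n_ch)]
        for k in range(n_br)
    ]
    per_channel = [
        softmax([logits[k][c] for k in range(n_br)]) for c in range(n_ch)
    ]
    weights = [[per_channel[c][k] for c in range(n_ch)] for k in range(n_br)]
    # 5) The output is the attention-weighted sum of the branch outputs.
    output = [
        [
            sum(weights[k][c] * branches[k][c][t] for k in range(n_br))
            for t in range(len(x[c]))
        ]
        for c in range(n_ch)
    ]
    return output, weights
```

Because the branch weights for each channel sum to one, the layer interpolates between kernel sizes (for example 3 and 5) per channel rather than committing to one fixed size.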
Pages: 548-554
Number of pages: 7