FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Times cited: 9
Authors
Mun, Sung Hwan [1]
Jung, Jee-Weon [2]
Han, Min Hyun [1]
Kim, Nam Soo [1]
Affiliations
[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea
[2] Naver Corp, Seongnam Si, South Korea
Keywords
speaker verification; selective kernel attention; multi-scale module
DOI
10.1109/SLT54892.2023.10023305
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. The convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select its kernel size in a data-driven fashion. It is based on an attention mechanism that exploits both the frequency and channel domains. We first apply the existing SKA module to our baseline. We then propose two SKA variants: the first is applied in front of the ECAPA-TDNN model, and the second is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that the two proposed SKA variants consistently improve performance and are complementary when tested on three different evaluation protocols.
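The abstract describes SKA only at a high level. As an illustration, the split / fuse / select idea behind selective kernel attention (in the style of SKNet) can be sketched in pure Python as below. This is a simplified sketch under stated assumptions, not the authors' proposed modules: convolutions run along the time axis only, the branches use fixed averaging kernels instead of learned weights, and a scalar per-branch `gains` value stands in for the learned fully connected layers. The names `conv_time`, `selective_kernel`, and `gains` are hypothetical. Because the pooled descriptor is kept per (channel, frequency) pair, every frequency bin of every channel gets its own softmax over kernel sizes, which is one possible reading of "exploits both the frequency and channel domains".

```python
import math
from typing import List, Tuple

# Feature map indexed as x[channel][frequency][time].
Tensor3 = List[List[List[float]]]


def conv_time(x: Tensor3, kernel: List[float]) -> Tensor3:
    """Cross-correlate every (channel, frequency) row along time with zero 'same' padding."""
    pad = len(kernel) // 2
    out: Tensor3 = []
    for chan in x:
        rows = []
        for row in chan:
            n = len(row)
            new_row = []
            for t in range(n):
                acc = 0.0
                for j, w in enumerate(kernel):
                    idx = t + j - pad
                    if 0 <= idx < n:  # zero padding outside the sequence
                        acc += w * row[idx]
                new_row.append(acc)
            rows.append(new_row)
        out.append(rows)
    return out


def selective_kernel(
    x: Tensor3, kernels: List[List[float]], gains: List[float]
) -> Tuple[Tensor3, List[List[List[float]]]]:
    """Simplified selective kernel attention over kernel-size branches.

    `gains` is a hypothetical stand-in for learned per-branch layers:
    branch k gets the logit gains[k] * s for the pooled descriptor s.
    Returns the fused output and the attention weights att[c][f][k].
    """
    # Split: one branch per kernel size (odd sizes keep 'same' padding symmetric).
    branches = [conv_time(x, k) for k in kernels]
    n_c, n_f, n_t = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * n_t for _ in range(n_f)] for _ in range(n_c)]
    att: List[List[List[float]]] = [[[] for _ in range(n_f)] for _ in range(n_c)]
    for c in range(n_c):
        for f in range(n_f):
            # Fuse: sum the branches, then average over time, leaving one
            # descriptor per (channel, frequency) pair.
            s = sum(sum(b[c][f]) for b in branches) / n_t
            # Select: softmax across branches (max-subtracted for stability).
            logits = [g * s for g in gains]
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]
            total = sum(exps)
            weights = [e / total for e in exps]
            att[c][f] = weights
            # Aggregate: attention-weighted sum of the branch outputs.
            for t in range(n_t):
                out[c][f][t] = sum(w * b[c][f][t] for w, b in zip(weights, branches))
    return out, att


x = [[[0.0, 1.0, 0.0, 0.0, 2.0]]]  # 1 channel, 1 frequency bin, 5 frames
out, att = selective_kernel(x, [[1 / 3] * 3, [1 / 5] * 5], gains=[1.0, -1.0])
print(att[0][0])  # two branch weights that sum to 1
```

With `gains=[1.0, -1.0]` and a non-negative input, the pooled descriptor is positive, so the 3-tap branch receives the larger weight; setting both gains to zero gives equal weights of 0.5 and reduces the output to the plain average of the two branches.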
Pages: 548-554
Number of pages: 7