FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Cited by: 9
Authors
Mun, Sung Hwan [1 ]
Jung, Jee-Weon [2 ]
Han, Min Hyun [1 ]
Kim, Nam Soo [1 ]
Affiliations
[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea
[2] Naver Corp, Seongnam Si, South Korea
Keywords
speaker verification; selective kernel attention; multi-scale module
DOI
10.1109/SLT54892.2023.10023305
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism which exploits both the frequency and channel domains. We first apply an existing SKA module to our baseline. Then we propose two SKA variants, where the first variant is applied in front of the ECAPA-TDNN model and the other is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve the performance and are complementary when tested on three different evaluation protocols.
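The kernel-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' SKA module: it assumes two parallel depthwise 1-D branches with kernel sizes 3 and 5, uses fixed moving-average kernels in place of learned weights, and computes a channel-only attention (the paper's mechanism also exploits the frequency domain). The names `selective_kernel`, `conv1d_same`, and `branch_gain` are hypothetical and introduced only for illustration.

```python
import math

def conv1d_same(signal, kernel):
    """Depthwise 1-D convolution with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def selective_kernel(x, kernel_sizes=(3, 5), branch_gain=(1.0, -1.0)):
    """Blend branches with different kernel sizes using per-channel attention.

    x: list of channels, each a list of floats over time.
    branch_gain: hypothetical stand-in for the learned layers that map the
    pooled descriptor to one logit per branch.
    """
    # 1) Split: run every channel through one branch per kernel size.
    #    Moving-average kernels are an assumption, not learned weights.
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k
        branches.append([conv1d_same(ch, kernel) for ch in x])

    outputs, weights = [], []
    for c in range(len(x)):
        n_frames = len(x[c])
        # 2) Fuse: sum the branches, then average over time to get a descriptor.
        fused = [sum(b[c][t] for b in branches) for t in range(n_frames)]
        descriptor = sum(fused) / n_frames
        # 3) Select: softmax across branches gives data-dependent kernel weights.
        attn = softmax([g * descriptor for g in branch_gain])
        weights.append(attn)
        outputs.append([
            sum(a * b[c][t] for a, b in zip(attn, branches))
            for t in range(n_frames)
        ])
    return outputs, weights

if __name__ == "__main__":
    x = [[1.0] * 8, [-1.0] * 8]
    out, w = selective_kernel(x)
    for c, attn in enumerate(w):
        print(f"channel {c}: weights for kernel sizes 3 and 5 = {attn}")
```

With these assumed gains, a channel whose pooled descriptor is positive leans towards the kernel-size-3 branch and a channel with a negative descriptor leans towards the kernel-size-5 branch, so the effective kernel size depends on the input rather than being fixed.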
Pages: 548-554
Number of pages: 7