FREQUENCY AND MULTI-SCALE SELECTIVE KERNEL ATTENTION FOR SPEAKER VERIFICATION

Cited by: 9
Authors
Mun, Sung Hwan [1]
Jung, Jee-Weon [2]
Han, Min Hyun [1]
Kim, Nam Soo [1]
Affiliations
[1] Seoul Natl Univ, Dept ECE & INMC, Seoul, South Korea
[2] Naver Corp, Seongnam Si, South Korea
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
speaker verification; selective kernel attention; multi-scale module;
DOI
10.1109/SLT54892.2023.10023305
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism that exploits both the frequency and channel domains. We first apply an existing SKA module to our baseline. Then we propose two SKA variants, where the first variant is applied in front of the ECAPA-TDNN model and the other is combined with the Res2Net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve the performance and are complementary when tested on three different evaluation protocols.
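Illustrative sketch (not taken from the paper): the general selective-kernel idea described in the abstract can be shown on a toy single-channel 1-D signal. Two convolution branches with kernel sizes 3 and 5 are computed, their sum is pooled into one global descriptor, and a softmax over per-branch scores decides how much each kernel size contributes. The function names, the averaging kernels, and the `gate_weights` / `gate_biases` values are hypothetical placeholders for parameters that would normally be learned. The frequency-wise and channel-wise attention of the proposed SKA variants is not modelled here.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D cross-correlation that keeps len(x); odd kernel sizes only."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(logits):
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel(x, kernels, gate_weights, gate_biases):
    """Toy selective-kernel block: split, fuse, select."""
    # Split: one convolution branch per kernel size.
    branches = [conv1d_same(x, k) for k in kernels]
    # Fuse: element-wise sum of the branches, then global average pooling.
    fused = [sum(vals) for vals in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # Select: one score per branch, normalised across branches.
    logits = [w * descriptor + b for w, b in zip(gate_weights, gate_biases)]
    attention = softmax(logits)
    # Output: attention-weighted sum of the branch outputs.
    output = [
        sum(a * v for a, v in zip(attention, vals))
        for vals in zip(*branches)
    ]
    return output, attention


signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0]
kernels = [[1.0 / 3] * 3, [1.0 / 5] * 5]  # hypothetical averaging kernels
out, attn = selective_kernel(
    signal, kernels, gate_weights=[1.0, -1.0], gate_biases=[0.0, 0.0]
)
print(attn)  # attention weights over the two kernel sizes
```

Because the attention weights depend on the input through the pooled descriptor, different inputs can favour different kernel sizes, which is the sense in which the selection is data-driven.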
Pages: 548-554
Number of pages: 7
Related papers
50 in total
  • [21] Multi-scale verification of distributed synchronisation
    Gainer, Paul
    Linker, Sven
    Dixon, Clare
    Hustadt, Ullrich
    Fisher, Michael
    FORMAL METHODS IN SYSTEM DESIGN, 2020, 55 (03) : 171 - 221
  • [22] Attention Fusion for Audio-Visual Person Verification Using Multi-Scale Features
    Hoermann, Stefan
    Moiz, Abdul
    Knoche, Martin
    Rigoll, Gerhard
    2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 281 - 285
  • [23] Gland Segmentation in Pancreas Histopathology Images Based on Selective Multi-scale Attention
    Yang, Changxing
    Xiang, Dehui
    Bian, Yun
    Lu, Jianping
    Jiang, Hui
    Zheng, Jianming
    MEDICAL IMAGING 2021: IMAGE PROCESSING, 2021, 11596
  • [24] Selective Deeply Supervised Multi-Scale Attention Network for Brain Tumor Segmentation
    Rehman, Azka
    Usman, Muhammad
    Shahid, Abdullah
    Latif, Siddique
    Qadir, Junaid
    SENSORS, 2023, 23 (04)
  • [25] MULTI-SCALE TEMPORAL FREQUENCY CONVOLUTIONAL NETWORK WITH AXIAL ATTENTION FOR SPEECH ENHANCEMENT
    Zhang, Guochang
    Yu, Libiao
    Wang, Chunliang
    Wei, Jianqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9122 - 9126
  • [26] Multi-scale Wavelet Frequency Channel Attention for Remote Sensing Image Segmentation
    Su, Yu-Chen
    Liu, Tsung-Jung
    Liu, Kuan-Hsien
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [27] MFAGNet: multi-scale frequency attention gating network for land cover classification
    Liu, Jiancong
    Zhang, Dongmei
    He, Lihua
    Yu, Xingguo
    Han, Wei
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (21) : 6670 - 6697
  • [28] Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel
    Gao, Xin
    Zhang, Guoying
    Xiong, Yijin
    MEASUREMENT, 2022, 194
  • [29] Multi-Scale Kernels for Short Utterance Speaker Recognition
    Zhang, Wei-Qiang
    Zhao, Junhong
    Zhang, Wen-Lin
    Liu, Jia
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 414 - +
  • [30] Multispectral Image Demosaicking Based on Multi-scale Dense Connections and Large-kernel Attention
    Yu, Shufang
    Song, Beibei
    Du, Wenwang
    Yuan, Jieran
    Sun, Wenfang
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 1007 - 1011