Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification

Cited: 3
Authors
Liu, Xiaohu [1 ]
Chen, Defu [1 ]
Wang, Xianbao [1 ]
Xiang, Sheng [1 ]
Zhou, Xuwen [1 ]
Affiliations
[1] Zhejiang Univ Technol, Informat Engineer Coll, Hangzhou 310023, Zhejiang, Peoples R China
Source
COMPUTER SPEECH AND LANGUAGE | 2024, Vol. 85
Keywords
Speaker verification; Transformer encoder; Multi-scale convolution; Re-parameterization;
DOI
10.1016/j.csl.2023.101600
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many speaker verification tasks, the quality of the speaker embedding is an important factor affecting system performance. Advanced speaker embedding extraction networks aim to capture richer speaker features through multi-branch network architectures. Recently, speaker verification systems based on transformer encoders have received much attention and achieved satisfactory results, because transformer encoders can efficiently extract the global features of the speaker (e.g., MFA-Conformer). However, these approaches share the common problems of large parameter counts and high computational latency, which make them difficult to deploy on resource-constrained edge devices. To address this issue, this paper proposes an effective, lightweight transformer model (MCA-former) with multi-scale convolutional self-attention (MCA), which performs multi-scale modeling and channel modeling along the temporal direction of the input at low computational cost. In addition, for the inference phase, we develop a systematic re-parameterization method that converts the multi-branch network structure into a single-path topology, effectively improving inference speed. We evaluate the MCA-former for speaker verification on the VoxCeleb1 test set. The results show that the MCA-based transformer model is more advantageous in terms of parameter count and inference efficiency. Applying the re-parameterization increases the inference speed of the model by about 30% and significantly reduces memory consumption.
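The structural re-parameterization described in the abstract can be illustrated with a minimal NumPy sketch (an assumption-laden toy, not the paper's actual implementation): two parallel 1-D convolution branches, a 3-tap kernel and a pointwise (1-tap) kernel, are algebraically merged into a single equivalent 3-tap kernel, so the single-path inference model reproduces the multi-branch training-time output exactly. The kernel values and branch layout here are illustrative only.

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1-D cross-correlation with an odd-length kernel."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

# Training-time multi-branch block: a 3-tap branch plus a pointwise branch
k3 = np.array([0.2, 0.5, 0.3])   # hypothetical learned 3-tap kernel
k1 = np.array([0.7])             # hypothetical learned 1-tap kernel

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y_multi = conv1d_same(x, k3) + conv1d_same(x, k1)

# Re-parameterization: embed the 1-tap kernel at the center of a 3-tap
# kernel and add; the merged single branch is mathematically equivalent.
k_merged = k3 + np.array([0.0, k1[0], 0.0])
y_single = conv1d_same(x, k_merged)

assert np.allclose(y_multi, y_single)  # single path matches multi-branch
```

Because convolution is linear, parallel branches with compatible kernel shapes can always be folded into one kernel this way; the same algebra extends to merging batch-norm statistics into convolution weights, which is how such methods typically remove branches at inference time.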
Pages: 13