Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification

Cited: 3
Authors
Liu, Xiaohu [1 ]
Chen, Defu [1 ]
Wang, Xianbao [1 ]
Xiang, Sheng [1 ]
Zhou, Xuwen [1 ]
Affiliations
[1] Zhejiang Univ Technol, Informat Engineer Coll, Hangzhou 310023, Zhejiang, Peoples R China
Source
COMPUTER SPEECH AND LANGUAGE | 2024, Vol. 85
Keywords
Speaker verification; Transformer encoder; Multi-scale convolution; Re-parameterization;
DOI
10.1016/j.csl.2023.101600
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many speaker verification tasks, the quality of the speaker embedding is an important factor affecting system performance. Advanced speaker embedding extraction networks aim to capture richer speaker features through multi-branch network architectures. Recently, speaker verification systems based on transformer encoders have received much attention and achieved satisfactory results, because transformer encoders can efficiently extract the global features of the speaker (e.g., MFA-Conformer). However, these approaches share the common problems of large parameter counts and high computational latency, which make them difficult to deploy on resource-constrained edge devices. To address this issue, this paper proposes an effective, lightweight transformer model (MCA-former) with multi-scale convolutional self-attention (MCA), which performs multi-scale modeling and channel modeling along the temporal direction of the input at low computational cost. In addition, for the inference phase, we develop a systematic re-parameterization method that converts the multi-branch network structure into a single-path topology, effectively improving inference speed. We evaluate the MCA-former for speaker verification on the VoxCeleb1 test set. The results show that the MCA-based transformer model is more advantageous in terms of parameter count and inference efficiency. Applying the re-parameterization increases the inference speed of the model by about 30% and significantly reduces memory consumption.
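The structural re-parameterization described in the abstract can be illustrated with a minimal NumPy sketch (an assumption-laden toy, not the paper's actual implementation): two parallel 1-D convolution branches, a 3-tap kernel and a pointwise (1-tap) kernel, are algebraically merged into a single equivalent 3-tap kernel, so the single-path inference model reproduces the multi-branch training-time output exactly. The kernel values and branch layout here are illustrative only.

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1-D cross-correlation with an odd-length kernel."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

# Training-time multi-branch block: a 3-tap branch plus a pointwise branch
k3 = np.array([0.2, 0.5, 0.3])   # hypothetical learned 3-tap kernel
k1 = np.array([0.7])             # hypothetical learned 1-tap kernel

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y_multi = conv1d_same(x, k3) + conv1d_same(x, k1)

# Re-parameterization: embed the 1-tap kernel at the center of a 3-tap
# kernel and add; the merged single branch is mathematically equivalent.
k_merged = k3 + np.array([0.0, k1[0], 0.0])
y_single = conv1d_same(x, k_merged)

assert np.allclose(y_multi, y_single)  # single path matches multi-branch
```

Because convolution is linear, parallel branches with compatible kernel shapes can always be folded into one kernel this way; the same algebra extends to merging batch-norm statistics into convolution weights, which is how such methods typically remove branches at inference time.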
Pages: 13