An improved wav2vec 2.0 pre-training approach using enhanced local dependency modeling for speech recognition

Cited by: 2
Authors
Zhu, Qiu-shi [1]
Zhang, Jie [1]
Wu, Ming-hui [2]
Fang, Xin [1, 2]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China USTC, NEL SLIP, Hefei, Peoples R China
[2] iFlytek Co Ltd, iFlytek Res, Hefei, Peoples R China
Source
INTERSPEECH 2021 | 2021
Funding
National Key Research and Development Program of China
Keywords
Speech recognition; pre-training; wav2vec 2.0; transformer; low-resource; local and global dependence
DOI
10.21437/Interspeech.2021-67
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Wav2vec 2.0 is a recently proposed self-supervised pre-training framework for learning speech representations. It uses a transformer to learn global contextual representations, which is especially effective in low-resource scenarios. In addition, combining a convolutional neural network with a transformer to model both local and global dependencies has been shown to be beneficial in, e.g., automatic speech recognition (ASR) and natural language processing (NLP). However, how to model local and global dependencies in pre-trained models remains an open question in the speech domain. In this paper, we therefore propose a new transformer encoder that enhances local dependency modeling by combining convolution and self-attention modules. The encoder first places a convolution module and a self-attention module in parallel, then connects them in series with another convolution module, and sandwiches the whole between a pair of feed-forward modules. Experimental results show that the model pre-trained with the proposed method reduces the word error rate (WER) compared with the reproduced wav2vec 2.0, at the cost of a slight increase in the number of model parameters.
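The module ordering described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The abstract does not say how the parallel branches are merged or whether residual connections are used, so the summation and the residual additions below are assumptions. The sub-modules (`feed_forward`, `self_attention`, `convolution`) are unparameterized toy stand-ins with hypothetical names, chosen only to make the data flow runnable.

```python
import math

def add(a, b):
    """Element-wise sum of two frame sequences (used as a residual connection)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def feed_forward(x):
    """Toy position-wise module: a halved ReLU applied to each frame independently."""
    return [[0.5 * max(v, 0.0) for v in row] for row in x]

def self_attention(x):
    """Unparameterized single-head attention (Q = K = V = x): global context."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def convolution(x, kernel=(0.25, 0.5, 0.25)):
    """Depthwise 1-D convolution over time with zero padding: local context."""
    half = len(kernel) // 2
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        row = [0.0] * d
        for i, w in enumerate(kernel):
            s = t + i - half
            if 0 <= s < T:
                for j in range(d):
                    row[j] += w * x[s][j]
        out.append(row)
    return out

def encoder_block(x):
    """Feed-forward, parallel attention and convolution, serial convolution, feed-forward."""
    x = add(x, feed_forward(x))                          # first feed-forward module
    x = add(x, add(self_attention(x), convolution(x)))   # parallel branches, summed (assumption)
    x = add(x, convolution(x))                           # serial convolution module
    x = add(x, feed_forward(x))                          # second feed-forward module
    return x

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # 4 frames, 2 features each
out = encoder_block(frames)
print(len(out), len(out[0]))  # prints "4 2": the block preserves the sequence shape
```

A real encoder would use learned projections, multi-head attention, normalization, and dropout. The sketch only shows where the local (convolution) and global (self-attention) paths sit relative to each other.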
Pages: 4334-4338
Page count: 5