Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

Cited by: 2
Authors
Chang, Xuankai [1 ]
Yan, Brian [1 ]
Fujita, Yuya [2 ]
Maekaku, Takashi [2 ]
Watanabe, Shinji [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Yahoo Japan Corp, Nagoya, Aichi, Japan
Source
INTERSPEECH 2023
Funding
US National Science Foundation
Keywords
self-supervised learning; discrete tokens; discretized input; speech recognition;
DOI
10.21437/Interspeech.2023-2051
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Self-supervised learning (SSL) of speech has shown impressive results on speech-related tasks, particularly automatic speech recognition (ASR). While most methods feed the real-valued outputs of intermediate SSL layers to downstream models, an alternative is to use discretized token sequences, which offer lower storage requirements and allow techniques from natural language processing to be applied. In this paper, we propose a new protocol that uses discretized token sequences as ASR input; it includes de-duplication and sub-word modeling, which shorten the input sequence and thereby reduce computational cost. Our experiments on the LibriSpeech dataset demonstrate that the proposed protocol performs competitively with conventional ASR systems using continuous input features, while reducing computational and storage costs.
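
The abstract outlines two sequence-shortening steps, de-duplication and sub-word modeling, applied to the discrete tokens before ASR training. Below is a minimal Python sketch of those steps under assumed inputs: the integer tokens stand in for cluster indices obtained by discretizing SSL features, and the character mapping is a toy stand-in for feeding a BPE/unigram tokenizer; neither reflects the paper's actual configuration.

    # De-duplication plus a toy sub-word step, as described in the abstract.
    from itertools import groupby

    def deduplicate(tokens):
        """Collapse each run of identical consecutive tokens into a single token."""
        return [tok for tok, _ in groupby(tokens)]

    # Illustrative token IDs; in the paper these come from discretizing SSL features.
    tokens = [5, 5, 5, 12, 12, 7, 7, 7, 7, 3]
    dedup = deduplicate(tokens)
    print(dedup)  # [5, 12, 7, 3]

    # Sub-word modeling: map token IDs to printable symbols so an off-the-shelf
    # BPE/unigram tokenizer (e.g. SentencePiece) can merge frequent token
    # combinations into longer units, shortening the sequence further.
    # This one-character-per-token mapping is purely illustrative.
    symbols = "".join(chr(ord("A") + t) for t in dedup)
    print(symbols)  # FMHD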
Pages: 1399-1403 (5 pages)
Related Papers (50 total)
  • [21] Investigating Self-supervised Pre-training for End-to-end Speech Translation
Nguyen, Ha
    Bougares, Fethi
    Tomashenko, Natalia
    Esteve, Yannick
    Besacier, Laurent
    INTERSPEECH 2020, 2020, : 1466 - 1470
  • [22] PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching
    Wang, Hengli
    Fan, Rui
    Cai, Peide
    Liu, Ming
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (03) : 4353 - 4360
  • [23] Biased Self-supervised learning for ASR
    Kreyssig, Florian L.
    Shi, Yangyang
    Guo, Jinxi
    Sari, Leda
    Mohamed, Abdelrahman
    Woodland, Philip C.
    INTERSPEECH 2023, 2023, : 4948 - 4952
  • [24] ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
    Dudziak, Lukasz
    Abdelfattah, Mohamed S.
    Vipperla, Ravichander
    Laskaridis, Stefanos
    Lane, Nicholas D.
    INTERSPEECH 2019, 2019, : 2235 - 2239
  • [25] Learning end-to-end patient representations through self-supervised covariate balancing for causal treatment effect estimation
    Tesei, Gino
    Giampanis, Stefanos
    Shi, Jingpu
    Norgeot, Beau
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 140
  • [26] FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
    Chen, Szu-Jui
    Xie, Jiamin
    Hansen, John H. L.
    INTERSPEECH 2022, 2022, : 3058 - 3062
  • [27] SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
    Wang, Jianzong
    Zhang, Xulong
    Tang, Haobin
    Sun, Aolan
    Cheng, Ning
    Xiao, Jing
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2023
  • [28] END-TO-END SPOKEN LANGUAGE UNDERSTANDING USING TRANSFORMER NETWORKS AND SELF-SUPERVISED PRE-TRAINED FEATURES
    Morais, Edmilson
    Kuo, Hong-Kwang J.
    Thomas, Samuel
    Tuske, Zoltan
    Kingsbury, Brian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7483 - 7487
  • [29] Data-Driven End-to-End Optimization of Radio Over Fiber Transmission System Based on Self-Supervised Learning
    Zhu, Yue
    Ye, Jia
    Yan, Lianshan
    Zhou, Tao
    Yu, Xiao
    Zou, Xihua
    Pan, Wei
    JOURNAL OF LIGHTWAVE TECHNOLOGY, 2024, 42 (21) : 7532 - 7543
  • [30] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    INTERSPEECH 2020, 2020, : 3595 - 3599