Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Times cited: 0
Authors
Zhu, Jiaxu [1, 3, 6]
Tong, Weinan [1]
Xu, Yaoxun [1]
Song, Changhe [1, 2]
Wu, Zhiyong [1, 2]
You, Zhao [3]
Su, Dan [3]
Yu, Dong [4]
Meng, Helen [5]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Tencent AI Lab, Shenzhen, Peoples R China
[4] Tencent AI Lab, Bellevue, WA USA
[5] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[6] Tencent Inc, Shenzhen, Peoples R China
Source
INTERSPEECH 2023
Keywords
Speech Recognition; Text-Only; Continuous Integrate and Fire; Domain Adaptation
DOI
10.21437/Interspeech.2023-1378
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Mapping two modalities, speech and text, into a shared representation space is one way of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, speech representations and text representations differ in length. A previous method up-samples the text representation to align it with the acoustic modality, but the up-sampled length may not match the actual duration of the speech. In this paper, we propose a novel representation-matching strategy that instead down-samples the acoustic representation to align it with the text modality. By introducing a continuous integrate-and-fire (CIF) module that generates acoustic representations whose length matches the number of tokens, our ASR model learns unified representations from both modalities more effectively, enabling domain adaptation with text-only data from the target domain. Experimental results on new-domain data demonstrate the effectiveness of the proposed method.
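The down-sampling idea behind the CIF module can be illustrated with a small sketch. This is a minimal, illustrative version of the generic continuous integrate-and-fire mechanism, not the authors' implementation: the function name `cif`, the firing threshold of 1.0, the optional rescaling of weights to a target token count (`target_len`), the `eps` tolerance, and the choice to discard a partial token at the end are all assumptions made for illustration. Per-frame weights are accumulated, and each time the running sum reaches the threshold, the weighted sum of the frames integrated so far is emitted as one token-level acoustic vector. The output length therefore follows the summed weights rather than the number of frames.

```python
from typing import List, Optional

Vector = List[float]


def cif(frames: List[Vector], alphas: List[float], threshold: float = 1.0,
        target_len: Optional[int] = None, eps: float = 1e-6) -> List[Vector]:
    """Illustrative continuous integrate-and-fire (hypothetical helper, not the paper's code).

    frames: encoder outputs, one vector per acoustic frame.
    alphas: non-negative per-frame weights (in a real model, predicted by a network).
    target_len: if given, weights are rescaled to sum to this token count
        (an assumed training-time trick, not taken from the paper).
    """
    if len(frames) != len(alphas):
        raise ValueError("frames and alphas must have the same length")
    if threshold <= eps:
        raise ValueError("threshold must be larger than eps")
    if target_len is not None:
        total = sum(alphas)
        if total > 0:
            alphas = [a * target_len / total for a in alphas]

    dim = len(frames[0]) if frames else 0
    fired: List[Vector] = []
    acc = 0.0               # weight integrated since the last firing
    state = [0.0] * dim     # weighted sum of frames since the last firing

    for h, a in zip(frames, alphas):
        remaining = a
        # Fire whenever the accumulated weight reaches the threshold; a frame
        # with a large weight can complete more than one token. eps absorbs
        # floating-point round-off introduced by rescaling.
        while acc + remaining >= threshold - eps:
            portion = min(threshold - acc, remaining)
            state = [s + portion * x for s, x in zip(state, h)]
            fired.append(state)
            remaining -= portion
            acc = 0.0
            state = [0.0] * dim
        # Carry the rest of this frame's weight into the next token.
        acc += remaining
        state = [s + remaining * x for s, x in zip(state, h)]

    # A partial token left at the end (acc below threshold) is discarded here.
    return fired
```

For example, four one-dimensional frames [1.0], [2.0], [3.0], [4.0] with a weight of 0.5 each produce two token-level vectors: [1.5] (0.5 * 1.0 + 0.5 * 2.0) and [3.5] (0.5 * 3.0 + 0.5 * 4.0). Four frames collapse into two vectors, which is the sense in which the acoustic sequence is down-sampled to token length so that it can be compared with a text sequence position by position.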
Pages: 1334-1338
Number of pages: 5
Related papers
50 in total
  • [31] Deep End-to-End Representation Learning for Food Type Recognition from Speech
    Sertolli, Benjamin
    Cummins, Nicholas
    Sengur, Abdulkadir
    Schuller, Bjorn W.
    ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 574 - 578
  • [32] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [33] Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition
    Zhou, Wei
    Zeineldeen, Mohammad
    Zheng, Zuoyun
    Schlueter, Ralf
    Ney, Hermann
    INTERSPEECH 2021, 2021, : 2886 - 2890
  • [34] Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework
    Prakash, Anusha
    Murthy, Hema A.
    INTERSPEECH 2020, 2020, : 2962 - 2966
  • [35] Optimization for Low-Resource Speaker Adaptation in End-to-End Text-to-Speech
    Hong, Changi
    Lee, Jung Hyuk
    Jeon, Moongu
    Kim, Hong Kook
    2024 IEEE 21ST CONSUMER COMMUNICATIONS & NETWORKING CONFERENCE, CCNC, 2024, : 1060 - 1061
  • [36] M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
    Zhao, Jinming
    Yang, Hao
    Shareghi, Ehsan
    Haffari, Gholamreza
    INTERSPEECH 2022, 2022, : 111 - 115
  • [37] Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
    Yoon, Hyungchan
    Um, Seyun
    Kim, Changhwan
    Kang, Hong-Goo
    INTERSPEECH 2023, 2023, : 3023 - 3027
  • [38] Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition
    Kim, Eesung
    Jajodia, Aditya
    Tseng, Cindy
    Neelagiri, Divya
    Ki, Taeyeon
    Apsingekar, Vijendra Raj
    INTERSPEECH 2023, 2023, : 3959 - 3963
  • [39] Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition
    Gu, Yue
    Du, Zhihao
    Zhang, Shiliang
    Chen, Qian
    Han, Jiqing
    INTERSPEECH 2023, 2023, : 1249 - 1253
  • [40] End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
    Masumura, Ryo
    Sato, Hiroshi
    Tanaka, Tomohiro
    Moriya, Takafumi
    Ijima, Yusuke
    Oba, Takanobu
    INTERSPEECH 2019, 2019, : 1606 - 1610