CONTEXT-AWARE TRANSFORMER TRANSDUCER FOR SPEECH RECOGNITION

Cited by: 21
Authors
Chang, Feng-Ju [1]
Liu, Jing [1]
Radfar, Martin [1]
Mouchtaris, Athanasios [1]
Omologo, Maurizio [1]
Rastrow, Ariya [1]
Kunzmann, Siegfried [1]
Affiliations
[1] Amazon Alexa, San Francisco, CA 94112 USA
Keywords
speech recognition; context-aware training; attention; transformer-transducers; BERT
DOI
10.1109/ASRU51503.2021.9687895
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty recognizing uncommon words that appear infrequently in the training data. One promising method to improve the recognition accuracy on such rare words is to latch onto personalized/contextual information at inference. In this work, we present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals. Specifically, we propose a multi-head attention-based context-biasing network, which is jointly trained with the rest of the ASR sub-networks. We explore different techniques to encode contextual data and to create the final attention context vectors. We also leverage both BLSTM and pretrained BERT-based models to encode contextual data and guide the network training. Using an in-house far-field dataset, we show that CATT, using a BERT-based context encoder, improves on the word error rate of the baseline transformer transducer by 24.2% and outperforms an existing deep contextual model by 19.4%.
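The multi-head attention-based context biasing mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `context_biasing_attention`, the tensor sizes, the randomly initialised projection matrices (standing in for trained weights), and the additive combination of audio and context vectors are all assumptions made for illustration. Audio encoder frames act as queries, and embeddings of the context phrases (which the abstract says come from a BLSTM or BERT encoder) act as keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def context_biasing_attention(audio, context, num_heads, rng):
    """Hypothetical multi-head cross-attention for context biasing.

    audio:   (T, d) audio encoder frames, used as queries.
    context: (K, d) context phrase embeddings, used as keys and values.
    Returns the (T, d) context vectors and the (H, T, K) attention weights.
    """
    T, d = audio.shape
    K = context.shape[0]
    assert d % num_heads == 0, "model width must be divisible by num_heads"
    d_head = d // num_heads

    # Assumption: random projections stand in for jointly trained weights.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    # Project and split into heads: (H, T, d_head) or (H, K, d_head).
    q = (audio @ w_q).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    k = (context @ w_k).reshape(K, num_heads, d_head).transpose(1, 0, 2)
    v = (context @ w_v).reshape(K, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention of each frame over the context phrases.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, K)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (H, T, d_head)

    # Merge the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(T, d)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
audio = rng.standard_normal((5, 8))    # 5 frames, width 8 (toy sizes)
context = rng.standard_normal((3, 8))  # 3 context phrases (toy sizes)
context_vectors, weights = context_biasing_attention(audio, context, 2, rng)

# Assumption: combine by addition; the abstract does not fix one method.
biased_audio = audio + context_vectors
print(biased_audio.shape, weights.shape)
```

Each frame's attention weights sum to one over the context phrases, so a phrase that matches the audio well can dominate the context vector for that frame. How the context vectors are combined with the audio and label representations is one of the design choices the abstract says the authors explore; the addition above is only one possibility.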
Pages: 503-510
Page count: 8