Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers

Cited by: 1
Authors
Harding, Philip [1]
Tong, Sibo [1]
Wiesler, Simon [1]
Affiliations
[1] Amazon Alexa, Munich, Germany
Source
Keywords
speech recognition; contextual biasing; personalisation
DOI
10.21437/Interspeech.2023-739
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare word recognition poses a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Methods of contextual biasing, in which models are dynamically adapted to bias their outputs towards a given list of relevant words and phrases, have been shown to be effective at alleviating this issue. While such methods improve rare word recognition, over-biasing can lead to degradation on common words. In this work we propose several extensions to a recently proposed trie-based method of contextual biasing. We show how the method's rare word recognition can be improved, especially for very large catalogues, by introducing a simple normalisation term; how the method can be trained as an adapter module; and how selective biasing can be applied to practically eliminate over-biasing on common words.
Pages: 256-260
Number of pages: 5
Related papers
50 in total
  • [1] CONTEXTUAL ADAPTERS FOR PERSONALIZED SPEECH RECOGNITION IN NEURAL TRANSDUCERS
    Sathyendra, Kanthashree Mysore
    Muniyappa, Thejaswi
    Chang, Feng-Ju
    Liu, Jing
    Su, Jinru
    Strimel, Grant P.
    Mouchtaris, Athanasios
    Kunzmann, Siegfried
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8537 - 8541
  • [2] Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
    Duc Le
    Jain, Mahaveer
    Keren, Gil
    Kim, Suyoun
    Shi, Yangyang
    Mahadeokar, Jay
    Chan, Julian
    Shangguan, Yuan
    Fuegen, Christian
    Kalinli, Ozlem
    Saraf, Yatharth
    Seltzer, Michael L.
    INTERSPEECH 2021, 2021, : 1772 - 1776
  • [3] Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR
    Naowarat, Burin
    Harding, Philip
    D'Alterio, Pasquale
    Tong, Sibo
    Hasan, Bashar Awwad Shiekh
    INTERSPEECH 2023, 2023, : 1264 - 1268
  • [4] Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
    Xu, Tianyi
    Yang, Zhanheng
    Huang, Kaixun
    Guo, Pengcheng
    Zhang, Ao
    Li, Biao
    Chen, Changru
    Li, Chao
    Xie, Lei
    INTERSPEECH 2023, 2023, : 1668 - 1672
  • [5] Enhancing trie-based syntactic pattern recognition using AI heuristic search strategies
    Badr, G
    Oommen, BJ
    PATTERN RECOGNITION AND DATA MINING, PT 1, PROCEEDINGS, 2005, 3686 : 1 - 17
  • [6] Speech recognition by integrating audio, visual and contextual features based on neural networks
    Kim, MW
    Ryu, JW
    Kim, EJ
    ADVANCES IN NATURAL COMPUTATION, PT 2, PROCEEDINGS, 2005, 3611 : 155 - 164
  • [7] A SPEECH RECOGNITION METHOD USING COMPETITIVE AND SELECTIVE LEARNING NEURAL NETWORKS
    徐雄
    胡光锐
    严永红
    Journal of Shanghai Jiaotong University, 2000, (02) : 10 - 13
  • [8] Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search
    Williams, Ian
    Kannan, Anjuli
    Aleksic, Petar
    Rybach, David
    Sainath, Tara N.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2227 - 2231
  • [9] Neural Network Based Recognition of Speech Using MFCC Features
    Barua, Pialy
    Ahmad, Kanij
    Khan, Ainul Anam Shahjamal
    Sanaullah, Muhammad
    2014 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2014,
  • [10] Speech Recognition System Based On Phonemes Using Neural Networks
    Maheswari, N. Uma
    Kabilan, A. P.
    Venkatesh, R.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (07): : 148 - 153