Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers

Cited by: 1
Authors
Harding, Philip [1]
Tong, Sibo [1]
Wiesler, Simon [1]
Affiliation
[1] Amazon Alexa, Munich, Germany
Keywords
speech recognition; contextual biasing; personalisation
DOI
10.21437/Interspeech.2023-739
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare word recognition poses a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Contextual biasing methods, in which models are dynamically adapted to bias their outputs towards a given list of relevant words and phrases, have been shown to alleviate this issue. Although such methods improve rare word recognition, over-biasing can degrade accuracy on common words. In this work we propose several extensions to a recently proposed trie-based method of contextual biasing. We show how introducing a simple normalisation term improves the method's rare word recognition, especially for very large catalogues; how the method can be trained as an adapter module; and how selective biasing can be applied to practically eliminate over-biasing on common words.
Pages: 256-260
Page count: 5