Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers

Cited by: 1

Authors:

Harding, Philip [1]

Tong, Sibo [1]

Wiesler, Simon [1]

Affiliation:

[1] Amazon Alexa, Munich, Germany

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech recognition; contextual biasing; personalisation

DOI:

10.21437/Interspeech.2023-739

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare word recognition poses a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Methods of contextual biasing, in which models are dynamically adapted to bias their outputs towards a given list of relevant words and phrases, have been shown to be effective at alleviating this issue. While such methods improve rare word recognition, over-biasing can degrade accuracy on common words. In this work we propose several extensions to a recently proposed trie-based method of contextual biasing. We show how the method's rare word recognition can be improved, especially for very large catalogues, by introducing a simple normalisation term; how the method can be trained as an adapter module; and how selective biasing can be applied to practically eliminate over-biasing on common words.
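The prefix trie that trie-based biasing methods rely on can be illustrated with a minimal sketch, assuming each catalogue entry (e.g. a contact name) has already been tokenised into subword IDs. The names `TrieNode`, `BiasingTrie` and `valid_next_tokens` are hypothetical; this is a generic prefix trie, not the authors' implementation, and it leaves out the neural biasing distribution, the normalisation term, the adapter training and the selective-biasing logic described in the abstract. It only shows how, given the subword tokens decoded so far for the current word, one can look up which tokens would continue some catalogue entry.

```python
from typing import Dict, List, Sequence


class TrieNode:
    """One node of a prefix trie keyed by subword-token IDs (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.is_end = False  # True if a complete catalogue entry ends at this node


class BiasingTrie:
    """Prefix trie built from tokenised biasing phrases (illustrative sketch only)."""

    def __init__(self, entries: Sequence[Sequence[int]]) -> None:
        self.root = TrieNode()
        for tokens in entries:
            node = self.root
            for tok in tokens:
                # Reuse an existing child for a shared prefix, or create a new one.
                node = node.children.setdefault(tok, TrieNode())
            node.is_end = True

    def valid_next_tokens(self, prefix: Sequence[int]) -> List[int]:
        """Return the token IDs that extend `prefix` along some catalogue entry.

        An empty list means either the prefix has left the trie or it already
        spells a complete entry with no longer continuation, so no biasing
        candidates remain for this word.
        """
        node = self.root
        for tok in prefix:
            child = node.children.get(tok)
            if child is None:
                return []
            node = child
        return sorted(node.children)
```

For a toy catalogue `[[5, 12, 7], [5, 12, 9], [5, 3], [8]]`, `valid_next_tokens([])` gives `[5, 8]`, `valid_next_tokens([5, 12])` gives `[7, 9]`, and an unseen prefix such as `[2]` gives `[]`. Restricting candidates to trie continuations in this way keeps lookup cost tied to the current prefix rather than to the full catalogue size, which is one reason tries are attractive for large catalogues.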

Pages: 256-260

Number of pages: 5
