Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers

Cited by: 1

Authors:

Harding, Philip [1]

Tong, Sibo [1]

Wiesler, Simon [1]

Affiliation:

[1] Amazon Alexa, Munich, Germany

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech recognition; contextual biasing; personalisation;

DOI:

10.21437/Interspeech.2023-739

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare words pose a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Methods of contextual biasing, in which models are dynamically adapted to bias their outputs towards a given list of relevant words and phrases, have been shown to alleviate this issue effectively. While such methods improve rare word recognition, over-biasing can degrade recognition of common words. In this work, we propose several extensions to a recently proposed trie-based method of contextual biasing. We show that introducing a simple normalisation term improves the method's rare word recognition, especially for very large catalogues; that the method can be trained as an adapter module; and that selective biasing can be applied to practically eliminate over-biasing on common words.
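
The abstract only names the trie-based biasing method that the paper extends. One common way to realise the "trie" part is to store the subword sequences of the catalogue entries in a prefix tree, so that at each decoding step only the entries consistent with the partially decoded word are considered for biasing. The following is a minimal sketch of that lookup alone, assuming a hypothetical subword tokenisation; the names `BiasingTrie`, `active_entries` and `next_tokens` are illustrative, and the sketch does not implement the paper's adapter, normalisation term, or selective biasing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    # Child nodes keyed by subword token.
    children: dict[str, TrieNode] = field(default_factory=dict)
    # Indices of catalogue entries whose token sequence passes through this node.
    entry_ids: set[int] = field(default_factory=set)


class BiasingTrie:
    """Hypothetical prefix trie over tokenised biasing-catalogue entries."""

    def __init__(self, catalogue: list[list[str]]) -> None:
        self.root = TrieNode()
        for idx, tokens in enumerate(catalogue):
            node = self.root
            node.entry_ids.add(idx)  # every entry is a candidate at a word start
            for tok in tokens:
                node = node.children.setdefault(tok, TrieNode())
                node.entry_ids.add(idx)

    def _walk(self, prefix: list[str]) -> Optional[TrieNode]:
        # Follow the decoded subword prefix; None means no entry matches it.
        node: Optional[TrieNode] = self.root
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def active_entries(self, prefix: list[str]) -> set[int]:
        """Ids of entries still consistent with the current partial word."""
        node = self._walk(prefix)
        return set(node.entry_ids) if node is not None else set()

    def next_tokens(self, prefix: list[str]) -> set[str]:
        """Subword tokens that would continue some matching entry."""
        node = self._walk(prefix)
        return set(node.children) if node is not None else set()


# Hypothetical subword tokenisation of a small contact-name catalogue.
trie = BiasingTrie([["▁ali", "ce"], ["▁ali", "son"], ["▁bob"]])
print(trie.active_entries([]))                # {0, 1, 2}
print(trie.active_entries(["▁ali"]))          # {0, 1}
print(sorted(trie.next_tokens(["▁ali"])))     # ['ce', 'son']
print(trie.active_entries(["▁car"]))          # set()
```

An empty prefix (the start of a new word) keeps every entry active, and a prefix that leaves the trie yields no active entries, which in this sketch would mean no biasing is applied for that word. How the active entries are then scored and combined with the transducer's outputs is left open here, since the abstract does not specify it.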

Pages: 256–260

Page count: 5
