Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers

Cited by: 1
Authors
Harding, Philip [1 ]
Tong, Sibo [1 ]
Wiesler, Simon [1 ]
Affiliations
[1] Amazon Alexa, Munich, Germany
Source
INTERSPEECH 2023
Keywords
speech recognition; contextual biasing; personalisation
DOI
10.21437/Interspeech.2023-739
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Neural transducer ASR models achieve state-of-the-art accuracy on many tasks; however, rare word recognition poses a particular challenge, as models often fail to recognise words that occur rarely, or not at all, in the training data. Contextual biasing methods, in which a model is dynamically adapted to bias its outputs towards a given list of relevant words and phrases, have been shown to be effective at alleviating this issue. While such methods improve rare word recognition, over-biasing can degrade accuracy on common words. In this work we propose several extensions to a recently proposed trie-based method of contextual biasing. We show how the method's rare word recognition can be improved, especially in the case of very large catalogues, by introducing a simple normalisation term; how the method can be trained as an adapter module; and how selective biasing can be applied to practically eliminate over-biasing on common words.
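To make the mechanism described above concrete, the sketch below shows the general shape of trie-based contextual biasing in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (TrieNode, build_trie, bias_scores, step, selective_bias), the boost-divided-by-branching-factor normalisation, and the common-token filter are all hypothetical stand-ins for the normalisation term and selective biasing the abstract refers to.

from dataclasses import dataclass, field

@dataclass
class TrieNode:
    # Maps a subword token to the child trie node; is_end marks a complete
    # catalogue phrase.
    children: dict = field(default_factory=dict)
    is_end: bool = False

def build_trie(phrases):
    """Insert each catalogue phrase (a list of subword tokens) into a trie."""
    root = TrieNode()
    for tokens in phrases:
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True
    return root

def bias_scores(state, boost=2.0):
    """Score the continuations reachable from the current trie state.

    Dividing the boost by the branching factor is a simplified stand-in
    (an assumption here) for a normalisation term: without it, nodes with
    very many children, as in very large catalogues, would inject far more
    total bias than narrow ones.
    """
    n = max(len(state.children), 1)
    return {tok: boost / n for tok in state.children}

def selective_bias(scores, common_tokens):
    """Selective biasing, simplified: drop boosts on common tokens so that
    frequent words are not over-biased."""
    return {tok: s for tok, s in scores.items() if tok not in common_tokens}

def step(state, root, token):
    """Advance the trie state with the token the decoder just emitted;
    on a miss, fall back to the root (a real decoder would also try to
    re-enter the trie from the root with the failed token)."""
    return state.children.get(token, root)

# Example: bias towards two (hypothetical) contact names.
root = build_trie([["jo", "hn"], ["jo", "an", "na"]])
state = root
print(bias_scores(state))                    # {'jo': 2.0}: one branch at the root
state = step(state, root, "jo")
print(bias_scores(state))                    # {'hn': 1.0, 'an': 1.0}: boost split across branches
print(selective_bias(bias_scores(state), common_tokens={"an"}))  # {'hn': 1.0}

In an actual transducer decoder, such per-token boosts would typically be added to the model's output log-probabilities at each decoding step; the adapter-module variant proposed in the paper instead learns the biasing behaviour end-to-end, which this sketch does not attempt to reproduce.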
Pages: 256-260
Number of pages: 5
Related Papers
50 entries in total (entries [41]-[50] shown)
  • [41] Peng, Zhichao; Zhu, Zhi; Unoki, Masashi; Dang, Jianwu; Akagi, Masato. Speech Emotion Recognition Using Multichannel Parallel Convolutional Recurrent Neural Networks based on Gammatone Auditory Filterbank. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2017), 2017: 1750-1755.
  • [42] Dumitru, Corneliu Octavian; Gavat, Inge. Vowel, digit and continuous speech recognition based on statistical, neural and hybrid modelling by using ASRS_RL. EUROCON 2007: The International Conference on Computer as a Tool, Vols 1-6, 2007: 670-677.
  • [43] Pawar, Manju D.; Kokate, Rajendra D. Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients. Multimedia Tools and Applications, 2021, 80(10): 15563-15587.
  • [44] Hayakawa, Daichi; Kagoshima, Takehiko; Fujimura, Hiroshi. Mask-based Beamforming Using Complex-valued Neural Network for Recognition of Spatial Target Speech. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021: 23-29.
  • [46] Bakhshi, Ali; Wong, Aaron S. W.; Chalup, Stephan. End-To-End Speech Emotion Recognition Based on Time and Frequency Information Using Deep Neural Networks. ECAI 2020: 24th European Conference on Artificial Intelligence, 2020, 325: 969-975.
  • [47] Gao, Wenbin; Zhang, Lei; Huang, Wenbo; Min, Fuhong; He, Jun; Song, Aiguo. Deep Neural Networks for Sensor-Based Human Activity Recognition Using Selective Kernel Convolution. IEEE Transactions on Instrumentation and Measurement, 2021, 70.
  • [48] Mustaqeem; Kwon, Soonil. Optimal feature selection based speech emotion recognition using two-stream deep convolutional neural network. International Journal of Intelligent Systems, 2021, 36(09): 5116-5135.
  • [49] Ayadi, Souha; Lachiri, Zied. Deep Neural Network for visual Emotion Recognition based on ResNet50 using Song-Speech characteristics. Proceedings of the 2022 5th International Conference on Advanced Systems and Emergent Technologies (IC_ASET'2022), 2022: 363-368.
  • [50] Cavalcanti, Julio Cesar; da Silva, Ronaldo Rodrigues; Eriksson, Anders; Barbosa, Plinio A. Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks. Frontiers in Artificial Intelligence, 2024, 7.