Introduction of Semantic Model to Help Speech Recognition

Cited by: 1

Authors:

Level, Stephane [1]

Illina, Irina [1]

Fohr, Dominique [1]

Affiliations:

[1] Univ Lorraine, INRIA, CNRS, F-54000 Nancy, France

Source:

TEXT, SPEECH, AND DIALOGUE (TSD 2020) | 2020 / Vol. 12284

Keywords:

Automatic Speech Recognition; Semantic context; Embeddings

DOI:

10.1007/978-3-030-58323-1_41

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current Automatic Speech Recognition (ASR) systems mainly take into account acoustic, lexical, and local syntactic information; long-term semantic relations are not used. ASR performance degrades significantly when the training and testing conditions differ, for example because of noise. In this case, the acoustic information can be less reliable. To help ASR in noisy conditions, we propose to supplement the ASR system with a semantic module. This module re-evaluates the list of N-best speech recognition hypotheses and can be seen as a form of adaptation in the context of noise. For words in the processed sentence that may have been poorly recognized, the module chooses words that correspond better to the semantic context of the sentence. To achieve this, we introduce the notions of a context part and possibility zones, which measure the similarity between the semantic context of the document and the corresponding possible hypotheses. The proposed methodology uses two continuous representations of words: word2vec and FastText. We conduct experiments on the publicly available TED conferences dataset (TED-LIUM) mixed with real noise. The proposed method achieves a significant improvement in word error rate (WER) over the ASR system without semantic information.
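
The general idea of re-evaluating an N-best list with semantic information can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define how the context part or the possibility zones are computed, so the sketch below swaps in a much simpler stand-in, namely the cosine similarity between averaged word vectors, linearly combined with the ASR score. The names `TOY_VECTORS`, `mean_vector`, `rescore_nbest`, and the `weight` parameter are hypothetical, and the vector values and ASR scores are made up for illustration.

```python
import math

# Toy 3-dimensional word vectors: illustrative stand-ins for word2vec or
# FastText embeddings. The values are invented, not trained.
TOY_VECTORS = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.0],
    "bank": [0.7, 0.3, 0.1],
    "tank": [0.1, 0.9, 0.2],
    "flows": [0.85, 0.1, 0.05],
}


def mean_vector(words, vectors):
    """Average the vectors of the words found in `vectors`; None if none are found."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rescore_nbest(nbest, context_words, vectors, weight=0.5):
    """Re-rank (hypothesis, asr_score) pairs by mixing ASR score and semantic similarity.

    `weight` is an assumed interpolation factor, not a value from the paper.
    """
    context_vec = mean_vector(context_words, vectors)
    rescored = []
    for text, asr_score in nbest:
        hyp_vec = mean_vector(text.split(), vectors)
        if context_vec is None or hyp_vec is None:
            similarity = 0.0
        else:
            similarity = cosine(context_vec, hyp_vec)
        rescored.append((text, (1 - weight) * asr_score + weight * similarity))
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Two competing hypotheses; the acoustically preferred one is semantically odd.
nbest = [("the tank flows", 0.52), ("the bank flows", 0.50)]
ranked = rescore_nbest(nbest, ["river", "water"], TOY_VECTORS)
best = ranked[0][0]
print(best)
```

In this toy setup the hypothesis with the slightly lower ASR score moves to the top because its words lie closer to the context words in the embedding space, which is the kind of correction the abstract describes for words that noise may have caused to be poorly recognized.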


Pages: 377-385

Page count: 9

