End-to-end Named Entity Recognition from English Speech

Cited by: 18

Authors:

Yadav, Hemant [1]

Ghosh, Sreyan [1]

Yu, Yi [2]

Shah, Rajiv Ratn [1]

Affiliations:

[1] IIIT Delhi, MIDAS, Delhi, India

[2] Natl Inst Informat, Tokyo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

End-to-end ASR; named entity recognition; deep learning; out of vocabulary (OOV) words

DOI:

10.21437/Interspeech.2020-2482

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Named entity recognition (NER) from text has been a widely studied problem and usually extracts semantic information from text. Until now, NER from speech is mostly studied in a two-step pipeline process that includes first applying an automatic speech recognition (ASR) system on an audio sample and then passing the predicted transcript to a NER tagger. In such cases, the error does not propagate from one step to another as both the tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm that integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g., phoneme based ASR). In this paper, we introduce a first publicly available NER annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components. Experimental results show that the proposed E2E approach outperforms the classical two-step approach. We also discuss how NER from speech can be used to handle out of vocabulary (OOV) words in an ASR system.

Pages: 4268-4272

Page count: 5
