End-to-end Named Entity Recognition from English Speech

Cited by: 18
Authors
Yadav, Hemant [1]
Ghosh, Sreyan [1]
Yu, Yi [2]
Shah, Rajiv Ratn [1]
Affiliations
[1] IIIT Delhi, MIDAS, Delhi, India
[2] Natl Inst Informat, Tokyo, Japan
Source
Keywords
End-to-end ASR; named entity recognition; deep learning; out of vocabulary (OOV) words;
DOI
10.21437/Interspeech.2020-2482
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Named entity recognition (NER) from text has been a widely studied problem and usually extracts semantic information from text. Until now, NER from speech is mostly studied in a two-step pipeline process that includes first applying an automatic speech recognition (ASR) system on an audio sample and then passing the predicted transcript to a NER tagger. In such cases, the error does not propagate from one step to another as both the tasks are not optimized in an end-to-end (E2E) fashion. Recent studies confirm that integrated approaches (e.g., E2E ASR) outperform sequential ones (e.g., phoneme-based ASR). In this paper, we introduce a first publicly available NER-annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components. Experimental results show that the proposed E2E approach outperforms the classical two-step approach. We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
Pages: 4268-4272
Page count: 5