Finding Recurrent Out-of-Vocabulary Words

Authors
Qin, Long [1]
Rudnicky, Alexander [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
OOV word detection; distributed evidence; bottom-up clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. In a conventional OOV word detection system, however, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic, and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature is better than the acoustic and contextual features, and that the best performance is achieved when combining all features.
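The bottom-up clustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which distance measures, linkage rule, or stopping criterion were used. The sketch assumes that each OOV candidate is a decoded phone sequence, that phonetic distance is a length-normalized edit distance, that clusters are compared by average linkage, and that merging stops at a hypothetical threshold of 0.4. The function names are illustrative only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (lists of strings)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution or match
        prev = cur
    return prev[-1]


def phonetic_distance(a, b):
    """Edit distance normalized by the longer sequence (assumed measure)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_bottom_up(candidates, threshold=0.4):
    """Agglomerative clustering: start with one cluster per candidate and
    repeatedly merge the closest pair until no pair is within `threshold`.
    Returns a list of clusters, each a list of candidate indices."""
    clusters = [[i] for i in range(len(candidates))]

    def linkage(c1, c2):
        # Average pairwise distance between members (assumed linkage rule).
        total = sum(phonetic_distance(candidates[i], candidates[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), dist = min(
            ((pair, linkage(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # x < y, so index x is unaffected
    return clusters


# Hypothetical phone sequences for four detected OOV candidates.
candidates = [["k", "ae", "t"], ["k", "ae", "d"],
              ["d", "ao", "g"], ["k", "ae", "t"]]
clusters = cluster_bottom_up(candidates)
```

In this toy input, candidates 0, 1, and 3 end up in one cluster and candidate 2 stays alone. Acoustic and contextual distances, which the abstract also mentions, could be combined with the phonetic distance (for example as a weighted sum) inside `linkage`; how the paper combines them is not specified here.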
Pages: 2241-2245
Page count: 5