Contextual Generation of Word Embeddings for Out of Vocabulary Words in Downstream Tasks

Cited by: 5
Authors
Garneau, Nicolas [1 ]
Leboeuf, Jean-Samuel [1 ]
Lamontagne, Luc [1 ]
Institution
[1] Univ Laval, Dept Informat & Genie Logiciel, Quebec City, PQ, Canada
Source
Keywords
Natural language processing; Sequence labeling; Out of vocabulary words; Contextual word embeddings;
DOI
10.1007/978-3-030-18305-9_60
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Over the past few years, the use of pre-trained word embeddings to solve natural language processing tasks has considerably improved performance across the board. However, even though these embeddings are trained on gigantic corpora, their vocabulary is fixed, so numerous out of vocabulary words appear in specific downstream tasks. Recent studies have proposed models able to generate embeddings for out of vocabulary words given their morphology and context. These models assume that sufficient textual data is at hand to train them. In contrast, we specifically tackle the case where such data is no longer available and we rely only on pre-trained embeddings. As a solution, we introduce a model that predicts meaningful embeddings from the spelling of a word as well as from the context in which it appears in a downstream task, without the need for pre-training on a given corpus. We thoroughly test our model on a joint tagging task in three different languages. Results show that our model helps consistently across all languages, outperforms other ways of handling out of vocabulary words, and can be integrated into any neural model to predict embeddings for out of vocabulary words.
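As a rough illustration of the idea described in the abstract, the sketch below combines a spelling signal with a context signal to produce a vector for an unseen word. This is not the authors' architecture: the toy embedding tables, the mean-pooled character and context encoders, and the equal-weight combination are all illustrative assumptions standing in for the paper's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy stand-ins for a pre-trained word embedding table and a
# character embedding table (assumptions, not the paper's data).
vocab = {w: rng.normal(size=dim) for w in ["the", "cat", "sat", "on", "mat"]}
char_emb = {c: rng.normal(size=dim) for c in "abcdefghijklmnopqrstuvwxyz"}

def oov_embedding(word, left_ctx, right_ctx):
    """Predict an embedding for an OOV word from spelling and context.

    Spelling signal: mean of character embeddings (a crude stand-in
    for a character-level encoder). Context signal: mean of the known
    neighbours' pre-trained vectors. Here the two are simply averaged;
    in a real model a trained network would combine them.
    """
    spelling = np.mean([char_emb[c] for c in word if c in char_emb], axis=0)
    ctx_vecs = [vocab[w] for w in left_ctx + right_ctx if w in vocab]
    context = np.mean(ctx_vecs, axis=0) if ctx_vecs else np.zeros(dim)
    return 0.5 * spelling + 0.5 * context

# "catz" is not in `vocab`, yet we still obtain a dense vector for it.
vec = oov_embedding("catz", ["the"], ["sat", "on"])
print(vec.shape)  # (50,)
```

Because the prediction depends only on the word's spelling and its immediate neighbours, such a module can be bolted onto any downstream tagger without re-training embeddings on a new corpus.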
Pages: 563-569
Page count: 7
Related Papers
50 items in total
  • [21] Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation
    Miaschi, Alessio
    Dell'Orletta, Felice
    5TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2020), 2020, : 110 - 119
  • [22] Contextual Word Representations: Putting Words into Computers
    Smith, Noah A.
    COMMUNICATIONS OF THE ACM, 2020, 63 (06) : 66 - 74
  • [23] Semantic Word Cloud Generation Based on Word Embeddings
    Xu, Jin
    Tao, Yubo
    Lin, Hai
    2016 IEEE PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS), 2016, : 239 - 243
  • [24] Analysis of The Characteristics of Similar Words Computed by Word Embeddings
    Zhou, Shuhui
    Liu, Peihan
    Liu, Lizhen
    Song, Wei
    Cheng, Miaomiao
    PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 327 - 330
  • [25] Generating Bags of Words from the Sums of Their Word Embeddings
    White, Lyndon
    Togneri, Roberto
    Liu, Wei
    Bennamoun, Mohammed
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT I, 2018, 9623 : 91 - 102
  • [26] Out-of-vocabulary word rejection algorithm in Korean variable vocabulary word recognition
    Moon, KS
    Kim, YJ
    Kim, HR
    Chung, JH
    ISCAS 2000: IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - PROCEEDINGS, VOL V: EMERGING TECHNOLOGIES FOR THE 21ST CENTURY, 2000, : 53 - 56
  • [27] RESEARCH HIGHLIGHT GENERATION WITH ELMO CONTEXTUAL EMBEDDINGS
    Rehman, Tohida
    Sanyal, Debarshi Kumar
    Chattopadhyay, Samiran
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2023, 24 (02): : 181 - 190
  • [28] Word Embeddings for Fake Malware Generation
    Tran, Quang Duy
    Di Troia, Fabio
    SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2022, 2022, 1683 : 22 - 37
  • [29] Enhancing Cross-Lingual Word Embeddings: Aligned Subword Vectors for Out-of-Vocabulary Terms in fastText
    Savelli, Claudio
    Giobergia, Flavio
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024, 2024,
  • [30] Finding Recurrent Out-of-Vocabulary Words
    Qin, Long
    Rudnicky, Alexander
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2241 - 2245