Contextual Generation of Word Embeddings for Out of Vocabulary Words in Downstream Tasks

Cited by: 5
Authors
Garneau, Nicolas [1]
Leboeuf, Jean-Samuel [1]
Lamontagne, Luc [1]
Affiliation
[1] Univ Laval, Dept Informat & Genie Logiciel, Quebec City, PQ, Canada
Source
Keywords
Natural language processing; Sequence labeling; Out of vocabulary words; Contextual word embeddings;
DOI
10.1007/978-3-030-18305-9_60
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Over the past few years, the use of pre-trained word embeddings to solve natural language processing tasks has considerably improved performance across the board. However, even though these embeddings are trained on gigantic corpora, their vocabulary is fixed, so numerous out-of-vocabulary words appear in specific downstream tasks. Recent studies have proposed models able to generate embeddings for out-of-vocabulary words from their morphology and context. These models assume that sufficient textual data is available to train them. In contrast, we specifically tackle the case where such data is no longer available and we rely only on pre-trained embeddings. As a solution, we introduce a model that predicts meaningful embeddings from the spelling of a word as well as from the context in which it appears in a downstream task, without the need for pre-training on a given corpus. We thoroughly test our model on a joint tagging task in three different languages. Results show that our model consistently helps on all languages, outperforms other ways of handling out-of-vocabulary words, and can be integrated into any neural model to predict embeddings for out-of-vocabulary words.
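The abstract does not describe the architecture, so what follows is only a minimal sketch of the general idea, assuming the pre-trained embeddings are available as a plain dictionary. In-vocabulary tokens keep their pre-trained vector. Each out-of-vocabulary token receives a vector blended from its spelling (an average of per-character vectors) and its context (an average of the pre-trained vectors of nearby in-vocabulary words). The names `char_vector`, `generate_oov`, `embed_sentence`, the context `window` and the blending weight `alpha` are hypothetical. The character vectors are untrained pseudo-random stand-ins and the fixed averages stand in for a learned predictor, whereas the model described above learns its predictions.

```python
import random
from typing import Dict, List

DIM = 4  # hypothetical, tiny embedding size so the example stays readable
Vector = List[float]


def char_vector(ch: str, dim: int = DIM) -> Vector:
    """Deterministic pseudo-random vector for one character.

    Untrained stand-in for a learned character embedding table.
    """
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def mean(vectors: List[Vector], dim: int = DIM) -> Vector:
    """Element-wise average; returns a zero vector for an empty list."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def generate_oov(word: str, context: List[str],
                 vocab: Dict[str, Vector], alpha: float = 0.5) -> Vector:
    """Blend a spelling vector and a context vector for an OOV word.

    alpha weights the spelling part; (1 - alpha) weights the context part.
    Only in-vocabulary context words contribute to the context part.
    """
    spelling = mean([char_vector(c) for c in word])
    context_vec = mean([vocab[w] for w in context if w in vocab])
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(spelling, context_vec)]


def embed_sentence(tokens: List[str], vocab: Dict[str, Vector],
                   window: int = 2, alpha: float = 0.5) -> List[Vector]:
    """Look up pre-trained vectors, generating one for each OOV token."""
    out: List[Vector] = []
    for i, tok in enumerate(tokens):
        if tok in vocab:
            out.append(vocab[tok])
        else:
            # Up to `window` tokens on each side of the OOV token.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            out.append(generate_oov(tok, context, vocab, alpha))
    return out


if __name__ == "__main__":
    vocab = {
        "the": [1.0, 0.0, 0.0, 0.0],
        "cat": [0.0, 1.0, 0.0, 0.0],
        "sat": [0.0, 0.0, 1.0, 0.0],
    }
    sentence = ["the", "zorblat", "sat"]
    for token, vec in zip(sentence, embed_sentence(sentence, vocab)):
        print(token, [round(x, 3) for x in vec])
```

Because the output is a regular vector per token, the lookup can sit in front of any downstream network. One weakness of this simplification is that averaging character vectors ignores character order, so "ab" and "ba" get the same spelling component. An encoder that reads the character sequence would avoid this.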
Pages: 563-569
Page count: 7
Related papers
  • [31] Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings
    Xypolopoulos, Christos
    Tixier, Antoine J-P
    Vazirgiannis, Michalis
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3391 - 3401
  • [32] Contextual Word Embeddings and Topic Modeling in Healthy Dieting and Obesity
    Yeruva, Vijaya Kumari
    Junaid, Sidrah
    Lee, Yugyung
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2019, 3 (02) : 159 - 183
  • [33] Detecting ongoing events using contextual word and sentence embeddings
    Maisonnave, Mariano
    Delbianco, Fernando
    Tohme, Fernando
    Maguitman, Ana
    Milios, Evangelos
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 209
  • [34] Metaphor Detection Using Contextual Word Embeddings From Transformers
    Liu, Jerry
    O'Hara, Nathan
    Rubin, Alexander
    Draelos, Rachel
    Rudin, Cynthia
    FIGURATIVE LANGUAGE PROCESSING, 2020, : 250 - 255
  • [36] Out-of-Vocabulary Word Detection and Beyond
    Kombrink, Stefan
    Hannemann, Mirko
    Burget, Lukas
    DETECTION AND IDENTIFICATION OF RARE AUDIOVISUAL CUES, 2012, 384 : 57 - 65
  • [37] Word Embedding Evaluation in Downstream Tasks and Semantic Analogies
    Santos, Joaquim
    Consoli, Bernardo
    Vieira, Renata
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4828 - 4834
  • [38] Explicit and Contextual Vocabulary Intervention: Effects on Word and Definition Learning
    Antia, Shirin D.
    Catalano, Jennifer A.
    Rivera, M. Christina
    Creamer, Catherine
    JOURNAL OF DEAF STUDIES AND DEAF EDUCATION, 2021, 26 (03) : 381 - 394
  • [39] More than Bags of Words: Sentiment Analysis with Word Embeddings
    Rudkowsky, Elena
    Haselmayer, Martin
    Wastian, Matthias
    Jenny, Marcelo
    Emrich, Stefan
    Sedlmair, Michael
    COMMUNICATION METHODS AND MEASURES, 2018, 12 (2-3) : 140 - 157
  • [40] Automated Template Generation based on Word Embeddings
    Manatuica, Maria
    Dascalu, Mihai
    Ruseti, Stefan
    Trausan-Matu, Stefan
    PROCEEDINGS OF THE 14TH INTERNATIONAL SCIENTIFIC CONFERENCE ELEARNING AND SOFTWARE FOR EDUCATION: ELEARNING CHALLENGES AND NEW HORIZONS, VOL 2, 2018, : 392 - 398