Contextual Generation of Word Embeddings for Out of Vocabulary Words in Downstream Tasks

Cited by: 5
Authors
Garneau, Nicolas [1]
Leboeuf, Jean-Samuel [1]
Lamontagne, Luc [1]
Affiliations
[1] Univ Laval, Dept Informat & Genie Logiciel, Quebec City, PQ, Canada
Keywords
Natural language processing; Sequence labeling; Out of vocabulary words; Contextual word embeddings
DOI
10.1007/978-3-030-18305-9_60
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past few years, the use of pre-trained word embeddings to solve natural language processing tasks has considerably improved performance across the board. However, even though these embeddings are trained on gigantic corpora, their vocabulary is fixed, so numerous out of vocabulary words appear in specific downstream tasks. Recent studies have proposed models that generate embeddings for out of vocabulary words from their morphology and context. These models assume that sufficient textual data is on hand to train them. In contrast, we specifically tackle the case where such data is no longer available and we rely only on pre-trained embeddings. As a solution, we introduce a model that predicts meaningful embeddings from the spelling of a word as well as from the context in which it appears for a downstream task, without requiring pre-training on a given corpus. We thoroughly test our model on a joint tagging task in three different languages. Results show that our model helps consistently across all languages, outperforms other ways of handling out of vocabulary words, and can be integrated into any neural model to predict embeddings for out of vocabulary words.
Pages: 563-569
Number of pages: 7
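
Illustrative sketch
The abstract describes predicting an embedding for an out of vocabulary word from its spelling and its context, using only pre-trained embeddings. The Python sketch below is a deliberately simple stand-in for that idea and is not the authors' neural model: it assumes a character-trigram overlap heuristic for the spelling signal and a plain average of known context vectors for the context signal. All function names, the `alpha` mixing weight, and the toy data in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def spelling_vector(word: str, embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average of in-vocabulary vectors, weighted by trigram Jaccard overlap.

    Assumption: words that share character trigrams are related. This is a
    heuristic stand-in for a learned character-level encoder.
    """
    target = char_ngrams(word)
    total = [0.0] * dim
    weight_sum = 0.0
    for known, vec in embeddings.items():
        grams = char_ngrams(known)
        overlap = len(target & grams) / len(target | grams)
        if overlap > 0:
            for i in range(dim):
                total[i] += overlap * vec[i]
            weight_sum += overlap
    if weight_sum == 0:
        return [0.0] * dim  # no in-vocabulary word shares a trigram
    return [x / weight_sum for x in total]


def context_vector(context: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Unweighted mean of the embeddings of the known context words."""
    known = [embeddings[w] for w in context if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def oov_embedding(word: str, context: Sequence[str],
                  embeddings: Dict[str, Vector], alpha: float = 0.5) -> Vector:
    """Mix spelling and context signals; alpha is a hypothetical weight."""
    dim = len(next(iter(embeddings.values())))
    s = spelling_vector(word, embeddings, dim)
    c = context_vector(context, embeddings, dim)
    return [alpha * a + (1 - alpha) * b for a, b in zip(s, c)]
```

For example, with a toy table such as `{"run": [1.0, 0.0], "running": [0.9, 0.1], "fast": [0.0, 1.0]}`, calling `oov_embedding("runing", ["she", "is", "fast"], table)` returns a two-dimensional vector that blends the vectors of the similarly spelled words with that of the known context word. A learned model would replace both heuristics with trained components; the fixed `alpha` here only shows where the two signals are combined.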