Contextual Generation of Word Embeddings for Out of Vocabulary Words in Downstream Tasks

Cited by: 5
Authors
Garneau, Nicolas [1 ]
Leboeuf, Jean-Samuel [1 ]
Lamontagne, Luc [1 ]
Institutions
[1] Univ Laval, Dept Informat & Genie Logiciel, Quebec City, PQ, Canada
Source
Keywords
Natural language processing; Sequence labeling; Out of vocabulary words; Contextual word embeddings;
DOI
10.1007/978-3-030-18305-9_60
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Over the past few years, the use of pre-trained word embeddings to solve natural language processing tasks has considerably improved performance across the board. However, even though these embeddings are trained on gigantic corpora, the vocabulary is fixed, and thus numerous out of vocabulary words appear in specific downstream tasks. Recent studies have proposed models able to generate embeddings for out of vocabulary words given their morphology and context. These models assume that sufficient textual data is in hand to train them. In contrast, we specifically tackle the case where such data is no longer available and we can rely only on pre-trained embeddings. As a solution, we introduce a model that predicts meaningful embeddings from the spelling of a word as well as from the context in which it appears for a downstream task, without the need for pre-training on a given corpus. We thoroughly test our model on a joint tagging task in three different languages. Results show that our model helps consistently on all languages, outperforms other ways of handling out of vocabulary words, and can be integrated into any neural model to predict embeddings for out of vocabulary words.
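The abstract describes a generator that predicts an embedding for an unseen word from two signals: its spelling (characters) and the pre-trained embeddings of the words around it. The sketch below illustrates one plausible way to combine these two signals; the module names, dimensions, encoders, and fusion strategy are assumptions for illustration and are not the authors' implementation.

```python
# Hypothetical sketch of an OOV embedding generator that fuses a
# character-level (spelling) encoder with a context encoder.
# Architecture choices below are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class OOVEmbeddingGenerator(nn.Module):
    def __init__(self, n_chars, char_dim=32, context_dim=100, emb_dim=100):
        super().__init__()
        # Encode the spelling of the unknown word character by character.
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_rnn = nn.LSTM(char_dim, emb_dim // 2,
                                batch_first=True, bidirectional=True)
        # Encode the pre-trained embeddings of the surrounding words.
        self.ctx_rnn = nn.LSTM(context_dim, emb_dim // 2,
                               batch_first=True, bidirectional=True)
        # Project the concatenated spelling and context representations
        # into the same space as the pre-trained embeddings.
        self.project = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, char_ids, context_vecs):
        # char_ids: (batch, word_len) character indices of the OOV word
        # context_vecs: (batch, window, context_dim) pre-trained embeddings
        #               of the words surrounding the OOV word
        _, (h_char, _) = self.char_rnn(self.char_emb(char_ids))
        _, (h_ctx, _) = self.ctx_rnn(context_vecs)
        # Concatenate the final forward/backward hidden states of each encoder.
        spelling = torch.cat([h_char[0], h_char[1]], dim=-1)
        context = torch.cat([h_ctx[0], h_ctx[1]], dim=-1)
        return self.project(torch.cat([spelling, context], dim=-1))

# Usage: the predicted vector stands in for the missing embedding row
# before the downstream tagger consumes the sentence.
gen = OOVEmbeddingGenerator(n_chars=100)
oov_vec = gen(torch.randint(1, 100, (1, 7)), torch.randn(1, 5, 100))
print(oov_vec.shape)  # torch.Size([1, 100])
```

Because such a generator takes only character indices and already-available pre-trained context vectors as input, it can in principle be trained jointly with the downstream task, which matches the paper's claim that no separate pre-training corpus is required.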
Pages: 563-569
Number of pages: 7
Related papers
50 records in total
  • [41] New Word Analogy Corpus for Exploring Embeddings of Czech Words
    Svoboda, Lukas
    Brychcin, Tomas
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT I, 2018, 9623 : 103 - 114
  • [42] Improving Word Embeddings for Low Frequency Words by Pseudo Contexts
    Li, Fang
    Wang, Xiaojie
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2017, 2017, 10565 : 37 - 47
  • [43] COPING WITH OUT-OF-VOCABULARY WORDS: OPEN VERSUS HUGE VOCABULARY ASR
    Gerosa, Matteo
    Federico, Marcello
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4313 - 4316
  • [44] Lexicon Stratification for Translating Out-of-Vocabulary Words
    Tsvetkov, Yulia
    Dyer, Chris
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 125 - 131
  • [45] Early Detection of Severe Flu Outbreaks using Contextual Word Embeddings
    Karsi, Redouane
    Zaim, Mounia
    El Alami, Jamila
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 212 - 219
  • [46] Contextual Word Embeddings Clustering Through Multiway Analysis: A Comparative Study
    Ait-Saada, Mira
    Nadif, Mohamed
    ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023, 2023, 13876 : 1 - 14
  • [47] Connected Words: Word Associations and Second Language Vocabulary Acquisition
    Martinez, Ron
    SYSTEM, 2011, 39 (01) : 121 - 122
  • [48] In defense of contextual vocabulary acquisition how to do things with words in context
    Rapaport, WJ
    MODELING AND USING CONTEXT, PROCEEDINGS, 2005, 3554 : 396 - 409
  • [50] Connected Words: Word Associations and Second Language Vocabulary Acquisition
    Miralpeix, Imma
    CANADIAN MODERN LANGUAGE REVIEW-REVUE CANADIENNE DES LANGUES VIVANTES, 2010, 66 (05): : 763 - 765