Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Cited by: 0

Authors:

Tulkens, Stephan [1]

Emmery, Chris [1]

Daelemans, Walter [1]

Affiliations:

[1] Univ Antwerp, CLiPS, Antwerp, Belgium

Source:

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016

Keywords:

word embeddings; benchmarking; word2vec; SPPMI; language variation; dialect identification

DOI:

Not available

Chinese Library Classification:

H [Language and writing]

Discipline classification code:

05

Abstract:

Word embeddings have recently attracted growing interest as a result of strong performance gains on a variety of tasks. However, most of this research has also underlined the importance of benchmark datasets and the difficulty of constructing them for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique insights into language use and variability. In this paper we demonstrate the performance of multiple types of embeddings, created with both count-based and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove to be a useful unsupervised linguistic resource that can be used effectively in a downstream task.

Pages: 4130-4136

Page count: 7
