Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Citations: 0
Authors
Tulkens, Stephan [1]
Emmery, Chris [1]
Daelemans, Walter [1]
Affiliations
[1] Univ Antwerp, CLiPS, Antwerp, Belgium
Keywords
word embeddings; benchmarking; word2vec; SPPMI; language variation; dialect identification
DOI
Not available
Chinese Library Classification number
H [Language and Writing]
Discipline classification code
05
Abstract
Word embeddings have recently seen a sharp increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique insights into language use and variability. In this paper we demonstrate the performance of multiple types of embeddings, created with both count-based and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.
Pages: 4130-4136
Page count: 7
Related papers
50 in total
  • [21] Unsupervised Pathology Report Classification Through Document and Word Embeddings
    Cheng, Jerome
    LABORATORY INVESTIGATION, 2020, 100 (SUPPL 1): 1445-1446
  • [22] Automatic Malware Clustering using Word Embeddings and Unsupervised Learning
    Leonardo Duarte-Garcia, Hugo
    Cortez-Marquez, Alberto
    Sanchez-Perez, Gabriel
    Perez-Meana, Hector
    Toscano-Medina, Karina
    Hernandez-Suarez, Aldo
    2019 7TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS (IWBF), 2019
  • [23] Unsupervised Pathology Report Classification Through Document and Word Embeddings
    Cheng, Jerome
    MODERN PATHOLOGY, 2020, 33 (SUPPL 2): 1445-1446
  • [24] Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings
    Haj-Yahia, Zied
    Sieg, Adrien
    Deleris, Lea A.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 371-379
  • [25] Unsupervised Learning of Fundamental Emotional States via Word Embeddings
    Mazzoleni, Mirko
    Maroni, Gabriele
    Previdi, Fabio
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017: 31-36
  • [26] Combining Transformer Embeddings with Linguistic Features for Complex Word Identification
    Ortiz-Zambrano, Jenny A.
    Espin-Riofrio, Cesar
    Montejo-Raez, Arturo
    ELECTRONICS, 2023, 12 (01)
  • [27] Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
    Basta, Christine
    Costa-jussa, Marta R.
    Casas, Noe
    GENDER BIAS IN NATURAL LANGUAGE PROCESSING (GEBNLP 2019), 2019: 33-39
  • [28] Towards a Gold Standard for Evaluating Danish Word Embeddings
    Schneidermann, Nina Skovgaard
    Hvingelby, Rasmus
    Pedersen, Bolette Sandford
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 4754-4763
  • [29] Unsupervised framework for evaluating and explaining structural node embeddings of graphs
    Dehghan, Ashkan
    Siuta, Kinga
    Skorupka, Agata
    Betlen, Andrei
    Miller, David
    Kaminski, Bogumil
    Pralat, Pawel
    JOURNAL OF COMPLEX NETWORKS, 2024, 12 (02)
  • [30] MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding
    Rethmeier, Nils
    Plank, Barbara
    4TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2019), 2019: 49-54