Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Times cited: 0
Authors
Tulkens, Stephan [1]
Emmery, Chris [1]
Daelemans, Walter [1]
Affiliation
[1] Univ Antwerp, CLiPS, Antwerp, Belgium
Keywords
word embeddings; benchmarking; word2vec; SPPMI; language variation; dialect identification
DOI
Not available
Chinese Library Classification number
H [Language, writing]
Subject classification code
05
Abstract
Word embeddings have recently attracted a sharp increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research has also underlined the importance of benchmark datasets, and the difficulty of constructing them for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique insights into language use and variability. In this paper we evaluate the performance of multiple types of embeddings, created with both count-based and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation benchmark for use in further research, and we demonstrate that the benchmarked embeddings are a useful unsupervised linguistic resource that can be used effectively in a downstream task.
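The abstract contrasts count-based and prediction-based embeddings, and the keywords name SPPMI as a count-based method. The following is a minimal sketch, assuming the standard definition of shifted positive PMI (PMI(w, c) minus log k, clipped at zero) and a plain symmetric context window; the function names, window size, and toy Dutch sentences are illustrative assumptions, not the authors' pipeline. Count-based pipelines often add a dimensionality-reduction step such as SVD, which is omitted here.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs inside a symmetric window around each token."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def sppmi(counts, k=1):
    """Return {(word, context): max(PMI - log k, 0)}, storing only positive cells."""
    total = sum(counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    shift = math.log(k)
    matrix = {}
    for (word, context), n in counts.items():
        # PMI(w, c) = log(P(w, c) / (P(w) * P(c))), estimated from raw counts.
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        value = pmi - shift
        if value > 0:  # clipped (zero) cells are simply not stored
            matrix[(word, context)] = value
    return matrix


def row(matrix, word):
    """Sparse vector (context -> weight) representing `word` (hypothetical helper)."""
    return {c: v for (w, c), v in matrix.items() if w == word}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[c] for c, weight in u.items() if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy corpus (illustrative only): "kat" and "hond" share the same contexts.
    corpus = [["de", "kat", "slaapt"], ["de", "hond", "slaapt"]]
    m = sppmi(cooccurrence_counts(corpus, window=1), k=1)
    print(cosine(row(m, "kat"), row(m, "hond")))
```

Because "kat" and "hond" occur in identical contexts in the toy corpus, their SPPMI rows are identical and the printed cosine similarity is (up to floating-point rounding) 1.0. Raising `k` shifts every PMI value down by log k, so weakly associated pairs drop out of the sparse matrix.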
Pages: 4130-4136
Page count: 7