Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Cited by: 0
Authors
Tulkens, Stephan [1]
Emmery, Chris [1]
Daelemans, Walter [1]
Affiliations
[1] Univ Antwerp, CLiPS, Antwerp, Belgium
Keywords
word embeddings; benchmarking; word2vec; SPPMI; language variation; dialect identification;
DOI
Not available
Chinese Library Classification number
H [Language, Writing];
Discipline classification code
05;
Abstract
Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique observations into language use and variability. In this paper we demonstrate the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation, and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.
Pages: 4130-4136
Number of pages: 7