Predicting the Valence Rating of Russian Words Using Various Pre-trained Word Embeddings

Authors
Bochkarev, Vladimir V. [1]
Savinkov, Andrey V. [1]
Shevlyakova, Anna V. [1]
Affiliations
[1] Kazan Federal University, Kazan, Russia
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025 / Vol. 15300
Funding
Russian Science Foundation
Keywords
Pre-trained Word Embeddings; Sentiment Analysis; Word Valence; Neural Network Predictors
DOI
10.1007/978-3-031-78014-1_26
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, we compared 20 sets of pre-trained word vectors for computationally estimating the valence ratings of Russian words. Valence was estimated with neural network predictors: the vector representing a word was fed to the input of a multilayer feed-forward neural network, which computed the valence rating of that word. KartaSlovSent, currently the largest Russian dictionary with valence ratings, served as the source of ratings for training the models. The highest accuracy was obtained with a set of fastText vectors trained on the Common Crawl corpus, which includes 103 billion words; Spearman's correlation coefficient between the human ratings and the machine ratings was 0.859. The high estimation accuracy and the large size of the dictionary allow this set of vectors to be used to extrapolate human valence ratings to the widest range of words in the Russian language. Also worth mentioning are 4 sets of vectors presented on the RusVectores project page and trained on the texts of the Araneum Russicum Maximum and Taiga corpora. Despite a significantly smaller training corpus, these sets of vectors yield only slightly lower accuracy. The lowest results were obtained for sets of vectors trained on corpora of news texts.
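The pipeline described in the abstract (word vector in, valence rating out, agreement measured by Spearman's correlation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 3-dimensional toy vectors, the hand-picked weights in `TOY_LAYERS`, the tanh activation, the single hidden layer, and the invented "human" ratings. The paper's actual architecture, training procedure, and data are not reproduced.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def feed_forward(vector, layers):
    """Forward pass: tanh on hidden layers, linear output, scalar result.

    `layers` is a list of (weights, biases) pairs, one row of weights per unit.
    """
    activation = vector
    for index, (weights, biases) in enumerate(layers):
        summed = [
            sum(w * a for w, a in zip(row, activation)) + b
            for row, b in zip(weights, biases)
        ]
        is_last = index == len(layers) - 1
        activation = summed if is_last else [math.tanh(s) for s in summed]
    return activation[0]


# Hypothetical weights: one hidden layer with 2 units, then 1 output unit.
TOY_LAYERS = [
    ([[1.0, -1.0, 0.5], [0.5, 0.5, -1.0]], [0.0, 0.1]),
    ([[1.0, -0.5]], [0.0]),
]

# Hypothetical word vectors and invented ratings (not from KartaSlovSent).
toy_vectors = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.5, 0.5, 0.5],
    [0.1, 0.2, 0.9],
]
toy_human_ratings = [0.8, -0.6, 0.1, -0.2]

toy_machine_ratings = [feed_forward(v, TOY_LAYERS) for v in toy_vectors]
toy_rho = spearman(toy_human_ratings, toy_machine_ratings)
print(f"Spearman's rho on toy data: {toy_rho:.3f}")
```

In practice the weights would be learned from rated words rather than fixed by hand, and the correlation would be computed on words held out from training; the sketch only shows how the pieces fit together.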
Pages: 349-361
Number of pages: 13