Sentiment Analysis for Multilingual Corpora

Cited by: 0
Authors
Galeshchuk, Svitlana [1]
Qiu, Ju [2]
Jourdan, Julien [1]
Affiliations
[1] PSL Res Univ, Governance Analyt, Univ Paris Dauphine, Pl Marechal Lattre Tassigny, F-75016 Paris, France
[2] PSL Res Univ, Univ Paris Dauphine, Pl Marechal Lattre Tassigny, F-75016 Paris, France
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The paper presents a generic approach to supervised sentiment analysis of social media content in languages other than English. The method translates documents from the original language into English with Google's Neural Translation Model. The resulting texts are then converted to vectors by averaging the vector representations of their words, taken from a pretrained English Word2Vec model. Testing the approach with several machine learning methods on Polish, Slovenian and Croatian Twitter corpora yields a classification accuracy of up to 86% on out-of-sample data.
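The vectorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding table `TOY_VECTORS`, the helper name `document_vector`, the whitespace tokenization, and the choice to skip out-of-vocabulary words are all assumptions made for illustration. The translation step and the classifier are omitted.

```python
# Hypothetical 3-dimensional embeddings standing in for a pretrained
# English Word2Vec model (real models use far higher dimensions).
TOY_VECTORS = {
    "good": [1.0, 0.0, 0.5],
    "bad": [-1.0, 0.0, 0.5],
    "movie": [0.0, 1.0, 0.0],
}


def document_vector(text, vectors, dim):
    """Average the vectors of all in-vocabulary tokens of an English text.

    Assumptions: lowercase whitespace tokenization; unknown words are
    skipped; a text with no known words maps to the zero vector.
    """
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th components of every word vector together.
    return [sum(component) / len(known) for component in zip(*known)]


# An (already translated) tweet becomes one fixed-length feature vector.
features = document_vector("Good movie", TOY_VECTORS, dim=3)
```

Under these assumptions, every tweet maps to a vector of the same length regardless of how many words it contains, which is what allows standard classifiers to be trained on the translated corpora.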
Pages: 120-125
Page count: 6
Related papers
50 in total
  • [41] Five-Dimensional Sentiment Analysis of Corpora, Documents and Words
    Honkela, Timo
    Korhonen, Jaakko
    Lagus, Krista
    Saarinen, Esa
    ADVANCES IN SELF-ORGANIZING MAPS AND LEARNING VECTOR QUANTIZATION, 2014, 295 : 209 - 218
  • [42] MULTEXT: Multilingual text tools and corpora
    Armstrong, S
    LEXICON AND TEXT: REUSABLE METHODS AND RESOURCES FOR THE LINGUISTIC DEVELOPMENT OF GERMAN, 1996, 73 : 107 - 119
  • [43] Building and Modelling Multilingual Subjective Corpora
    Saad, Motaz
    Langlois, David
    Smaili, Kamel
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 3086 - 3091
  • [44] Comparability of corpora and search multilingual terminology
    Morin, Emmanuel
    Daille, Beatrice
    TRAITEMENT AUTOMATIQUE DES LANGUES, 2006, 47 (01): : 113 - 136
  • [45] Pseudo-Aligned Multilingual Corpora
    Diaz, Fernando
    Metzler, Donald
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2727 - 2732
  • [46] Enhanced Entity Annotations for Multilingual Corpora
    Strobl, Michael
    Trabelsi, Amine
    Zaiane, Osmar
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3732 - 3740
  • [47] Wikipedia as Multilingual Source of Comparable Corpora
    Gamallo Otero, Pablo
    Gonzalez Lopez, Isaac
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 21 - 25
  • [48] Text mining applied to multilingual corpora
    Neri, F
    Raffaelli, R
    Knowledge Mining, 2005, 185 : 123 - 131
  • [49] A machine learning approach to sentiment analysis in multilingual Web texts
    Boiy, Erik
    Moens, Marie-Francine
    INFORMATION RETRIEVAL, 2009, 12 (05): : 526 - 558
  • [50] Development of a Multilingual Model for Machine Sentiment Analysis in the Serbian Language
    Draskovic, Drazen
    Zecevic, Darinka
    Nikolic, Bosko
    MATHEMATICS, 2022, 10 (18)