Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliation
[1] Bogazici University, Department of Computer Engineering (Bilgisayar Mühendisliği Bölümü), Istanbul, Turkey
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT
DOI
10.1109/SIU53274.2021.9477814
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject Classification Codes
0808; 0809
Abstract
Sentiment analysis is one of the key topics in Natural Language Processing, supporting applications that range from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected in a semi-supervised manner from shopping or review websites, by mapping users' text reviews to the scores those same users assigned. However, this automatic mapping can introduce labeling errors, and such datasets generally lack the characteristic features of social media text, such as emojis, slang, and typos. To address these problems, BounTi, one of the first manually annotated Turkish sentiment analysis datasets built from Twitter, is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as mBERT, XLM-RoBERTa, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, the best model is used in a social media analysis demonstration on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
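Macro-averaged recall, the metric reported above, computes recall separately for each sentiment class and then takes the unweighted mean, so a rare class counts as much as a frequent one. The sketch below is a minimal illustration of that definition only, not the authors' evaluation script. The function name `macro_recall`, the label strings `pos`/`neg`/`neu`, and the toy predictions are hypothetical; the average is taken over the classes that appear in the gold labels.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    true_pos = defaultdict(int)  # correct predictions per gold class
    support = defaultdict(int)   # number of gold examples per class
    for gold, pred in zip(y_true, y_pred):
        support[gold] += 1
        if gold == pred:
            true_pos[gold] += 1
    # Recall for class c = correctly predicted c / all gold c
    recalls = [true_pos[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels for illustration only (not BounTi data)
y_true = ["pos", "pos", "neg", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "neu"]
print(round(macro_recall(y_true, y_pred), 3))  # prints 0.722
```

Here the per-class recalls are 1/2 (pos), 2/3 (neg), and 1/1 (neu), whose mean is 13/18 ≈ 0.722. Plain accuracy on the same toy data would be 4/6 ≈ 0.667, which shows how the two metrics can diverge when classes are imbalanced.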
Pages: 4