Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliations
[1] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT;
DOI
10.1109/SIU53274.2021.9477814
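Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification codes
0808; 0809;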
Abstract
Sentiment analysis is a key topic in Natural Language Processing that supports applications ranging from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected in a semi-supervised manner from shopping or review websites, by mapping each user's text review to the score that user gave. However, these datasets might contain errors due to the automatic mapping, and they generally lack the characteristic features of social media text such as emojis, slang, and typos. To address these problems, one of the first manually annotated Turkish sentiment analysis datasets from Twitter is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as MBERT, XLM-Roberta, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, a social media analysis demonstration with the best model is performed on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
Pages: 4
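The abstract reports its headline result as macro-averaged recall. As a rough illustration of what that metric measures, the sketch below computes recall separately for each gold class and then takes the unweighted mean, so that rare classes count as much as frequent ones. The function name `macro_recall`, the label strings, and the toy data are all hypothetical and chosen for illustration only; they are not taken from the BounTi release or the authors' scripts.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    support = defaultdict(int)  # number of gold examples per class
    hits = defaultdict(int)     # number of correctly predicted examples per class
    for gold, pred in zip(y_true, y_pred):
        support[gold] += 1
        if gold == pred:
            hits[gold] += 1

    # Recall for class c is hits[c] / support[c]; average with equal class weight.
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical toy labels (assumed three-way sentiment scheme).
y_true = ["positive", "positive", "negative", "negative", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "neutral"]
score = macro_recall(y_true, y_pred)
print(f"macro-averaged recall: {score:.3f}")
```

Because each class contributes equally to the average, a model that ignores a small class is penalised more heavily than it would be under plain accuracy, which is one common reason to prefer this metric on imbalanced sentiment data.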