Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliations
[1] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT
DOI
10.1109/SIU53274.2021.9477814
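Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract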
Sentiment analysis is one of the key topics in Natural Language Processing and supports applications ranging from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected in a semi-supervised manner from shopping or review websites, and are constructed by mapping users' text reviews to the scores those users gave. However, these datasets might contain errors due to the automatic mapping, and they generally lack the characteristic features of social media text such as emojis, slang, and typos. To address these problems, one of the first manually annotated Turkish sentiment analysis datasets from Twitter is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as MBERT, XLM-Roberta, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, a social media analysis demonstration with the best model is performed on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
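For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how macro-averaged recall is commonly computed: recall is calculated separately for each class and the per-class values are averaged with equal weight. The function name `macro_recall`, the label names, and the toy data are assumptions made for illustration; they are not taken from the BounTi dataset or the authors' scripts.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    true_pos = defaultdict(int)  # correctly predicted examples per class
    support = defaultdict(int)   # gold examples per class
    for gold, pred in zip(y_true, y_pred):
        support[gold] += 1
        if gold == pred:
            true_pos[gold] += 1
    recalls = [true_pos[label] / support[label] for label in support]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration only (hypothetical, not BounTi data).
y_true = ["pos", "pos", "neg", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "neu"]
score = macro_recall(y_true, y_pred)
print(round(score, 3))
```

Because every class contributes equally regardless of how many examples it has, this metric is often preferred over accuracy when the sentiment classes are imbalanced.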
Pages: 4