Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliations
[1] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT
DOI
10.1109/SIU53274.2021.9477814
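Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract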
Sentiment analysis is a key topic in Natural Language Processing that supports applications ranging from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected through semi-supervision from shopping or review websites, and are constructed by mapping users' text reviews to the scores those users gave. However, such datasets might contain errors due to the automatic mapping, and they generally lack the characteristic features of social media text, such as emojis, slang, and typos. To address these problems, one of the first manually annotated Turkish sentiment analysis datasets from Twitter is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as mBERT, XLM-RoBERTa, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, a social media analysis demonstration with the best model is performed on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
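The abstract reports macro-averaged recall, which averages per-class recall without weighting by class size. The following is a minimal sketch of that metric, not the authors' evaluation script; the function name `macro_recall`, the three label names, and the toy gold/predicted labels are all assumptions made for illustration.

```python
from collections import Counter


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")
    # Number of gold examples per class (the recall denominator).
    support = Counter(y_true)
    # Number of correctly predicted examples per class (the recall numerator).
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = [hits[label] / support[label] for label in support]
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not taken from the BounTi dataset.
gold = ["positive", "positive", "neutral", "neutral", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]

score = macro_recall(gold, pred)
print(f"macro-averaged recall: {score:.3f}")
```

Because every class contributes equally to the mean, a model that neglects a small class is penalized more than it would be under plain accuracy, which is one reason this metric is often used for imbalanced sentiment data.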
Pages: 4