MSTD: Moroccan Sentiment Twitter Dataset

Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Univ Hassan First Settat Morocco, Settat, Morocco
[2] Univ Moulay Slimane Beni Mellal, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the proliferation of social media and growing Internet accessibility, a massive amount of data is being produced. Most of the textual data available on the web comes from people expressing their views in informal language. Arabic is one of the hardest Semitic languages to process because of its complex morphology. This paper contributes a new Arabic resource: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, this dataset is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, its quality, ensured by the commitment of the annotators, and its accessibility to the research community. Furthermore, the MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments on 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to find optimal settings. This work also presents the effect of stemming and lemmatization on the accuracies obtained.
Pages: 363-372
Number of pages: 10
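
Illustrative sketch
The benchmark setup summarized in the abstract (feature extraction feeding a machine-learning classifier, with optional stemming) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words features, the multinomial Naive Bayes classifier, the `toy_stem` suffix stripper, and the English training sentences are all assumptions chosen for illustration, and none of the data comes from MSTD.

```python
import math
from collections import Counter, defaultdict


def toy_stem(token):
    """Hypothetical stemmer: strips a few English suffixes.

    It only stands in for the Arabic stemmers and lemmatizers that the
    abstract refers to; it is not one of them.
    """
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_features(text, stem=False):
    """Bag-of-words feature extraction: lowercase, split, count tokens."""
    tokens = text.lower().split()
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, stem=False):
        self.stem = stem
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = extract_features(text, self.stem)
            self.class_counts[label] += 1
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text, self.stem)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, count in feats.items():
                smoothed = (self.word_counts[label][word] + 1) / denom
                score += count * math.log(smoothed)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data for polarity classification (not taken from MSTD).
TRAIN_TEXTS = [
    "loved this great match",
    "amazing goal today",
    "hated the boring game",
    "terrible refereeing again",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

model = NaiveBayes(stem=True).fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("loving the great goal"))
print(model.predict("boring game"))
```

With stemming enabled, "loving" and "loved" reduce to the same token, so the unseen word still contributes evidence for its class. That is the kind of effect on accuracy the abstract says the paper measures for stemming and lemmatization.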