MSTD: Moroccan Sentiment Twitter Dataset

Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Univ Hassan First Settat Morocco, Settat, Morocco
[2] Univ Moulay Slimane Beni Mellal, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the proliferation of social media and Internet accessibility, a massive amount of data has been produced. In most cases, the textual data available on the web comes from people expressing their views in informal language. Arabic is one of the hardest Semitic languages to process because of its complex morphology. This paper presents a new contribution to Arabic resources: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, this dataset is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, its quality, which stems from the commitment of the annotators, and its accessibility to the research community. Furthermore, the MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments on 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to find the best-performing settings. This work also presents the effect of stemming and lemmatization on the obtained accuracies.
Pages: 363-372
Number of pages: 10