MSTD: Moroccan Sentiment Twitter Dataset

Cited by: 0
Authors
Mihi, Soukaina [1]
Ali, Brahim Ait Ben [1]
El Bazi, Ismail [2]
Arezki, Sara [1]
Laachfoubi, Nabil [1]
Affiliations
[1] Univ Hassan First Settat Morocco, Settat, Morocco
[2] Univ Moulay Slimane Beni Mellal, Beni Mellal, Morocco
Keywords
Sentiment analysis; Moroccan dialect; machine-learning; stemming; lemmatization; feature extraction
DOI
10.14569/IJACSA.2020.0111045
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
With the proliferation of social media and Internet accessibility, a massive amount of data has been produced. In most cases, the textual data available on the web comes mainly from people expressing their views in informal language. Arabic is one of the hardest Semitic languages to deal with because of its complex morphology. In this paper, a new contribution to Arabic resources is presented: a large Moroccan dataset retrieved from Twitter and carefully annotated by native speakers. To the best of our knowledge, this dataset is the largest Moroccan dataset for sentiment analysis. It is distinguished by its size, its quality, ensured by the commitment of the annotators, and its accessibility to the research community. Furthermore, the MSTD (Moroccan Sentiment Twitter Dataset) is benchmarked through experiments carried out for 4-way classification as well as polarity classification (positive, negative). Various machine-learning algorithms are combined with feature extraction techniques to reach optimal settings. This work also presents the effect of stemming and lemmatization on the obtained accuracies.
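As a rough illustration of what polarity classification over bag-of-words features can look like, here is a minimal sketch using only the Python standard library. The abstract does not say which classifiers or feature extractors the paper used, so the multinomial Naive Bayes model, the whitespace tokenizer, the function names, and the toy English training sentences below are all assumptions made for illustration; none of them is taken from MSTD or from the authors' code.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy placeholder data (hypothetical, NOT taken from MSTD)
TRAIN = [
    ("great service very happy", "positive"),
    ("love this happy day", "positive"),
    ("terrible service very sad", "negative"),
    ("hate this sad day", "negative"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "happy great day"))
print(predict(model, "sad terrible day"))
```

In a setup like the one the abstract describes, a stemmer or lemmatizer would presumably be applied inside `tokenize`, which is where its effect on accuracy could be measured.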
Pages: 363-372
Page count: 10