Distributed Sentiment Analysis for Geo-Tagged Twitter Data

Authors
Zengin, Muhammed Said [1]
Arslan, Rabia [1]
Akgun, Mehmet Burak [1]
Affiliation
[1] TOBB Ekon & Teknol Univ, Bilgisayar Muhendisligi Bolumu, Ankara, Turkey
Keywords
Big data; distributed data processing; sentiment analysis; BERT
DOI
10.1109/SIU55565.2022.9864702
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The ever-increasing frequency of sharing on social media makes these platforms one of the primary sources of data for computational social science studies. Examining and analyzing large-scale social media datasets is likewise crucial for governments and companies. However, as the amount of data grows, deriving insights from it with artificial intelligence based models becomes increasingly demanding in terms of processing power. Hardware requirements may increase dramatically if the insights are needed under real-time or near-real-time constraints. In this study, we developed a distributed sentiment analysis model that utilizes a large social media dataset. We collected 16 million tweets and grouped them by originating city. The sentiment analysis model was produced by fine-tuning the pre-trained BERT model. Apache Spark, a distributed big data analytics engine, is used to execute the trained model in a distributed fashion. For evaluation purposes, the prediction time on a single compute unit is compared with the distributed prediction time. The sentiment analysis model was executed separately for each of the data groups corresponding to the 81 provinces. The dataset of 16 million tweets used in this study, the resulting Turkish sentiment analysis model, the distributed prediction code developed for Apache Spark, and all results of the study can be accessed at https://distributed-sentiment-analysis.github.io/.
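The grouping and per-group prediction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: a hypothetical keyword rule stands in for the fine-tuned BERT model, and a standard-library thread pool stands in for Apache Spark, so only the structure of the workflow (group by city, score each group as an independent unit of work, compare the single-unit and parallel paths) carries over. All function names and the sample tweets are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the fine-tuned BERT classifier: a tiny keyword rule.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def predict_sentiment(text):
    """Label one tweet as positive, negative, or neutral (placeholder model)."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def group_by_city(tweets):
    """Group (city, text) pairs by originating city."""
    groups = defaultdict(list)
    for city, text in tweets:
        groups[city].append(text)
    return dict(groups)


def predict_group(item):
    """Score every tweet of one city group; this is the unit of work per worker."""
    city, texts = item
    return city, [predict_sentiment(t) for t in texts]


def predict_sequential(groups):
    """Single compute unit: process the city groups one after another."""
    return dict(predict_group(item) for item in groups.items())


def predict_parallel(groups, workers=4):
    """Parallel path: hand each city group to a worker (thread pool, not Spark)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(predict_group, groups.items()))


# Made-up sample data for illustration only.
tweets = [
    ("Ankara", "I love this city"),
    ("Ankara", "traffic is awful"),
    ("Izmir", "great weather today"),
    ("Istanbul", "just another day"),
]
groups = group_by_city(tweets)
sequential = predict_sequential(groups)
parallel = predict_parallel(groups)
```

Both paths return the same labels per city. Wrapping each call with `time.perf_counter()` would give the kind of single-unit versus parallel timing comparison the abstract mentions, although Python threads will not speed up a CPU-bound model; a real deployment would rely on separate executors, as Spark provides.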
Pages: 4