Distributed Sentiment Analysis for Geo-Tagged Twitter Data

Authors
Zengin, Muhammed Said [1]
Arslan, Rabia [1]
Akgun, Mehmet Burak [1]
Affiliations
[1] TOBB Ekon & Teknol Univ, Bilgisayar Muhendisligi Bolumu, Ankara, Turkey
Keywords
Big data; distributed data processing; sentiment analysis; BERT
DOI
10.1109/SIU55565.2022.9864702
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The ever-increasing frequency of sharing on social media makes these platforms one of the primary sources of data for computational social science studies. Similarly, examining and analyzing large-scale social media data sets is crucial for governments as well as companies. However, as the amount of data increases, deriving insights from it with artificial-intelligence-based models becomes more and more demanding in terms of processing power. In fact, hardware requirements might increase dramatically if the insights are needed under real-time or near-real-time constraints. In this study, we developed a distributed sentiment analysis model that utilizes a large social media data set. Sixteen million tweets were collected and grouped by originating city. The sentiment analysis model was produced by fine-tuning the pre-trained BERT model. Apache Spark, a distributed big data analytics engine, is used to execute the trained model in a distributed fashion. For evaluation purposes, the prediction time on a single compute unit is compared with the distributed prediction time. The sentiment analysis model was executed separately for each of the data groups corresponding to the 81 provinces. The data set containing 16 million tweets used in this study, the Turkish sentiment analysis model produced, the distributed prediction code developed for Apache Spark, and all the results of the study can be accessed at https://distributed-sentiment-analysis.github.io/.
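The workflow described in the abstract (group tweets by city, then score each group independently, either on one compute unit or in parallel) can be illustrated with a minimal sketch. This is not the authors' code: a thread pool from Python's standard library stands in for Apache Spark, a toy keyword rule stands in for the fine-tuned BERT classifier, and the sample tweets, the three sentiment labels, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample records (city, tweet text); not taken from the study's data set.
TWEETS = [
    ("Ankara", "harika bir film"),
    ("Ankara", "berbat trafik"),
    ("Istanbul", "harika manzara"),
    ("Izmir", "berbat hava"),
    ("Izmir", "harika deniz"),
]

# Toy lexicon (assumption): one positive and one negative Turkish word.
POSITIVE = {"harika"}
NEGATIVE = {"berbat"}


def predict(text):
    """Toy stand-in for the fine-tuned BERT sentiment classifier."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def group_by_city(records):
    """Collect tweet texts into one list per originating city."""
    groups = defaultdict(list)
    for city, text in records:
        groups[city].append(text)
    return dict(groups)


def score_group(item):
    """Score every tweet of one city and count the predicted labels."""
    city, texts = item
    return city, Counter(predict(text) for text in texts)


def run_sequential(groups):
    """Baseline: score all city groups one after another on a single worker."""
    return dict(score_group(item) for item in groups.items())


def run_parallel(groups, workers=4):
    """Score city groups concurrently; the pool stands in for a Spark cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score_group, groups.items()))


if __name__ == "__main__":
    groups = group_by_city(TWEETS)
    for city, counts in run_parallel(groups).items():
        print(city, dict(counts))
```

Because each city group is scored independently, the sequential and parallel runs produce the same per-city counts; only the wall-clock time differs, which is the quantity the study compares. With a real model, threads would not provide the speed-up of a cluster, so the pool here only shows the shape of the computation.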
Pages: 4