Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research

Cited by: 0
Authors
Tari, Henry [1]
Khan, M. Danial [1]
Rutten, Justus [1]
Othman, Darian [2]
Kaushal, Rishabh [1, 3, 4]
Bertaglia, Thales [5]
Iamnitchi, Adriana [1, 3]
Affiliations
[1] Maastricht Univ, Dept Adv Comp Sci, Maastricht, Netherlands
[2] Maastricht Univ, Dept Data Analyt & Digitalisat, Maastricht, Netherlands
[3] Maastricht Univ, Inst Data Sci, Maastricht, Netherlands
[4] Indira Gandhi Delhi Tech Univ Women, Dept Informat Technol, Delhi, India
[5] Univ Utrecht, Utrecht, Netherlands
Keywords
Social Media Research; LLMs; Synthetic Data
DOI
10.1145/3648188.3675153
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyber-bullying, and other significant topics. However, access to these datasets is often restricted by costs and platform regulations. Acquiring datasets that span multiple platforms, which are crucial for a comprehensive understanding of the digital ecosystem, is therefore particularly challenging. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms, aiming to match the quality of real datasets. We employ ChatGPT to generate synthetic data from a real dataset consisting of posts from three different social media platforms. We assess the lexical and semantic properties of the synthetic data and compare them with those of the real data. Our empirical findings suggest that using large language models to generate synthetic multi-platform social media data is promising. However, further enhancements are necessary to improve the fidelity of the outputs.
Pages: 337-343
Page count: 7