Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research

Cited by: 0
Authors
Tari, Henry [1]
Khan, M. Danial [1]
Rutten, Justus [1]
Othman, Darian [2]
Kaushal, Rishabh [1, 3, 4]
Bertaglia, Thales [5]
Iamnitchi, Adriana [1, 3]
Affiliations
[1] Maastricht Univ, Dept Adv Comp Sci, Maastricht, Netherlands
[2] Maastricht Univ, Dept Data Analyt & Digitalisat, Maastricht, Netherlands
[3] Maastricht Univ, Inst Data Sci, Maastricht, Netherlands
[4] Indira Gandhi Delhi Tech Univ Women, Dept Informat Technol, Delhi, India
[5] Univ Utrecht, Utrecht, Netherlands
Keywords
Social Media Research; LLMs; Synthetic Data
DOI
10.1145/3648188.3675153
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyber-bullying, and other significant topics. However, access to these datasets is often restricted due to costs and platform regulations. As such, acquiring datasets that span multiple platforms, which are crucial for a comprehensive understanding of the digital ecosystem, is particularly challenging. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms, aiming to match the quality of real datasets. We employ ChatGPT to generate synthetic data from a real dataset consisting of posts from three different social media platforms. We assess the lexical and semantic properties of the synthetic data and compare them with those of the real data. Our empirical findings suggest that using large language models to generate synthetic multi-platform social media data is promising. However, further enhancements are necessary to improve the fidelity of the outputs.
Pages: 337-343
Page count: 7