Topic Modeling Applied to Reddit Posts

Cited by: 0
Authors
Kedzierska, Maria [1]
Spytek, Mikolaj [1]
Kurek, Marcelina [1]
Sawicki, Jan [1]
Ganzha, Maria [1]
Paprzycki, Marcin [2]
Affiliations
[1] Warsaw Univ Technol, Fac Math & Informat Sci, Koszykowa 75, PL-00662 Warsaw, Mazowieckie, Poland
[2] Polish Acad Sci, Syst Res Inst, Newelska 6, PL-01447 Warsaw, Mazowieckie, Poland
Keywords
NLP; text data processing; topic modeling; topic model evaluation; Reddit
DOI
10.1007/978-3-031-58502-9_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text data is widely used for both commercial and research purposes. Internet forums such as Reddit offer extensive sources of text data, but their volume is vast and, typically, only a small subset of posts is studied. To overcome the problem of data size, topic modeling can be applied to extract the main ideas from the documents. However, as will be shown, different modeling techniques may produce very different results. This contribution provides an overview of the most popular topic models used in natural language processing and of methods for comparing them. Moreover, a software solution for downloading, modeling, exploring, and comparing topics contained in Reddit posts is introduced. The proposed application is validated experimentally by showing that the extracted topics reflect real-world events. Finally, the obtained results are compared to those originating from a different tool used for investigating topic popularity.
Pages: 17-44
Page count: 28
Related papers
50 in total
  • [1] Applied Behavior Analysis as Treatment for Autism Spectrum Disorders: Topic Modeling and Linguistic Analysis of Reddit Posts
    Bellon-Harn, Monica L.
    Boyd, Ryan L.
    Manchaiah, Vinaya
    FRONTIERS IN REHABILITATION SCIENCES, 2022, 2
  • [2] Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts
    Bonifazi, Gianluca
    Corradini, Enrico
    Ursino, Domenico
    Virgili, Luca
    BIG DATA AND COGNITIVE COMPUTING, 2023, 7 (01)
  • [3] Understanding Weight Loss via Online Discussions: Content Analysis of Reddit Posts Using Topic Modeling and Word Clustering Techniques
    Liu, Yang
    Yin, Zhijun
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (06)
  • [4] Geographical aggregation of microblog posts for LDA topic modeling
    Lopez-Ramirez, Pablo
    Molina-Villegas, Alejandro
    Siordia, Oscar S.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (05) : 4901 - 4908
  • [5] Investigating the phenomenon of NSFW posts in Reddit
    Corradini, Enrico
    Nocera, Antonino
    Ursino, Domenico
    Virgili, Luca
    INFORMATION SCIENCES, 2021, 566 : 140 - 164
  • [6] Determining PolyCystic Ovarian Syndrome Severity from Reddit Posts using Topic Modelling and Association Rule Mining
    Selvaraj, Santhi
    Sundaradhas, Selva Nidhyananthan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2024, 21 (03) : 439 - 457
  • [7] Understanding Anonymous Social Media Posts using Topic Modeling
    Valencia, John Daniel M.
    Laure, Al Joseph T.
    Centino, Nino Mark R.
    Fabito, Bernie S.
    Imperial, Joseph Marvin R.
    Rodriguez, Ramon L.
    De la Cruz, Angelica H.
    Octaviano, Manolito, V
    Jamis, Marilou N.
    2019 IEEE 11TH INTERNATIONAL CONFERENCE ON HUMANOID, NANOTECHNOLOGY, INFORMATION TECHNOLOGY, COMMUNICATION AND CONTROL, ENVIRONMENT, AND MANAGEMENT (HNICEM), 2019
  • [8] Topic modeling with latent Dirichlet allocation for cancer disease posts
    Altintas, Volkan
    Albayrak, Mehmet
    Topal, Kamil
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2021, 36 (04) : 2183 - 2196
  • [9] Subevents detection through topic modeling in social media posts
    Nolasco, Diogo
    Oliveira, Jonice
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 93 : 290 - 303
  • [10] Toxic Relationships Described by People With Breast Cancer on Reddit: Topic Modeling Study
    Davidson, Cara Anne
    Booth, Richard
    Jackson, Kimberley Teresa
    Mantler, Tara
    JMIR CANCER, 2024, 10