Topic Modeling Applied to Reddit Posts

Cited by: 0
Authors
Kedzierska, Maria [1]
Spytek, Mikolaj [1]
Kurek, Marcelina [1]
Sawicki, Jan [1]
Ganzha, Maria [1]
Paprzycki, Marcin [2]
Affiliations
[1] Warsaw Univ Technol, Fac Math & Informat Sci, Koszykowa 75, PL-00662 Warsaw, Mazowieckie, Poland
[2] Polish Acad Sci, Syst Res Inst, Newelska 6, PL-01447 Warsaw, Mazowieckie, Poland
Keywords
NLP; text data processing; topic modeling; topic model evaluation; Reddit
DOI
10.1007/978-3-031-58502-9_2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text data is widely used for both commercial and research purposes. Internet forums such as Reddit offer extensive sources of text data, but their volume is vast and, typically, only a small subset of posts is studied. To overcome the problem of data size, topic modeling can be applied to extract the main ideas from the documents. However, as will be shown, different modeling techniques may produce very different results. Specifically, this contribution provides an overview of the most popular topic models used in natural language processing and of methods for their comparison. Moreover, a software solution for downloading, modeling, exploring, and comparing topics contained in Reddit posts is introduced. The proposed application is experimentally validated by showing that the extracted topics reflect real-world events. Finally, the obtained results are compared to those originating from a different tool used for investigating topic popularity.
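The observation that different modeling techniques may produce very different results can be illustrated with a minimal sketch. It is not the comparison method used in the paper: it assumes each model is summarized by the top words of its topics and scores agreement with Jaccard overlap. The function names (`jaccard`, `best_match_overlap`) and the word lists are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match_overlap(model_a, model_b):
    """For each topic in model_a, return its highest Jaccard score
    against any topic in model_b."""
    return [max(jaccard(ta, tb) for tb in model_b) for ta in model_a]


# Hypothetical top words per topic from two different models
# (made-up data, not results from the paper).
model_a_topics = [
    ["game", "team", "season", "player"],
    ["vote", "election", "party", "poll"],
]
model_b_topics = [
    ["game", "player", "score", "season"],
    ["movie", "film", "actor", "scene"],
]

scores = best_match_overlap(model_a_topics, model_b_topics)
print(scores)
```

A score near 1 means a topic from the first model has a close counterpart in the second, while a score near 0 means the second model found nothing similar. Under these assumptions, low scores across many topics would be one simple signal that two techniques disagree on the same corpus.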
Pages: 17-44
Number of pages: 28