Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models

Cited by: 11
Authors
Palakodety, Shriphani [1 ]
KhudaBukhsh, Ashiqur R. [2 ]
Carbonell, Jaime G. [2 ]
Affiliations
[1] Onai, San Jose, CA 95129 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
ELECTION; TWITTER
DOI
10.3233/FAIA200306
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Mining insights from large volumes of social media text with minimal supervision is a highly challenging Natural Language Processing (NLP) task. While the efficacy of Language Models (LMs) on several downstream tasks is well studied, their applicability to answering relational questions, tracking perception, or mining deeper insights is under-explored. A few recent lines of work have scratched the surface by studying the capability of pre-trained LMs (e.g., BERT) to answer relational questions through "fill-in-the-blank" cloze statements (e.g., [Dante was born in MASK]). BERT predicts the MASK-ed word with a list of words ranked by probability (in this case, BERT successfully predicts Florence with the highest probability). In this paper, we conduct a feasibility study of fine-tuned LMs with a different focus: tracking polls, tracking community perception, and mining deeper insights that are typically obtained through costly surveys. Our main focus is a substantial corpus of comments extracted from YouTube videos (6,182,868 comments on 130,067 videos by 1,518,077 users) posted within the 100 days prior to the 2019 Indian General Election. Using fill-in-the-blank cloze statements against BERT, a recent high-performance language modeling algorithm, we present a novel application of this family of tools that is able to (1) aggregate political sentiment, (2) reveal community perception, and (3) track evolving national priorities and issues of interest.
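To make the cloze-probing mechanism concrete, the following is a minimal Python sketch (not the authors' released code) using the Hugging Face transformers library; the checkpoint bert-base-uncased and the probe sentence are illustrative assumptions.

    # Minimal sketch of cloze-style probing with a masked language model.
    # Assumes the Hugging Face `transformers` package; "bert-base-uncased"
    # is an illustrative checkpoint, not the authors' fine-tuned model.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Candidate fill-ins come back ranked by probability; for this classic
    # probe, "florence" is expected to rank highest.
    for candidate in fill_mask("Dante was born in [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 4))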
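In the paper's setting, such queries are issued against a BERT model first fine-tuned on the election-period comment corpus, so that the ranked completions reflect the community's language rather than generic pre-training text. Below is a hedged sketch of that domain fine-tuning step, assuming Hugging Face transformers and datasets and a hypothetical one-comment-per-line file comments.txt.

    # Hedged sketch of masked-LM fine-tuning on a domain corpus; the file
    # name "comments.txt" and the output directory are hypothetical.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # Tokenize one comment per line; truncate long comments for batching.
    corpus = load_dataset("text", data_files="comments.txt")["train"]
    corpus = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    # The collator applies the standard 15% random token masking on the fly.
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="bert-mlm-finetuned"),
        train_dataset=corpus,
        data_collator=DataCollatorForLanguageModeling(
            tokenizer=tokenizer, mlm_probability=0.15),
    )
    trainer.train()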
Pages: 1890-1897
Number of pages: 8
Related papers
50 records in total
  • [31] Assessment of fine-tuned large language models for real-world chemistry and material science applications
    Van Herck, Joren
    Gil, Maria Victoria
    Jablonka, Kevin Maik
    Abrudan, Alex
    Anker, Andy S.
    Asgari, Mehrdad
    Blaiszik, Ben
    Buffo, Antonio
    Choudhury, Leander
    Corminboeuf, Clemence
    Daglar, Hilal
    Elahi, Amir Mohammad
    Foster, Ian T.
    Garcia, Susana
    Garvin, Matthew
    Godin, Guillaume
    Good, Lydia L.
    Gu, Jianan
    Xiao Hu, Noemie
    Jin, Xin
    Junkers, Tanja
    Keskin, Seda
    Knowles, Tuomas P. J.
    Laplaza, Ruben
    Lessona, Michele
    Majumdar, Sauradeep
    Mashhadimoslem, Hossein
    Mcintosh, Ruaraidh D.
    Moosavi, Seyed Mohamad
    Mourino, Beatriz
    Nerli, Francesca
    Pevida, Covadonga
    Poudineh, Neda
    Rajabi-Kochi, Mahyar
    Saar, Kadi L.
    Hooriabad Saboor, Fahimeh
    Sagharichiha, Morteza
    Schmidt, K. J.
    Shi, Jiale
    Simone, Elena
    Svatunek, Dennis
    Taddei, Marco
    Tetko, Igor
    Tolnai, Domonkos
    Vahdatifar, Sahar
    Whitmer, Jonathan
    Wieland, D. C. Florian
    Willumeit-Roemer, Regine
    Zuttel, Andreas
    Smit, Berend
    CHEMICAL SCIENCE, 2025, 16 (02) : 670 - 684
  • [32] Fine-Tuned BERT Model for Large Scale and Cognitive Classification of MOOCs
    Sebbaq, Hanane
    El Faddouli, Nour-eddine
    INTERNATIONAL REVIEW OF RESEARCH IN OPEN AND DISTRIBUTED LEARNING, 2022, 23 (02) : 170 - 190
  • [33] Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection
    Kotitsas, Sotiris
    Kounoudis, Panagiotis
    Koutli, Eleni
    Papageorgiou, Haris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2540 - 2554
  • [34] An assessment framework of higher-order thinking skills based on fine-tuned large language models
    Xiao, Xiong
    Li, Yue
    He, Xiuling
    Fang, Jing
    Yan, Zhonghua
    Xie, Chong
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 272
  • [35] How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models
    Malik, Muhammad Shahid Iqbal
    Imran, Tahir
    Mamdouh, Jamjoom Mona
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [36] Small Pre-trained Language Models Can be Fine-tuned as Large Models via Over-Parameterization
    Gao, Ze-Feng
    Zhou, Kun
    Liu, Peiyu
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 3819 - 3834
  • [38] Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues
    Li, Chuyuan
    Huber, Patrick
    Xiao, Wen
    Amblard, Maxime
    Braud, Chloe
    Carenini, Giuseppe
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 2562 - 2579
  • [39] How fine-tuned is a large Muon EDM from Flavor?
    Ruppell, Timo
    SUSY09: THE 17TH INTERNATIONAL CONFERENCE ON SUPERSYMMETRY AND THE UNIFICATION OF FUNDAMENTAL INTERACTIONS, 2009, 1200 : 900 - 903
  • [40] Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks
    Luo, Ling
    Ning, Jinzhong
    Zhao, Yingwen
    Wang, Zhijun
    Ding, Zeyuan
    Chen, Peng
    Fu, Weiru
    Han, Qinyu
    Xu, Guangtao
    Qiu, Yunzhi
    Pan, Dinghao
    Li, Jiru
    Li, Hao
    Feng, Wenduo
    Tu, Senbo
    Liu, Yuqi
    Yang, Zhihao
    Wang, Jian
    Sun, Yuanyuan
    Lin, Hongfei
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 1865 - 1874