Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models

Cited by: 11
Authors
Palakodety, Shriphani [1]
KhudaBukhsh, Ashiqur R. [2]
Carbonell, Jaime G. [2]
Affiliations
[1] Onai, San Jose, CA 95129 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
ELECTION; TWITTER;
DOI
10.3233/FAIA200306
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Mining insights from large volumes of social media text with minimal supervision is a highly challenging Natural Language Processing (NLP) task. While the efficacy of Language Models (LMs) in several downstream tasks is well studied, their applicability to answering relational questions, tracking perception, or mining deeper insights is under-explored. A few recent lines of work have scratched the surface by studying the capability of pre-trained LMs (e.g., BERT) to answer relational questions through "fill-in-the-blank" cloze statements (e.g., [Dante was born in MASK]). BERT fills in the MASK-ed word with a list of candidate words ranked by probability (in this case, BERT successfully predicts Florence with the highest probability). In this paper, we conduct a feasibility study of fine-tuned LMs with a different focus: tracking polls, tracking community perception, and mining deeper insights typically obtained through costly surveys. Our main focus is on a substantial corpus of video comments extracted from YouTube videos (6,182,868 comments on 130,067 videos by 1,518,077 users) posted within 100 days prior to the 2019 Indian General Election. Using fill-in-the-blank cloze statements against a recent high-performance language modeling algorithm, BERT, we present a novel application of this family of tools that is able to (1) aggregate political sentiment, (2) reveal community perception, and (3) track evolving national priorities and issues of interest.
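The cloze probing described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The helper names `rank_by_probability` and `bert_fill_mask` are hypothetical, the toy scores are made-up numbers rather than model output, and the choice of the Hugging Face `fill-mask` pipeline with the `bert-base-uncased` checkpoint is an assumption (the paper fine-tunes BERT on its own corpus, which this sketch does not do).

```python
import math
from typing import Dict, List, Tuple


def rank_by_probability(scores: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Softmax-normalize raw candidate scores and return the top_k, highest first."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((word, e / total) for word, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


def bert_fill_mask(statement: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Query a pre-trained BERT with a cloze statement containing [MASK].

    Assumption: uses the Hugging Face fill-mask pipeline with the
    bert-base-uncased checkpoint, not the fine-tuned model from the paper.
    Imported lazily because it needs `transformers` plus a backend such as
    `torch`, and downloads model weights on first use.
    """
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    return [(r["token_str"], r["score"]) for r in unmasker(statement, top_k=top_k)]


if __name__ == "__main__":
    # Made-up scores for three candidate fill-ins, for illustration only.
    toy_scores = {"florence": 4.0, "rome": 2.5, "venice": 1.0}
    for word, prob in rank_by_probability(toy_scores):
        print(f"{word}\t{prob:.3f}")
    # With the dependencies installed, a real query would look like:
    # bert_fill_mask("Dante was born in [MASK].")
```

In this sketch, `rank_by_probability` only mirrors the "list of words ranked by probability" idea on a handful of candidates, whereas the real pipeline normalizes over the model's whole vocabulary.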
Pages: 1890-1897
Page count: 8