How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Cited: 0
Authors
Malik M.S.I. [1 ,2 ]
Imran T. [2 ]
Mamdouh J.M. [3 ]
Affiliations
[1] Department of Computer Science, School of Data Analysis and Artificial Intelligence, Higher School of Economics, Moscow
[2] Department of Computer Science, Capital University of Science and Technology, Islamabad
[3] Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh
Keywords
BERT; binary model; linguistic; LSA; news articles; propaganda; semantic; word2vec
DOI
10.7717/PEERJ-CS.1248
Abstract
Online propaganda is a mechanism for influencing the opinions of social media users and a growing menace to public health, democratic institutions, and civil society. The present study proposes a propaganda detection framework as a binary classification model built on a news repository. Several feature models are explored to develop a robust model: part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram features. In addition, BERT is fine-tuned. Three oversampling methods are investigated to handle the class imbalance of the Qprop dataset; SMOTE with Edited Nearest Neighbors (SMOTE-ENN) gives the best results. Fine-tuning experiments show that BERT with a sequence length of 320 is the best BERT configuration. As a standalone feature set, char tri-grams outperform all other features. The combinations char tri-gram + BERT and char tri-gram + word2vec perform robustly and outperform two state-of-the-art baselines. In contrast to prior approaches, adding feature selection further improves performance, achieving more than 97.60% recall, F1-score, and AUC on the dev and test splits of the dataset. The findings of the present study can be used to organize news articles for various public news websites. © Copyright 2023 Malik et al.
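The abstract reports that char tri-grams were the strongest standalone feature model. As a minimal illustration only (not the authors' code; function names are hypothetical), the sketch below extracts overlapping character tri-grams from a text and builds a bag-of-trigrams frequency vector, the raw representation such a feature model is typically built from:

```python
from collections import Counter

def char_trigrams(text: str) -> list[str]:
    """Slide a 3-character window over the lowercased text."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

def trigram_counts(text: str) -> Counter:
    """Bag-of-trigrams representation: tri-gram -> frequency."""
    return Counter(char_trigrams(text))

# Overlapping windows; spaces count as characters.
print(char_trigrams("news"))  # ['new', 'ews']
```

In practice these counts would be TF-IDF weighted and fed, together with the other feature models, into the binary classifier described in the abstract.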
Related papers
50 records
  • [31] Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models
    Manela, Daniel de Vassimon
    Errington, David
    Fisher, Thomas
    van Breugel, Boris
    Minervini, Pasquale
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2232 - 2242
  • [32] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
    CLINICAL NEURORADIOLOGY, 2024,
  • [33] Comparing Fine-Tuned Transformers and Large Language Models for Sales Call Classification: A Case Study
    Eisenstadt, Roy
    Asi, Abedelkader
    Ronen, Royi
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5240 - 5241
  • [34] Fine-tuned large language models can generate expert-level echocardiography reports
    Sowa, Achille
    Avram, Robert
    EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, 6 (01): 5 - 6
  • [35] RankMean: Module-Level Importance Score for Merging Fine-tuned Large Language Models
    Perin, Gabriel J.
    Chen, Xuxi
    Liu, Shusen
    Kailkhura, Bhavya
    Wang, Zhangyang
    Gallagher, Brian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1776 - 1782
  • [36] NDLP Phishing: A Fine-Tuned Application to Detect Phishing Attacks Based on Natural Language Processing and Deep Learning
    Benavides-Astudillo E.
    Fuertes W.
    Sanchez-Gordon S.
    Nuñez-Agurto D.
    International Journal of Interactive Mobile Technologies, 2024, 18 (10): : 173 - 190
  • [37] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
    Tian, Katherine
    Mitchell, Eric
    Zhou, Allan
    Sharma, Archit
    Rafailov, Rafael
    Yao, Huaxiu
    Finn, Chelsea
    Manning, Christopher D.
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5433 - 5442
  • [38] Assessment of fine-tuned large language models for real-world chemistry and material science applications
    Van Herck, Joren
    Gil, Maria Victoria
    Jablonka, Kevin Maik
    Abrudan, Alex
    Anker, Andy S.
    Asgari, Mehrdad
    Blaiszik, Ben
    Buffo, Antonio
    Choudhury, Leander
    Corminboeuf, Clemence
    Daglar, Hilal
    Elahi, Amir Mohammad
    Foster, Ian T.
    Garcia, Susana
    Garvin, Matthew
    Godin, Guillaume
    Good, Lydia L.
    Gu, Jianan
    Xiao Hu, Noemie
    Jin, Xin
    Junkers, Tanja
    Keskin, Seda
    Knowles, Tuomas P. J.
    Laplaza, Ruben
    Lessona, Michele
    Majumdar, Sauradeep
    Mashhadimoslem, Hossein
    Mcintosh, Ruaraidh D.
    Moosavi, Seyed Mohamad
    Mourino, Beatriz
    Nerli, Francesca
    Pevida, Covadonga
    Poudineh, Neda
    Rajabi-Kochi, Mahyar
    Saar, Kadi L.
    Hooriabad Saboor, Fahimeh
    Sagharichiha, Morteza
    Schmidt, K. J.
    Shi, Jiale
    Simone, Elena
    Svatunek, Dennis
    Taddei, Marco
    Tetko, Igor
    Tolnai, Domonkos
    Vahdatifar, Sahar
    Whitmer, Jonathan
    Wieland, D. C. Florian
    Willumeit-Roemer, Regine
    Zuttel, Andreas
    Smit, Berend
    CHEMICAL SCIENCE, 2025, 16 (02) : 670 - 684
  • [39] Online aggression detection using ensemble techniques on fine-tuned transformer-based language models
    Chinivar S.
    Roopa M.S.
    Arunalatha J.S.
    Venugopal K.R.
    International Journal of Computers and Applications, 2024, 46 (08) : 567 - 579
  • [40] Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection
    Kotitsas, Sotiris
    Kounoudis, Panagiotis
    Koutli, Eleni
    Papageorgiou, Haris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2540 - 2554