How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Cited: 0
Authors
Malik M.S.I. [1 ,2 ]
Imran T. [2 ]
Mamdouh J.M. [3 ]
Affiliations
[1] Department of Computer Science, School of Data Analysis and Artificial Intelligence, Higher School of Economics, Moscow
[2] Department of Computer Science, Capital University of Science and Technology, Islamabad
[3] Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh
Keywords
BERT; binary model; linguistic; LSA; news articles; propaganda; semantic; word2vec
DOI
10.7717/PEERJ-CS.1248
Abstract
Online propaganda is a mechanism for influencing the opinions of social media users and a growing menace to public health, democratic institutions, and civil society. The present study proposes a propaganda detection framework as a binary classification model built on a news repository. Several feature models are explored to develop a robust model: part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram features. In addition, BERT is fine-tuned. Three oversampling methods are investigated to handle the class imbalance of the Qprop dataset; SMOTE with Edited Nearest Neighbors (SMOTE-ENN) gives the best results. Fine-tuning experiments show that BERT with a sequence length of 320 is the best BERT configuration. As a standalone feature set, char tri-grams outperform all other features. The combinations char tri-gram + BERT and char tri-gram + word2vec perform robustly and outperform two state-of-the-art baselines. In contrast to prior approaches, adding feature selection further improves performance, achieving more than 97.60% recall, F1-score, and AUC on the dev and test splits of the dataset. The findings of the present study can be used to organize news articles for various public news websites. © Copyright 2023 Malik et al.
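The abstract reports that char tri-grams were the strongest standalone feature model. As a minimal illustration only (not the authors' code; function names are hypothetical), the sketch below extracts overlapping character tri-grams from a text and builds a bag-of-trigrams frequency vector, the raw representation such a feature model is typically built from:

```python
from collections import Counter

def char_trigrams(text: str) -> list[str]:
    """Slide a 3-character window over the lowercased text."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

def trigram_counts(text: str) -> Counter:
    """Bag-of-trigrams representation: tri-gram -> frequency."""
    return Counter(char_trigrams(text))

# Overlapping windows; spaces count as characters.
print(char_trigrams("news"))  # ['new', 'ews']
```

In practice these counts would be TF-IDF weighted and fed, together with the other feature models, into the binary classifier described in the abstract.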
Related papers
50 records
  • [31] Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models
    Manela, Daniel de Vassimon
    Errington, David
    Fisher, Thomas
    van Breugel, Boris
    Minervini, Pasquale
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2232 - 2242
  • [32] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
    CLINICAL NEURORADIOLOGY, 2024,
  • [33] Comparing Fine-Tuned Transformers and Large Language Models for Sales Call Classification: A Case Study
    Eisenstadt, Roy
    Asi, Abedelkader
    Ronen, Royi
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5240 - 5241
  • [34] Fine-tuned large language models can generate expert-level echocardiography reports
    Sowa, Achille
    Avram, Robert
    EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, 6 (01): 5 - 6
  • [35] RankMean: Module-Level Importance Score for Merging Fine-tuned Large Language Models
    Perin, Gabriel J.
    Chen, Xuxi
    Liu, Shusen
    Kailkhura, Bhavya
    Wang, Zhangyang
    Gallagher, Brian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1776 - 1782
  • [36] NDLP Phishing: A Fine-Tuned Application to Detect Phishing Attacks Based on Natural Language Processing and Deep Learning
    Benavides-Astudillo E.
    Fuertes W.
    Sanchez-Gordon S.
    Nuñez-Agurto D.
    International Journal of Interactive Mobile Technologies, 2024, 18 (10): : 173 - 190
  • [37] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
    Tian, Katherine
    Mitchell, Eric
    Zhou, Allan
    Sharma, Archit
    Rafailov, Rafael
    Yao, Huaxiu
    Finn, Chelsea
    Manning, Christopher D.
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5433 - 5442
  • [38] Assessment of fine-tuned large language models for real-world chemistry and material science applications
    Van Herck, Joren
    Gil, Maria Victoria
    Jablonka, Kevin Maik
    Abrudan, Alex
    Anker, Andy S.
    Asgari, Mehrdad
    Blaiszik, Ben
    Buffo, Antonio
    Choudhury, Leander
    Corminboeuf, Clemence
    Daglar, Hilal
    Elahi, Amir Mohammad
    Foster, Ian T.
    Garcia, Susana
    Garvin, Matthew
    Godin, Guillaume
    Good, Lydia L.
    Gu, Jianan
    Xiao Hu, Noemie
    Jin, Xin
    Junkers, Tanja
    Keskin, Seda
    Knowles, Tuomas P. J.
    Laplaza, Ruben
    Lessona, Michele
    Majumdar, Sauradeep
    Mashhadimoslem, Hossein
    Mcintosh, Ruaraidh D.
    Moosavi, Seyed Mohamad
    Mourino, Beatriz
    Nerli, Francesca
    Pevida, Covadonga
    Poudineh, Neda
    Rajabi-Kochi, Mahyar
    Saar, Kadi L.
    Hooriabad Saboor, Fahimeh
    Sagharichiha, Morteza
    Schmidt, K. J.
    Shi, Jiale
    Simone, Elena
    Svatunek, Dennis
    Taddei, Marco
    Tetko, Igor
    Tolnai, Domonkos
    Vahdatifar, Sahar
    Whitmer, Jonathan
    Wieland, D. C. Florian
    Willumeit-Roemer, Regine
    Zuttel, Andreas
    Smit, Berend
    CHEMICAL SCIENCE, 2025, 16 (02) : 670 - 684
  • [39] Online aggression detection using ensemble techniques on fine-tuned transformer-based language models
    Chinivar S.
    Roopa M.S.
    Arunalatha J.S.
    Venugopal K.R.
    International Journal of Computers and Applications, 2024, 46 (08) : 567 - 579
  • [40] Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection
    Kotitsas, Sotiris
    Kounoudis, Panagiotis
    Koutli, Eleni
    Papageorgiou, Haris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2540 - 2554