Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model

Cited by: 3
Authors
Kumar, Sanjay [1 ]
Affiliations
[1] Delhi Technol Univ, Dept Comp Sci & Engn, Main Bawana Rd, Delhi 110042, India
Keywords
Bidirectional encoder representations from Transformers (BERT); convolutional neural network (CNN); deep transfer learning; low-resource languages; negative stance detection; machine learning classifier
DOI
10.1145/3625821
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Online social media allows users to connect with large numbers of people across the globe and facilitates the efficient exchange of information. These platforms cater to many of our day-to-day needs. At the same time, however, social media is increasingly used to transmit negative stances such as derogatory language, hate speech, and cyberbullying. The task of identifying negative stances in social media posts, comments, or tweets is termed negative stance detection. A major challenge in negative stance detection is that much of the content published on social media is multilingual. This work identifies negative stances in multilingual data streams in low-resource languages on social media using a hybrid of transfer learning and a deep convolutional neural network. The proposed approach first preprocesses the multilingual datasets by removing irrelevant information such as special characters and hyperlinks. The processed dataset is then passed through a pretrained BERT (bidirectional encoder representations from Transformers) model, fine-tuned on the dataset under consideration, to generate embeddings. The resulting word embeddings are fed to a deep convolutional neural network that extracts latent features from the texts and discards inessential information, making the model robust and effective for learning on the given dataset and for making predictions on zero-shot data. The article applies several optimization strategies to examine how fine-tuning different BERT layers affects the model's performance. Extensive experiments are performed on nine languages: English, French, Italian, Danish, Arabic, Spanish, Indonesian, German, and Portuguese. The experimental results demonstrate the effectiveness and efficiency of the proposed framework.
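To make the described pipeline concrete, below is a minimal sketch of the preprocessing step and a BERT-plus-CNN classifier of the kind the abstract outlines, written with PyTorch and Hugging Face transformers. The model name (bert-base-multilingual-cased), filter counts, and kernel sizes are illustrative assumptions, not values taken from the paper.

```python
import re
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

def preprocess(text: str) -> str:
    """Remove hyperlinks and special characters, as in the paper's first step."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop hyperlinks
    text = re.sub(r"[^\w\s]", " ", text)                # drop special characters
    return re.sub(r"\s+", " ", text).strip()

class BertCNN(nn.Module):
    """Contextual BERT embeddings fed to a 1-D CNN text classifier."""

    def __init__(self, bert_name="bert-base-multilingual-cased",
                 n_filters=100, kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) token embeddings from BERT
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        emb = emb.transpose(1, 2)  # Conv1d expects (batch, channels, seq)
        # Convolve, apply ReLU, then max-pool over time to keep latent features
        feats = [torch.relu(conv(emb)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertCNN()
batch = tokenizer([preprocess("Some toxic tweet... https://t.co/abc")],
                  padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # (1, n_classes)
```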
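The abstract also mentions examining the impact of fine-tuning different BERT layers. One common way to realize such a strategy, sketched below under the assumption that the paper compares freezing depths (the exact scheme used may differ), is to freeze the whole encoder and unfreeze only the top k layers before training:

```python
def set_trainable_bert_layers(model: BertCNN, k: int) -> None:
    """Freeze all BERT weights, then unfreeze only the top-k encoder layers.

    k = 0 keeps BERT fully frozen (pure feature extraction); larger k
    fine-tunes progressively more of the encoder alongside the CNN head.
    """
    for p in model.bert.parameters():
        p.requires_grad = False
    if k > 0:
        for layer in model.bert.encoder.layer[-k:]:
            for p in layer.parameters():
                p.requires_grad = True

# Sweep k to compare configurations, mirroring a layer-wise fine-tuning study
for k in (0, 2, 4, 12):
    set_trainable_bert_layers(model, k)
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"top-{k} BERT layers trainable -> {n_train:,} trainable parameters")
```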
Pages: 18