Negative Stances Detection from Multilingual Data Streams in Low-Resource Languages on Social Media Using BERT and CNN-Based Transfer Learning Model

Cited by: 3
Authors
Kumar, Sanjay [1 ]
Affiliations
[1] Delhi Technol Univ, Dept Comp Sci & Engn, Main Bawana Rd, Delhi 110042, India
Keywords
Bidirectional encoder representations from Transformers (BERT); convolutional neural network (CNN); deep transfer learning; low-resource languages; negative stance detection; machine learning classifier
DOI
10.1145/3625821
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Online social media allows users to connect with large numbers of people across the globe and facilitates the efficient exchange of information. These platforms cater to many of our day-to-day needs. At the same time, however, social media is increasingly used to transmit negative stances such as derogatory language, hate speech, and cyberbullying. The task of identifying negative stances in social media posts, comments, or tweets is termed negative stance detection. A major challenge in negative stance detection is that much of the content published on social media is multilingual. This work identifies negative stances in multilingual data streams in low-resource languages on social media using a hybrid of transfer learning and a deep convolutional neural network. The proposed approach first preprocesses the multilingual datasets by removing irrelevant information such as special characters and hyperlinks. The processed dataset is then passed through a pretrained BERT (bidirectional encoder representations from Transformers) model, fine-tuned on the dataset under consideration, to generate embeddings. The resulting word embeddings are fed to a deep convolutional neural network that extracts latent features from the texts and discards inessential information, making the model robust and effective for learning on the given dataset and for making predictions on zero-shot data. The article applies several optimization strategies to examine how fine-tuning different BERT layers affects the model's performance. Extensive experiments are performed on nine languages: English, French, Italian, Danish, Arabic, Spanish, Indonesian, German, and Portuguese. The experimental results demonstrate the effectiveness and efficiency of the proposed framework.
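To make the described pipeline concrete, below is a minimal sketch of the preprocessing step and a BERT-plus-CNN classifier of the kind the abstract outlines, written with PyTorch and Hugging Face transformers. The model name (bert-base-multilingual-cased), filter counts, and kernel sizes are illustrative assumptions, not values taken from the paper.

```python
import re
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

def preprocess(text: str) -> str:
    """Remove hyperlinks and special characters, as in the paper's first step."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop hyperlinks
    text = re.sub(r"[^\w\s]", " ", text)                # drop special characters
    return re.sub(r"\s+", " ", text).strip()

class BertCNN(nn.Module):
    """Contextual BERT embeddings fed to a 1-D CNN text classifier."""

    def __init__(self, bert_name="bert-base-multilingual-cased",
                 n_filters=100, kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) token embeddings from BERT
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        emb = emb.transpose(1, 2)  # Conv1d expects (batch, channels, seq)
        # Convolve, apply ReLU, then max-pool over time to keep latent features
        feats = [torch.relu(conv(emb)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertCNN()
batch = tokenizer([preprocess("Some toxic tweet... https://t.co/abc")],
                  padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # (1, n_classes)
```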
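The abstract also mentions examining the impact of fine-tuning different BERT layers. One common way to realize such a strategy, sketched below under the assumption that the paper compares freezing depths (the exact scheme used may differ), is to freeze the whole encoder and unfreeze only the top k layers before training:

```python
def set_trainable_bert_layers(model: BertCNN, k: int) -> None:
    """Freeze all BERT weights, then unfreeze only the top-k encoder layers.

    k = 0 keeps BERT fully frozen (pure feature extraction); larger k
    fine-tunes progressively more of the encoder alongside the CNN head.
    """
    for p in model.bert.parameters():
        p.requires_grad = False
    if k > 0:
        for layer in model.bert.encoder.layer[-k:]:
            for p in layer.parameters():
                p.requires_grad = True

# Sweep k to compare configurations, mirroring a layer-wise fine-tuning study
for k in (0, 2, 4, 12):
    set_trainable_bert_layers(model, k)
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"top-{k} BERT layers trainable -> {n_train:,} trainable parameters")
```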
Pages: 18