The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification

Authors
Kuan Shyang Yong
Jasy Suet Yan Liew
Affiliation
[1] Universiti Sains Malaysia, School of Computer Sciences
Source
Journal of Intelligent Information Systems | 2023 / Vol. 60
Keywords
Happiness classification; Text augmentation; Sentiment analysis; Deep learning; Similarity scoring; Distant supervision
DOI
Not available
Abstract
Measuring the happiness of populations of interest via Twitter offers social scientists an alternative way to gauge the level of happiness in and across different nations, but machine learning models are needed to scale happiness classification to millions of tweets. A well-performing happiness classifier requires a fair amount of training data with minimal noise. Our study introduces a similarity-based text augmentation method to efficiently expand data for the emotion “happiness” from an existing emotion corpus (EmoTweet-28): the most similar positive examples are selected from happiness tweets collected using distant supervision (DS) and added to an augmented corpus as training data. Six neural embeddings, in addition to the baseline bag-of-words (BoW) representation, were explored to compute the cosine similarity score between 100,000 DS tweets and 1,024 gold standard happiness tweets in EmoTweet-28 (ET). Our results show that the augmented training set obtained from the USE embedding with a similarity threshold of 0.7, used to train a BiLSTM, produced the best model for predicting whether a tweet contains expressions of happiness (F1 score = 0.599). However, most augmented training sets obtained from the InferSent-GloVe embedding produced BiLSTM classifiers with more consistent F1 scores above the base classifier in the fixed increment experiments. We show that our proposed text augmentation strategy can improve or maintain classification performance with small but cleaner increment sets, as opposed to adding DS tweets randomly as training data.
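The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses simple bag-of-words count vectors (the baseline representation) rather than a neural embedding such as USE, and it assumes that each DS tweet is scored by its maximum cosine similarity to any gold standard tweet, an aggregation rule the abstract does not specify. The function names `bow_vectors`, `cosine_matrix`, and `select_similar` and the toy tweets are hypothetical.

```python
import numpy as np

def bow_vectors(texts, vocab):
    """Build count-based bag-of-words vectors over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocab)}
    mat = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in text.lower().split():
            if tok in index:
                mat[row, index[tok]] += 1
    return mat

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # Guard against all-zero rows so we never divide by zero.
    a_unit = a / np.where(a_norm == 0, 1.0, a_norm)
    b_unit = b / np.where(b_norm == 0, 1.0, b_norm)
    return a_unit @ b_unit.T

def select_similar(ds_texts, gold_texts, threshold=0.7):
    """Keep DS tweets whose best match among the gold tweets meets the threshold."""
    vocab = sorted({tok for text in ds_texts + gold_texts
                    for tok in text.lower().split()})
    ds = bow_vectors(ds_texts, vocab)
    gold = bow_vectors(gold_texts, vocab)
    # Assumption: score each DS tweet by its maximum similarity to any gold tweet.
    scores = cosine_matrix(ds, gold).max(axis=1)
    return [text for text, score in zip(ds_texts, scores) if score >= threshold]

# Toy data standing in for gold standard (ET) and distantly supervised (DS) tweets.
gold = ["so happy today", "feeling great and happy"]
ds = ["so happy today yay", "stuck in traffic again", "feeling great"]
augmented = select_similar(ds, gold, threshold=0.7)
```

Raising `threshold` yields a smaller but cleaner increment set, which is the trade-off the abstract refers to; swapping `bow_vectors` for a sentence encoder would leave the rest of the sketch unchanged.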
Pages: 631-653
Number of pages: 22