The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification

Authors
Kuan Shyang Yong
Jasy Suet Yan Liew
Affiliation
[1] Universiti Sains Malaysia, School of Computer Sciences
Source
Journal of Intelligent Information Systems, 2023, Vol. 60, No. 3
Keywords
Happiness classification; Text augmentation; Sentiment analysis; Deep learning; Similarity scoring; Distant supervision
DOI
Not available
Abstract
Measuring the happiness of populations of interest via Twitter offers social scientists an alternative way to gauge the level of happiness within and across nations, but machine learning models are needed to scale happiness classification to millions of tweets. A well-performing happiness classifier requires a fair amount of training data with minimal noise. Our study introduces a similarity-based text augmentation method that efficiently expands the data for the emotion “happiness” in an existing emotion corpus, EmoTweet-28 (ET). From happiness tweets collected using distant supervision (DS), the positive examples most similar to the gold-standard happiness tweets are selected and added to an augmented corpus as training data. In addition to a baseline bag-of-words (BoW) representation, six neural embeddings were explored to compute cosine similarity scores between 100,000 DS tweets and the 1,024 gold-standard happiness tweets in ET. Our results show that the augmented training set obtained from the Universal Sentence Encoder (USE) embedding with a similarity threshold of 0.7, used to train a BiLSTM, produced the best model for predicting whether a tweet contains expressions of happiness (F1 score = 0.599). However, most augmented training sets obtained from the InferSent-GloVe embedding produced BiLSTM classifiers whose F1 scores more consistently exceeded that of the base classifier in the fixed-increment experiments. We show that our proposed text augmentation strategy can improve or maintain classification performance with small but cleaner increment sets, as opposed to adding DS tweets randomly as training data.
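The selection step described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It uses plain bag-of-words count vectors with whitespace tokenization as a stand-in for the BoW baseline; a neural encoder such as USE would replace `bow_vector`. It also scores each DS tweet by its maximum cosine similarity to any gold-standard happiness tweet, an aggregation the abstract does not specify. All function names and the toy tweets are illustrative assumptions. The sketch shows threshold-based selection (for example, at 0.7) and a fixed-size top-k increment, alongside a random-sampling baseline for comparison.

```python
import math
import random
from collections import Counter


def bow_vector(text):
    """Bag-of-words count vector using lowercased whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[token] for token, count in u.items())  # missing tokens count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_scores(ds_tweets, gold_tweets):
    """Score each DS tweet by its maximum similarity to any gold tweet (assumed aggregation)."""
    gold_vecs = [bow_vector(g) for g in gold_tweets]
    return [max(cosine(bow_vector(t), g) for g in gold_vecs) for t in ds_tweets]


def select_by_threshold(ds_tweets, gold_tweets, threshold=0.7):
    """Keep the DS tweets whose similarity score reaches the threshold."""
    scores = similarity_scores(ds_tweets, gold_tweets)
    return [t for t, s in zip(ds_tweets, scores) if s >= threshold]


def select_top_k(ds_tweets, gold_tweets, k):
    """Fixed-size increment: the k DS tweets most similar to the gold set."""
    scores = similarity_scores(ds_tweets, gold_tweets)
    ranked = sorted(zip(ds_tweets, scores), key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in ranked[:k]]


def select_random(ds_tweets, k, seed=0):
    """Random baseline: k DS tweets sampled without regard to similarity."""
    return random.Random(seed).sample(ds_tweets, k)


if __name__ == "__main__":
    gold = ["so happy today", "feeling happy and blessed"]  # toy gold-standard tweets
    ds = ["happy happy today", "traffic is terrible", "so happy today friends"]  # toy DS tweets
    print(select_by_threshold(ds, gold, threshold=0.7))
    # ['happy happy today', 'so happy today friends']
    print(select_top_k(ds, gold, k=1))
    # ['so happy today friends']
```

With a neural embedding, only the vectorization step would change; the thresholding and ranking logic stay the same. A threshold that works for one representation may not transfer directly to another, because cosine scores from different embedding spaces need not be on comparable scales.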
Pages: 631-653
Page count: 22
Related papers
  • [1] The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification
    Yong, Kuan Shyang
    Liew, Jasy Suet Yan
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 60 (03): 631-653
  • [2] Text classification using embeddings: a survey
    da Costa, Liliane Soares
    Oliveira, Italo L.
    Fileto, Renato
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (07): 2761-2803
  • [3] Text Classification Using Word Embeddings
    Helaskar, Mukund N.
    Sonawane, Sheetal S.
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019
  • [4] Text sentiment classification of Amazon reviews using word embeddings and convolutional neural networks
    Qorich, Mohammed
    El Ouazzani, Rajae
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): 11029-11054
  • [5] An analysis of hierarchical text classification using word embeddings
    Stein, Roger Alan
    Jaques, Patricia A.
    Valiati, Joao Francisco
    INFORMATION SCIENCES, 2019, 471: 216-232
  • [6] A Neural Network Approach for Text Classification Using Low Dimensional Joint Embeddings of Words and Knowledge
    da Costa, Liliane Soares
    Oliveira, Italo Lopes
    Fileto, Renato
    INFORMATION INTEGRATION AND WEB INTELLIGENCE, IIWAS 2022, 2022, 13635: 181-194
  • [7] Multilabeled Emotions Classification in Software Engineering Text Using Convolutional Neural Networks and Word Embeddings
    Wagan, Atif Ali
    Li, Shuaiyong
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2025, 37 (03)
  • [8] Automatic Text Scoring Using Neural Networks
    Alikaniotis, Dimitrios
    Yannakoudakis, Helen
    Rei, Marek
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016: 715-725