The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification

Authors
Kuan Shyang Yong
Jasy Suet Yan Liew
Affiliation
[1] Universiti Sains Malaysia, School of Computer Sciences
Source
Journal of Intelligent Information Systems | 2023 / Vol. 60
Keywords
Happiness classification; Text augmentation; Sentiment analysis; Deep learning; Similarity scoring; Distant supervision
DOI
Not available
Abstract
Measuring the happiness of populations of interest via Twitter offers social scientists an alternative way to gauge the level of happiness within and across nations, but machine learning models are needed to scale happiness classification to millions of tweets. A well-performing happiness classifier requires a fair amount of training data with minimal noise. Our study introduces a similarity-based text augmentation method that efficiently expands the data for the emotion “happiness” in an existing emotion corpus (EmoTweet-28): the most similar positive examples are selected from happiness tweets collected using distant supervision (DS) and added to an augmented training corpus. Six neural embeddings, in addition to the baseline bag-of-words (BoW) representation, were explored to compute the cosine similarity score between 100,000 DS tweets and 1,024 gold standard happiness tweets in EmoTweet-28 (ET). Our results show that the augmented training set obtained from the USE embedding with a similarity threshold of 0.7, used to train a BiLSTM, produced the best model for predicting whether a tweet contains expressions of happiness (F1 score = 0.599). However, most augmented training sets obtained from the InferSent-GloVe embedding produced BiLSTM classifiers with F1 scores more consistently above the base classifier in the fixed increment experiments. We show that our proposed text augmentation strategy can improve or maintain classification performance with small but cleaner increment sets, as opposed to adding DS tweets randomly as training data.
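The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the 2-dimensional vectors are toy stand-ins for real sentence embeddings, and scoring each DS tweet by its maximum cosine similarity to any gold standard tweet is an assumption, since the abstract does not state how scores against the 1,024 gold tweets are aggregated. Only the 0.7 threshold is taken from the abstract.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # Guard against zero-length vectors to avoid division by zero.
    a_norm = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
    b_norm = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    return a_norm @ b_norm.T


def select_similar(ds_vecs, gold_vecs, threshold=0.7):
    """Return indices of DS rows whose best gold match reaches the threshold.

    Assumption: a DS tweet is scored by its maximum similarity to any gold
    tweet; the abstract does not specify the aggregation.
    """
    sims = cosine_similarity_matrix(ds_vecs, gold_vecs)
    best = sims.max(axis=1)
    return np.flatnonzero(best >= threshold), best


# Toy embeddings (hypothetical): 2 gold tweets, 4 distantly supervised tweets.
gold = np.array([[1.0, 0.0], [0.0, 1.0]])
ds = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [0.1, 3.0]])

keep, best = select_similar(ds, gold, threshold=0.7)
print(keep)  # indices of DS tweets that would join the augmented training set
```

In this toy run the third DS vector points away from both gold vectors and is filtered out, while the other three pass the threshold. With real data, the rows would be sentence embeddings (for example from USE or InferSent-GloVe) of the DS and EmoTweet-28 happiness tweets.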
Pages: 631-653
Number of pages: 22