The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification

Cited by: 0
Authors
Kuan Shyang Yong
Jasy Suet Yan Liew
Affiliation
[1] Universiti Sains Malaysia, School of Computer Sciences
Source
Journal of Intelligent Information Systems | 2023 / Vol. 60
Keywords
Happiness classification; Text augmentation; Sentiment analysis; Deep learning; Similarity scoring; Distant supervision
DOI
Not available
Abstract
Measuring the happiness of populations of interest via Twitter offers social scientists an alternative way to gauge the level of happiness in and across different nations, but machine learning models are needed to scale happiness classification to millions of tweets. A well-performing happiness classifier requires a fair amount of training data with minimal noise. Our study introduces a similarity-based text augmentation method that efficiently expands data for the emotion “happiness” in an existing emotion corpus (EmoTweet-28) by selecting the most similar positive examples from happiness tweets collected using distant supervision (DS) and adding them to an augmented corpus as training data. Six neural embeddings, in addition to the baseline bag-of-words (BoW) representation, were explored to compute cosine similarity scores between 100,000 DS tweets and the 1,024 gold-standard happiness tweets in EmoTweet-28 (ET). Our results show that the BiLSTM trained on the augmented training set obtained from USE embeddings with a similarity threshold of 0.7 produced the best model for predicting whether or not a tweet contains expressions of happiness (F1 score = 0.599). However, most augmented training sets obtained from InferSent-GloVe embeddings produced BiLSTM classifiers whose F1 scores were more consistently above that of the base classifier in the fixed-increment experiments. We show that our proposed text augmentation strategy can improve or maintain classification performance with small but cleaner increment sets, as opposed to randomly adding DS tweets as training data.
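The selection step described in the abstract (score each DS tweet against the gold-standard happiness tweets by cosine similarity, keep those at or above a threshold such as 0.7) can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the tweet embeddings (from USE, InferSent-GloVe, BoW, etc.) have already been computed as NumPy arrays, and the helper names `select_similar` and `random_baseline`, as well as scoring each DS tweet by its *maximum* similarity to any gold tweet, are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n x d) and rows of b (m x d)."""
    # Normalise each row to unit length; the floor avoids division by zero.
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_norm @ b_norm.T  # shape (n, m)


def select_similar(ds_emb: np.ndarray, gold_emb: np.ndarray, threshold: float = 0.7):
    """Hypothetical helper: indices of DS tweets whose best match among the
    gold tweets reaches the threshold, ordered from most to least similar.

    ASSUMPTION: each DS tweet is scored by its maximum cosine similarity to
    any gold tweet; the paper's exact aggregation may differ.
    """
    scores = cosine_similarity_matrix(ds_emb, gold_emb).max(axis=1)
    selected = np.flatnonzero(scores >= threshold)
    # Sort the selected tweets by descending similarity score.
    order = np.argsort(-scores[selected], kind="stable")
    return selected[order], scores


def random_baseline(n_ds: int, k: int, seed: int = 0) -> np.ndarray:
    """Hypothetical comparison: pick k DS tweets uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_ds, size=k, replace=False)
```

For example, with two gold vectors `[[1, 0], [0, 1]]` and DS vectors `[[1, 0.1], [1, -1], [0.2, 1], [-1, 0]]`, `select_similar` keeps the first three DS tweets (their best scores are about 0.995, 0.707, and 0.981) and drops the last. Comparing a classifier trained on the selected subset with one trained on an equally sized `random_baseline` subset mirrors, in spirit, the similarity-versus-random contrast the abstract reports.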
Pages: 631-653
Page count: 22
Related papers (50 in total)
  • [41] Classification framework to identify similar visual scan paths using multiple similarity metrics
    Fraga, Ricardo Palma
    Kang, Ziho
    Crutchfield, Jerry M.
    JOURNAL OF EYE MOVEMENT RESEARCH, 2024, 17 (03)
  • [42] Similarity-Based Malware Classification Using Graph Neural Networks
    Chen, Yu-Hung
    Chen, Jiann-Liang
    Deng, Ren-Feng
    APPLIED SCIENCES-BASEL, 2022, 12 (21)
  • [43] Store classification using Text-Exemplar-Similarity and Hypotheses-Weighted-CNN
    Huang, Chao
    Li, Hongliang
    Li, Wei
    Wu, Qingbo
    Xu, Linfeng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2017, 44: 21-28
  • [44] Using boosting mechanism to refine the threshold of VSM-based similarity in text classification
    Diao, LL
    Hu, KY
    Lu, YC
    Shi, CY
    PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002: 2284-2287
  • [45] Text classification using similarity of tree sources estimated from Bayes coding algorithm
    Iwama, Hiroki
    Ishida, Takashi
    Goto, Masayuki
    Journal of Japan Industrial Management Association, 2013, 64 (03): 438-446
  • [46] Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings
    Smalheiser, Neil R.
    Cohen, Aaron M.
    Bonifield, Gary
    JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 90
  • [47] Short Text Sentiment Classification Using Bayesian and Deep Neural Networks
    Shi, Zhan
    Fan, Chongjun
    ELECTRONICS, 2023, 12 (07)
  • [48] Deep learning classification of biomedical text using convolutional neural network
    Dollah R.
    Sheng C.Y.
    Zakaria N.
    Othman M.S.
    Rasib A.W.
    International Journal of Advanced Computer Science and Applications, 2019, 10 (08): 512-517
  • [49] Leveraging Contextual Sentences for Text Classification by Using a Neural Attention Model
    Yan, DanFeng
    Guo, Shiyao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2019, 2019
  • [50] Bipartite Graph Coarsening for Text Classification Using Graph Neural Networks
    dos Santos, Nicolas Roque
    Minatel, Diego
    Baria Valejo, Alan Demetrius
    Lopes, Alneu de A.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I, 2024, 14469: 589-604