The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification

Cited by: 0

Authors:

Kuan Shyang Yong

Jasy Suet Yan Liew

Affiliation:

[1] Universiti Sains Malaysia, School of Computer Sciences

Source:

Journal of Intelligent Information Systems | 2023 / Vol. 60

Keywords:

Happiness classification; Text augmentation; Sentiment analysis; Deep learning; Similarity scoring; Distant supervision;

DOI:

Not available

Abstract:

Measuring the happiness of populations of interest via Twitter offers social scientists an alternative way to gauge the level of happiness within and across nations, but machine learning models are needed to scale happiness classification to millions of tweets. A well-performing happiness classifier requires a fair amount of training data with minimal noise. Our study introduces a similarity-based text augmentation method that efficiently expands the data for the emotion “happiness” in an existing emotion corpus (EmoTweet-28) by selecting the most similar positive examples from happiness tweets collected using distant supervision (DS) and adding them to an augmented corpus as training data. Six neural embeddings, in addition to the baseline bag-of-words (BoW) representation, were explored to compute the cosine similarity score between 100,000 DS tweets and the 1,024 gold standard happiness tweets in EmoTweet-28 (ET). Our results show that the augmented training set obtained from the USE embedding with a similarity threshold of 0.7, when used to train a BiLSTM, produced the best model for predicting whether or not a tweet contains expressions of happiness (F1 score = 0.599). However, most augmented training sets obtained from the InferSent-GloVe embedding produced BiLSTM classifiers with more consistent F1 scores above the base classifier in the fixed increment experiments. We show that our proposed text augmentation strategy can improve or maintain classification performance with small but cleaner increment sets, as opposed to adding DS tweets randomly as training data.
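The threshold-based selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how a DS tweet's scores against the individual gold standard tweets are combined, so this sketch assumes the maximum cosine similarity is used. The function names `cosine_similarity` and `select_similar` are hypothetical, and the 2-dimensional toy vectors stand in for real sentence embeddings such as USE.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(
    candidates: Sequence[Sequence[float]],
    gold: Sequence[Sequence[float]],
    threshold: float = 0.7,
) -> List[int]:
    """Return indices of candidate vectors whose best match in `gold` meets the threshold.

    Assumption: a candidate's score is its maximum similarity to any gold vector.
    """
    selected = []
    for i, cand in enumerate(candidates):
        best = max(cosine_similarity(cand, g) for g in gold)
        if best >= threshold:
            selected.append(i)
    return selected


# Toy stand-ins for embeddings of gold standard and distantly supervised tweets.
gold_vectors = [[1.0, 0.0], [0.8, 0.6]]
ds_vectors = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

# Indices of the DS tweets that would be added to the augmented training set.
kept = select_similar(ds_vectors, gold_vectors, threshold=0.7)
```

With real data, the toy vectors would be replaced by embeddings of the DS and gold standard tweets, and the selected DS tweets would be appended to the training set as positive examples. A higher threshold admits fewer but presumably cleaner examples.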

Pages: 631-653

Page count: 22
