A Word Embedding Model Learned from Political Tweets

Authors
Alnajran, Noufa N. [1]
Crockett, Keeley A. [1]
McLean, David [1]
Latham, Annabel [1]
Affiliations
[1] Manchester Metropolitan Univ, Dept Comp Math & Digital Technol, Manchester, Lancs, England
Keywords
Word Embedding; Language Modelling; Deep Learning; Social Network Analysis; Twitter Analysis;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Distributed word representations have recently contributed to significant improvements in many natural language processing (NLP) tasks, and distributional semantics has become one of the important trends in machine learning (ML) applications. Word embeddings are distributed representations of words that learn semantic relationships from a large corpus of text. In the social context, the distributed representation of a word is likely to differ from that in general-text word embeddings. This is partly due to the unique lexical semantic features and morphological structure of social media text such as tweets, which imply different word vector representations. In this paper, we collect and present a political social dataset that consists of over four million English tweets. An artificial neural network (NN) is trained to learn word co-occurrence and generate word vectors from the political corpus of tweets. The model is 136 MB and includes word representations for a vocabulary of over 86K unique words and phrases. The learned model is expected to contribute to the success of many ML and NLP applications in the analysis of microblogging online social networks (OSN), such as semantic similarity and cluster analysis tasks.
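
To make the idea of learning word vectors from co-occurrence concrete, the following is a minimal sketch in Python using only the standard library. It is not the method of the paper: the abstract describes a trained neural network, whereas this sketch swaps in plain count-based co-occurrence vectors so that the result is deterministic and easy to inspect. The toy tweets, the tokenizer pattern, the window size, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy tweets; the paper's four-million-tweet corpus is not reproduced here.
TWEETS = [
    "The #election vote count is in",
    "Every vote matters in this #election",
    "Parliament debates the new tax bill",
    "The tax bill passed in parliament",
]

# Assumed token pattern: a word, optionally prefixed by '#' or '@'.
TOKEN = re.compile(r"[#@]?\w+")


def tokenize(tweet):
    """Lowercase a tweet and keep hashtags and mentions as single tokens."""
    return TOKEN.findall(tweet.lower())


def cooccurrence_vectors(tweets, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(TWEETS)
sim_related = cosine(vectors["tax"], vectors["bill"])    # words sharing contexts
sim_unrelated = cosine(vectors["tax"], vectors["vote"])  # words from different topics
```

In this toy corpus, "tax" and "bill" share more context words than "tax" and "vote", so `sim_related` comes out higher than `sim_unrelated`. A neural embedding model would replace the sparse count vectors with dense learned vectors, but the semantic similarity task mentioned in the abstract would query them in the same way, through cosine similarity.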
Pages: 177-183
Page count: 7