A Word Embedding Model Learned from Political Tweets

Cited by: 0
Authors:
Alnajran, Noufa N. [1 ]
Crockett, Keeley A. [1 ]
McLean, David [1 ]
Latham, Annabel [1 ]
Affiliations:
[1] Manchester Metropolitan Univ, Dept Comp Math & Digital Technol, Manchester, Lancs, England
Keywords:
Word Embedding; Language Modelling; Deep Learning; Social Network Analysis; Twitter Analysis;
DOI: not available
CLC Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract:
Distributed word representations have recently contributed to significant improvements in many natural language processing (NLP) tasks, and distributional semantics has become one of the important trends in machine learning (ML) applications. Word embeddings are distributed representations of words that capture semantic relationships learned from a large corpus of text. In a social media context, the distributed representation of a word is likely to differ from embeddings learned from general text. This is largely due to the unique lexical, semantic, and morphological features of social media text such as tweets, which imply different word vector representations. In this paper, we collect and present a political social media dataset consisting of over four million English tweets. An artificial neural network (NN) is trained to learn word co-occurrence statistics and generate word vectors from the political corpus of tweets. The resulting model is 136 MB and includes representations for a vocabulary of over 86K unique words and phrases. The learned model is expected to contribute to the success of many ML and NLP applications in Online Social Network (OSN) analysis of microblogs, such as semantic similarity and cluster analysis tasks.
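The abstract describes learning word vectors from word co-occurrence in a tweet corpus. The paper's own approach uses a neural network; as a minimal illustration of the same underlying idea (co-occurrence counts turned into dense vectors), the count-based sketch below builds a windowed co-occurrence matrix from tokenised tweets and factorises it with a truncated SVD. All function names, the toy tweets, and the SVD step are assumptions for illustration, not the authors' method.

```python
import numpy as np

def cooccurrence_matrix(tweets, window=2):
    """Build a symmetric word co-occurrence matrix from tokenised tweets.

    Each pair of words appearing within `window` positions of each other
    in the same tweet increments the corresponding matrix cell.
    """
    vocab = sorted({w for t in tweets for w in t})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for tokens in tweets:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    M[index[w], index[tokens[j]]] += 1.0
    return M, index

def embed(M, dim=2):
    """Dense word vectors: the top `dim` left singular vectors, scaled."""
    U, S, _ = np.linalg.svd(M)
    return U[:, :dim] * S[:dim]

# Toy corpus (hypothetical tokenised tweets, not the paper's dataset)
tweets = [["vote", "for", "change"], ["vote", "for", "reform"]]
M, index = cooccurrence_matrix(tweets)
vectors = embed(M, dim=2)  # one 2-d vector per vocabulary word
```

In practice, a skip-gram or CBOW neural model (as used in the paper) is trained instead of an explicit SVD, but both map frequently co-occurring words to nearby vectors.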
Pages: 177-183 (7 pages)