A Weighted Topical Document Embedding based Clustering Method for News Text

Authors
Zhu Dechao [1]
Song Hui [1]
Affiliation
[1] Donghua Univ, Sch Comp Sci, Shanghai, Peoples R China
Source
2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC) | 2016
Keywords
Text Clustering; Skip-Gram; LDA; TF-IDF
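DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812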
Abstract
As an unsupervised machine learning method, clustering can preliminarily group text without manual labeling, which effectively accelerates the organization, abstraction, and navigation of large news collections. News articles are long and contain many homonyms and polysemous words, which is one reason traditional text clustering methods perform poorly when grouping news text. This paper presents a novel text representation method based on topical document embedding (TDE) to capture the semantic features of different topics. In the TDE representation, the document embedding of a news text is obtained by summing the Skip-Gram word vectors of all the keywords in the text, each weighted by its TF-IDF score. The topical document embedding is then learned by joining the topic vectors obtained from the LDA model with the document vectors from the document embedding. By clustering on the topical document embedding, we implement a novel text clustering method (TDE-TC). The experimental results show that news clustering based on the TDE representation outperforms clustering based on the bag-of-words model and the LDA model.
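The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word vectors stand in for a trained Skip-Gram model, the toy topic vectors stand in for LDA output, and "joining" is assumed here to mean plain concatenation, which the abstract does not confirm. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def document_embedding(weights, word_vectors, dim):
    """Sum the word vectors, each scaled by the word's TF-IDF weight."""
    emb = [0.0] * dim
    for word, weight in weights.items():
        vec = word_vectors.get(word)
        if vec is None:  # skip out-of-vocabulary words
            continue
        for i in range(dim):
            emb[i] += weight * vec[i]
    return emb


def topical_document_embedding(doc_emb, topic_vec):
    """ASSUMPTION: 'joining' is read as concatenation of the two vectors."""
    return list(doc_emb) + list(topic_vec)


# Toy corpus; in practice these would be keyword lists extracted from news.
docs = [
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
    ["market", "team"],
]
# Hypothetical 2-d word vectors standing in for Skip-Gram output.
word_vectors = {
    "stock": [1.0, 0.0], "market": [0.8, 0.2], "rises": [0.9, 0.1],
    "team": [0.0, 1.0], "wins": [0.1, 0.9], "match": [0.2, 0.8],
}
# Hypothetical per-document topic vectors standing in for LDA output.
topic_vectors = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

weights = tf_idf_weights(docs)
tde = [
    topical_document_embedding(document_embedding(w, word_vectors, 2), t)
    for w, t in zip(weights, topic_vectors)
]
```

Each entry of `tde` is a single vector per document that could then be passed to a clustering algorithm of choice; the abstract does not say which algorithm TDE-TC uses.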
Pages: 1060-1065
Number of pages: 6