A text semantic topic discovery method based on the conditional co-occurrence degree

Cited by: 24
Authors
Wei, Wei [1, 2]
Guo, Chonghui [2]
Affiliations
[1] Zhengzhou Univ, Ctr Energy Environm & Econ Res, Zhengzhou 450001, Henan, Peoples R China
[2] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree; PARAGRAPH; MODEL;
DOI
10.1016/j.neucom.2019.08.047
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Topic discovery, as an effective tool for semantic mining and a key means of extracting new features from raw text, plays an important role in text mining and knowledge discovery. To address problems of traditional topic models, such as the loss of semantic information, ambiguous topic concepts, and crossover and mutual coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, each document is split into multiple subdocuments according to its semantic structure and a set of independence decision rules. Second, combinatorial words with strong semantic relevance are extracted from the subdocuments based on the conditional co-occurrence degree, and new subdocuments are formed from these combinatorial words through feature expansion and content reconstruction. Third, the "topic-word" and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the "document-topic" distributions of the original documents are obtained by merging the "document-topic" distributions of their new subdocuments according to specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models under two evaluation methods, and the results verify its superiority and efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and facilitate the condensation and summarization of topics. (C) 2019 Elsevier B.V. All rights reserved.
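The abstract describes the pipeline only at a high level, so the following Python sketch illustrates steps one, two, and four under explicit assumptions that are not taken from the paper. Subdocuments are assumed to be blank-line-separated paragraphs (the independence decision rules are not reproduced). The conditional co-occurrence degree of word b given word a is assumed to be the fraction of subdocuments containing a that also contain b, and a pair is kept as a combinatorial word when this degree reaches a threshold in both directions. Feature expansion is assumed to append a joined `a_b` token, and the merging strategy is assumed to be a length-weighted average. The function names and the `threshold` and `min_pair_count` parameters are hypothetical. Step three (Gibbs-sampling topic modeling) is omitted; any LDA Gibbs sampler could supply the subdocument distributions.

```python
from collections import Counter
from itertools import combinations


def split_paragraphs(text):
    """Step 1 (assumed): treat blank-line-separated paragraphs as subdocuments.

    The paper's independence decision rules are NOT reproduced here.
    """
    return [p.split() for p in text.split("\n\n") if p.strip()]


def cooccurrence_counts(subdocs):
    """Count how many subdocuments contain each word and each word pair."""
    doc_freq = Counter()
    pair_freq = Counter()
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        doc_freq.update(vocab)
        # Sorted vocab means every pair is stored as (a, b) with a < b.
        pair_freq.update(combinations(vocab, 2))
    return doc_freq, pair_freq


def combinatorial_words(subdocs, threshold=0.5, min_pair_count=2):
    """Step 2 (assumed rule): keep (a, b) when P(b|a) and P(a|b) >= threshold.

    P(b|a) = #subdocs containing a and b / #subdocs containing a is a
    hypothetical stand-in for the paper's conditional co-occurrence degree.
    """
    doc_freq, pair_freq = cooccurrence_counts(subdocs)
    pairs = set()
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_pair_count:
            continue  # too rare to be a reliable combination
        if n_ab / doc_freq[a] >= threshold and n_ab / doc_freq[b] >= threshold:
            pairs.add((a, b))
    return pairs


def expand(tokens, pairs):
    """Feature expansion (assumed): append 'a_b' for each pair present in tokens."""
    present = set(tokens)
    extra = [f"{a}_{b}" for a, b in sorted(pairs) if a in present and b in present]
    return list(tokens) + extra


def merge_doc_topic(sub_thetas, sub_lengths):
    """Step 4 (assumed strategy): length-weighted average of the subdocument
    "document-topic" distributions, giving one distribution per document."""
    total = sum(sub_lengths)
    merged = [0.0] * len(sub_thetas[0])
    for theta, n in zip(sub_thetas, sub_lengths):
        for k, p in enumerate(theta):
            merged[k] += p * n / total
    return merged


if __name__ == "__main__":
    subdocs = [
        ["apple", "pie", "recipe"],
        ["apple", "pie", "oven"],
        ["apple", "iphone", "release"],
        ["apple", "iphone", "price"],
    ]
    pairs = combinatorial_words(subdocs)
    print(sorted(pairs))  # [('apple', 'iphone'), ('apple', 'pie')]
    print(expand(subdocs[0], pairs))  # ['apple', 'pie', 'recipe', 'apple_pie']
    print(merge_doc_topic([[0.8, 0.2], [0.2, 0.8]], [30, 10]))
```

In the toy corpus, "apple" co-occurs with both "pie" and "iphone", so the sketch produces the distinct tokens `apple_pie` and `apple_iphone`. This is the kind of sense separation the abstract's case study attributes to combinatorial words, although the paper's actual rule may differ.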
Pages: 11-24
Number of pages: 14