A text semantic topic discovery method based on the conditional co-occurrence degree

Cited by: 24
Authors
Wei, Wei [1, 2]
Guo, Chonghui [2]
Affiliations
[1] Zhengzhou Univ, Ctr Energy Environm & Econ Res, Zhengzhou 450001, Henan, Peoples R China
[2] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree; PARAGRAPH; MODEL;
DOI
10.1016/j.neucom.2019.08.047
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic discovery, as an effective tool for semantic mining and a key means of extracting new features from raw text, plays an important role in text mining and knowledge discovery. To address problems of traditional topic models, such as the loss of semantic information, ambiguous topic concepts, and crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to its semantic structure and a set of independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments. Based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, the "topic-word" and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the "document-topic" distributions of the original documents are obtained by merging those of their new subdocuments according to specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models under two evaluation methods, and the results verify its superiority and its efficiency in topic discovery. More importantly, a case study shows that the combinatorial words effectively avoid the polysemy problem and facilitate the condensation and summarization of topics. (C) 2019 Elsevier B.V. All rights reserved.
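The abstract does not give the formula for the conditional co-occurrence degree or the merging strategies, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the degree of word b given word a is the share of subdocuments containing a that also contain b, keeps a pair as a combinatorial word when that degree is high in both directions, appends a joined token to each subdocument as a simple form of feature expansion, and merges subdocument "document-topic" rows by a token-count-weighted average. The helper names (`conditional_cooccurrence`, `combinatorial_words`, `expand`, `merge_theta`) and the `threshold` and `min_count` parameters are hypothetical. The Gibbs-sampling step is left to any standard LDA implementation and is not shown.

```python
from collections import Counter
from itertools import combinations


def conditional_cooccurrence(subdocs):
    """Estimate P(b | a) for every word pair that co-occurs in some subdocument.

    ASSUMPTION: the degree is the fraction of subdocuments containing `a`
    that also contain `b`; the paper's exact formula may differ.
    """
    doc_freq = Counter()   # number of subdocuments containing each word
    pair_freq = Counter()  # number of subdocuments containing both words (a < b)
    for sub in subdocs:
        words = sorted(set(sub))
        doc_freq.update(words)
        pair_freq.update(combinations(words, 2))
    degree = {}
    for (a, b), n_ab in pair_freq.items():
        degree[(a, b)] = n_ab / doc_freq[a]  # P(b | a)
        degree[(b, a)] = n_ab / doc_freq[b]  # P(a | b)
    return degree, pair_freq


def combinatorial_words(subdocs, threshold=0.5, min_count=2):
    """Keep frequent pairs whose conditional degree is high in BOTH directions (hypothetical rule)."""
    degree, pair_freq = conditional_cooccurrence(subdocs)
    return {
        (a, b)
        for (a, b), n_ab in pair_freq.items()
        if n_ab >= min_count and min(degree[(a, b)], degree[(b, a)]) >= threshold
    }


def expand(sub, pairs):
    """Feature expansion sketch: append one 'a_b' token per combinatorial word present."""
    present = set(sub)
    return list(sub) + [f"{a}_{b}" for a, b in sorted(pairs) if a in present and b in present]


def merge_theta(sub_thetas, sub_lengths):
    """Merge subdocument topic rows into one row for the parent document.

    ASSUMPTION: token-count-weighted average; lengths must not sum to zero.
    """
    total = sum(sub_lengths)
    k = len(sub_thetas[0])
    return [sum(t[i] * n for t, n in zip(sub_thetas, sub_lengths)) / total for i in range(k)]


# Toy usage: four subdocuments, "apple" used in two senses.
subdocs = [
    ["apple", "fruit", "juice"],
    ["apple", "fruit", "pie"],
    ["apple", "iphone", "store"],
    ["iphone", "store", "app"],
]
pairs = combinatorial_words(subdocs, threshold=0.5, min_count=2)
print(sorted(pairs))  # [('apple', 'fruit'), ('iphone', 'store')]
print(expand(subdocs[0], pairs))  # ['apple', 'fruit', 'juice', 'apple_fruit']
print([round(x, 3) for x in merge_theta([[0.8, 0.2], [0.4, 0.6]], [3, 1])])  # [0.7, 0.3]
```

In the toy corpus, "apple" co-occurs with "fruit" in two of its three subdocuments, so that pair is kept, while its single co-occurrence with "iphone" falls below `min_count`. Requiring a high degree in both directions is one simple way to avoid pairing a frequent word with every rare neighbour; the paper may use a different rule.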
Pages: 11-24
Page count: 14
Related papers (50 in total)
  • [31] Analyzing Asymmetric Relationship Between Documents Based on Topic Word Co-occurrence
    Zhang G.
    Wang X.
    Xu J.
    Data Analysis and Knowledge Discovery, 2023, 7 (03) : 110 - 120
  • [32] A novel automatic text summarization study based on term co-occurrence
    Geng, Huantong
    Zhao, Peng
    Chen, Enhong
    Cai, Qingsheng
    PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, VOLS 1 AND 2, 2006, : 601 - 606
  • [33] CONDITIONAL AND UNCONDITIONAL CO-OCCURRENCE RESTRICTIONS IN RECALL OF BIGRAMS
    SMITH, KH
    PSYCHONOMIC SCIENCE, 1968, 12 (08): : 379 - &
  • [34] Frequent pattern discovery based on co-occurrence frequent item tree
    Hemalatha, R
    Krishnan, A
    Senthamarai, C
    Hemamalini, R
    2005 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSING AND INFORMATION PROCESSING, PROCEEDINGS, 2005, : 348 - 354
  • [35] Discovery of online game user relationship based on co-occurrence of words
    Thawonmas, Ruck
    Konno, Yuki
    Tsuda, Kohei
    ENTERTAINMENT COMPUTING - ICEC 2006, 2006, 4161 : 286 - +
  • [36] Semantic Text Alignment based on Topic Modeling
    Le, Huong T.
    Pham, Lam N.
    Nguyen, Duy D.
    Nguyen, Son V.
    Nguyen, An N.
    2016 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES, RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2016, : 67 - 72
  • [37] The Role of Co-Occurrence Statistics in Developing Semantic Knowledge
    Unger, Layla
    Vales, Catarina
    Fisher, Anna V.
    COGNITIVE SCIENCE, 2020, 44 (09)
  • [38] Semantic Segmentation Considering Location and Co-occurrence in Scene
    Shimazaki, Ken
    Nagao, Tomoharu
    2015 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA) PROCEEDINGS, 2015, : 41 - 46
  • [39] Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words
    Mao, Xiangke
    Huang, Shaobin
    Li, Rongsheng
    Shen, Linshan
    IEEE ACCESS, 2020, 8 : 117528 - 117538
  • [40] Combining word based and word co-occurrence based sequence analysis for text categorization
    Luo, X
    Zincir-Heywood, AN
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1580 - 1585