A text semantic topic discovery method based on the conditional co-occurrence degree

Cited by: 24
Authors
Wei, Wei [1 ,2 ]
Guo, Chonghui [2 ]
Affiliations
[1] Zhengzhou Univ, Ctr Energy Environm & Econ Res, Zhengzhou 450001, Henan, Peoples R China
[2] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree; PARAGRAPH; MODEL
DOI
10.1016/j.neucom.2019.08.047
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Topic discovery, an effective tool for semantic mining and a key means of extracting new features from raw text, plays an important role in text mining and knowledge discovery. To address problems of traditional topic models, such as the loss of semantic information, ambiguous topic concepts, and overlap and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, each document is split into multiple subdocuments according to its semantic structure and a set of independence decision rules. Second, combinatorial words with strong semantic relevance are extracted from the subdocuments based on the conditional co-occurrence degree. Using these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, the "topic-word" and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the "document-topic" distributions of the original documents are obtained by merging the new subdocuments' "document-topic" distributions using specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models under two evaluation methods, and the results verify its superiority and efficiency in topic discovery. More importantly, a case study shows that the combinatorial words effectively avoid polysemy and facilitate the condensation and summarization of topics. (C) 2019 Elsevier B.V. All rights reserved.
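The abstract does not give the formula for the conditional co-occurrence degree or the exact merging strategies, so the following is only a minimal sketch under stated assumptions. It assumes the conditional co-occurrence degree of word b given word a is the share of subdocuments containing a that also contain b; a pair becomes a "combinatorial word" if it co-occurs in at least `min_count` subdocuments and its degree is at least `threshold` in both directions; feature expansion appends a joined token; and subdocument "document-topic" distributions are merged by a length-weighted average. All function names, parameters, and the merge rule are hypothetical illustrations, not the authors' method, and the Gibbs-sampling topic-modeling step is not shown.

```python
from collections import Counter
from itertools import combinations


def document_frequencies(subdocs):
    """Count how many subdocuments contain each word and each word pair.

    Presence (not raw frequency) is counted; pairs are stored as sorted tuples.
    """
    word_df = Counter()
    pair_df = Counter()
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def ccod(word_df, pair_df, a, b):
    """Assumed conditional co-occurrence degree of b given a:
    the share of subdocuments containing a that also contain b."""
    if word_df[a] == 0:
        return 0.0
    return pair_df[tuple(sorted((a, b)))] / word_df[a]


def combinatorial_words(subdocs, threshold=0.5, min_count=2):
    """Keep word pairs that co-occur in at least min_count subdocuments and
    whose conditional co-occurrence degree is >= threshold in both directions."""
    word_df, pair_df = document_frequencies(subdocs)
    combos = []
    for (a, b), n in sorted(pair_df.items()):
        if n < min_count:
            continue
        if min(ccod(word_df, pair_df, a, b), ccod(word_df, pair_df, b, a)) >= threshold:
            combos.append((a, b))
    return combos


def expand_subdoc(tokens, combos):
    """Hypothetical feature expansion: append a joined token for every
    combinatorial word whose two parts both occur in the subdocument."""
    present = set(tokens)
    return tokens + [a + "_" + b for a, b in combos if a in present and b in present]


def merge_document_topics(sub_thetas, sub_lengths):
    """Merge subdocument topic distributions into one document-level
    distribution by a length-weighted average (one assumed strategy)."""
    total = sum(sub_lengths)
    num_topics = len(sub_thetas[0])
    return [
        sum(theta[k] * n for theta, n in zip(sub_thetas, sub_lengths)) / total
        for k in range(num_topics)
    ]


if __name__ == "__main__":
    subdocs = [
        ["topic", "model", "gibbs"],
        ["topic", "model"],
        ["text", "mining"],
        ["text", "mining", "topic"],
    ]
    combos = combinatorial_words(subdocs)
    print(combos)  # [('mining', 'text'), ('model', 'topic')]
    print(expand_subdoc(subdocs[0], combos))
    print(merge_document_topics([[0.8, 0.2], [0.4, 0.6]], [3, 1]))
```

Requiring the threshold in both directions is a design choice of this sketch: it keeps a frequent word such as "topic" from pairing with every rarer word it happens to accompany.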
Pages: 11-24
Page count: 14