A text semantic topic discovery method based on the conditional co-occurrence degree

Cited by: 24
Authors
Wei, Wei [1, 2]
Guo, Chonghui [2 ]
Affiliations
[1] Zhengzhou Univ, Ctr Energy Environm & Econ Res, Zhengzhou 450001, Henan, Peoples R China
[2] Dalian Univ Technol, Inst Syst Engn, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree; PARAGRAPH; MODEL;
DOI
10.1016/j.neucom.2019.08.047
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic discovery, as an effective tool for semantic mining and a key means of extracting new features from original text, plays an important role in text mining and knowledge discovery. To address the problems encountered in traditional topic models, such as the loss of semantic information, the ambiguity of topic concepts, and the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and the independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments. Based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, "topic-word" distributions and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, "document-topic" distributions of the original documents are obtained by merging the new subdocuments' "document-topic" distributions with specific strategies. In numerical experiments, CCOD_STDM is compared with six topic models using two evaluation methods on seven public corpora, and the results verify its superiority and efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and facilitate the condensation and summarization of topics. © 2019 Elsevier B.V. All rights reserved.
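The abstract does not give the formula for the conditional co-occurrence degree or the merging strategy, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the degree of word b given word a is the fraction of subdocuments containing a that also contain b, that a pair becomes a combinatorial word when the degree is high in both directions, and that subdocument "document-topic" distributions are merged by a weighted average. The function names, the `threshold` value, and the underscore-joined token format are all hypothetical. The Gibbs sampling step is omitted.

```python
from collections import Counter
from itertools import combinations


def conditional_cooccurrence(subdocs):
    """Estimate degree[(a, b)] = (# subdocs with a and b) / (# subdocs with a).

    This counting estimate is an assumption; the abstract gives no formula.
    """
    word_df = Counter()  # number of subdocuments containing each word
    pair_df = Counter()  # number of subdocuments containing both words
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    degree = {}
    for (a, b), n_ab in pair_df.items():
        degree[(a, b)] = n_ab / word_df[a]
        degree[(b, a)] = n_ab / word_df[b]
    return degree


def combinatorial_words(subdocs, threshold=0.8):
    """Keep pairs whose degree meets the (hypothetical) threshold both ways."""
    degree = conditional_cooccurrence(subdocs)
    return {
        (a, b)
        for (a, b), d in degree.items()
        if a < b and d >= threshold and degree[(b, a)] >= threshold
    }


def expand(tokens, pairs):
    """Feature expansion: append a joined token for each pair present."""
    present = set(tokens)
    extra = [f"{a}_{b}" for a, b in sorted(pairs) if a in present and b in present]
    return list(tokens) + extra


def merge_topic_distributions(thetas, weights):
    """Weighted average of subdocument topic distributions (assumed strategy)."""
    total = sum(weights)
    n_topics = len(thetas[0])
    return [
        sum(w * theta[k] for theta, w in zip(thetas, weights)) / total
        for k in range(n_topics)
    ]


subdocs = [
    ["apple", "pie", "recipe"],
    ["apple", "pie", "oven"],
    ["apple", "pie"],
    ["oven", "recipe"],
]
pairs = combinatorial_words(subdocs)
expanded = [expand(tokens, pairs) for tokens in subdocs]
merged = merge_topic_distributions([[0.9, 0.1], [0.5, 0.5]], weights=[3, 1])
```

In this toy input, "apple" and "pie" always appear together, so they form the only combinatorial word, and subdocuments containing both gain the extra token `apple_pie` before topic modeling. Weighting the merge by subdocument length is one plausible choice among many.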
Pages: 11-24
Number of pages: 14