Extracting Topics with Simultaneous Word Co-occurrence and Semantic Correlation Graphs: Neural Topic Modeling for Short Texts

Cited by: 0
Authors
Wang, Yiming [1 ,2 ]
Li, Ximing [1 ,2 ]
Zhou, Xiaotang [3 ]
Ouyang, Jihong [1 ,2 ]
Institutions
[1] Jilin Univ, Coll Comp Sci & Technol, Jilin, Jilin, Peoples R China
[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Jilin, Jilin, Peoples R China
[3] Changchun Univ Technol, Sch Comp Sci & Engn, Changchun, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short texts have become an increasingly prevalent form of text data, e.g., Twitter posts, news titles, and product reviews. Extracting semantic topics from short texts plays a significant role in a wide spectrum of NLP applications, and neural topic modeling is now a major tool for this task. Motivated by learning more coherent and semantic topics, in this paper we develop a novel neural topic model named Dual Word Graph Topic Model (DWGTM), which extracts topics from simultaneous word co-occurrence and semantic correlation graphs. Specifically, we learn word features from the global word co-occurrence graph, so as to ingest rich word co-occurrence information; we then generate text features from the word features and feed them into an encoder network to obtain topic proportions per text; finally, we reconstruct the texts and the word co-occurrence graph with the topical distributions and word features, respectively. In addition, to capture the semantics of words, we also apply the word features to reconstruct a word semantic correlation graph computed from pre-trained word embeddings. Building on these ideas, we formulate DWGTM in an auto-encoding paradigm and train it efficiently in the spirit of neural variational inference. Empirical results validate that DWGTM can generate more semantically coherent topics than baseline topic models.
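The pipeline described in the abstract (word features learned from a global co-occurrence graph, text features aggregated from word features, an encoder yielding per-text topic proportions, and dual reconstruction of texts and graph) can be sketched as a single forward pass. This is a minimal illustrative sketch, not the authors' implementation: the dimensions, the spectral embedding used as a stand-in for the graph-based word-feature learner, and all variable names are assumptions.

```python
# Hypothetical sketch of the DWGTM forward pass described in the abstract.
# All sizes and the spectral word-feature stand-in are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V, D, K, H = 50, 8, 5, 16  # vocab size, #texts, #topics, feature dim

# Global word co-occurrence graph (symmetric count matrix) -- assumed input.
cooc = rng.integers(0, 5, size=(V, V)).astype(float)
cooc = (cooc + cooc.T) / 2

# Step 1: learn word features from the co-occurrence graph.
# Here a crude spectral embedding (top-H eigenvectors) stands in for
# the paper's graph-based word-feature learner.
_, vecs = np.linalg.eigh(cooc)
word_feat = vecs[:, -H:]                      # (V, H)

# Step 2: build text features by aggregating word features over each text.
bow = rng.integers(0, 3, size=(D, V)).astype(float)   # bag-of-words texts
text_feat = bow @ word_feat / np.maximum(bow.sum(1, keepdims=True), 1)

# Step 3: encoder -> per-text topic proportions (softmax over K topics).
def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

W_enc = rng.normal(size=(H, K))               # toy "encoder network" weights
theta = softmax(text_feat @ W_enc)            # (D, K), rows sum to 1

# Step 4: reconstruct texts from topic proportions and a topic-word matrix,
# and reconstruct the co-occurrence graph from the word features.
topic_word = softmax(rng.normal(size=(K, V))) # per-topic word distributions
text_recon = theta @ topic_word               # (D, V) word distributions
graph_recon = word_feat @ word_feat.T         # (V, V) graph reconstruction
```

In the paper these components are trained jointly via neural variational inference; the sketch only shows how the shapes and reconstructions fit together.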
Pages: 18 - 27
Page count: 10
Related Papers
29 records in total
  • [1] Topic Modeling for Short Texts with Co-occurrence Frequency-based Expansion
    Pedrosa, Gabriel
    Pita, Marcelo
    Bicalho, Paulo
    Lacerda, Anisio
    Pappa, Gisele L.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 277 - 282
  • [2] Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling
    Ju, Yiting
    Adams, Benjamin
    Janowicz, Krzysztof
    Hu, Yingjie
    Yan, Bo
    McKenzie, Grant
    KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT, EKAW 2016, 2016, 10024 : 353 - 367
  • [3] Word co-occurrence augmented topic model in short text
    Chen, Guan-Bin
    Kao, Hung-Yu
    INTELLIGENT DATA ANALYSIS, 2017, 21 : S55 - S70
  • [4] Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts
    Zhang, Kai
    Zhou, Yuan
    Chen, Zheng
    Liu, Yufei
    Tang, Zhuo
    Yin, Li
    Chen, Jihong
    COMPUTER JOURNAL, 2022, 65 (03) : 537 - 553
  • [5] Context reinforced neural topic modeling over short texts
    Feng, Jiachun
    Zhang, Zusheng
    Ding, Cheng
    Rao, Yanghui
    Xie, Haoran
    Wang, Fu Lee
    INFORMATION SCIENCES, 2022, 607 : 79 - 91
  • [6] Extracting semantic representations from word co-occurrence statistics: A computational study
    Bullinaria, John A.
    Levy, Joseph P.
    BEHAVIOR RESEARCH METHODS, 2007, 39 (03) : 510 - 526
  • [7] The principals of meaning: Extracting semantic dimensions from co-occurrence models of semantics
    Hollis, Geoff
    Westbury, Chris
    PSYCHONOMIC BULLETIN & REVIEW, 2016, 23 (06) : 1744 - 1756
  • [8] A text semantic topic discovery method based on the conditional co-occurrence degree
    Wei, Wei
    Guo, Chonghui
    NEUROCOMPUTING, 2019, 368 : 11 - 24