Topic Discovery via Convex Polytopic Model: A Case Study with Small Corpora

Cited by: 0
Authors
Wu, King Keung [1]
Meng, Helen [1]
Yam, Yeung [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Mech & Automat Engn, Hong Kong, Peoples R China
Keywords
Topic discovery; document categorization; text representation; convex polytope; ALGORITHM
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Topic discovery is an important problem in text processing. Topic modeling approaches such as latent Dirichlet allocation (LDA) have been applied quite successfully to extracting topics. However, there still exist several directions for further improvement. First, short texts (e.g. tweets and news titles) present the problem of data sparsity for LDA. Second, there needs to be greater transparency in the process of topic discovery in order to enhance interpretability for humans. Third, the robustness of the model needs to be further enhanced to avoid sensitivity to the choice of hyper-parameters. In this paper, we propose a novel geometric approach based on the convex polytopic model (CPM), which can discover representative and interpretable topical features from a given corpus. By embedding all documents into a low-dimensional affine subspace, we show that the topics can be obtained geometrically as the vertices of a compact polytope that encloses all the embedded documents. We further interpret the acquired features as topics and use them to obtain a convex polytopic representation for every document. We study the properties of CPM using two small corpora of short texts. Results reveal that the proposed CPM can discover interpretable topics even for short texts. We also find that the geometric nature of CPM enhances model transparency and topic interpretability, as well as robustness to hyper-parameter selection.
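The geometric idea in the abstract (topics as vertices of a polytope enclosing the embedded documents, and each document as a convex combination of those vertices) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify how the embedding or the compact polytope is computed, so the sketch assumes the documents are already embedded in 2D, uses a plain convex hull as a stand-in for the enclosing polytope, and uses barycentric coordinates as the convex weights. The coordinates in `docs` and the function names are hypothetical.

```python
# Toy illustration only: convex hull as a stand-in for the enclosing polytope.
# All data and names below are hypothetical, not taken from the paper.

def cross(o, a, b):
    """Signed area (times 2) of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_weights(p, a, b, c):
    """Barycentric coordinates of p with respect to triangle a, b, c."""
    area = cross(a, b, c)
    wa = cross(p, b, c) / area
    wb = cross(a, p, c) / area
    return (wa, wb, 1.0 - wa - wb)

# Hypothetical 2D embeddings of six short documents.
docs = [(0, 0), (4, 0), (0, 4), (1, 1), (2, 1), (1, 2)]

# "Topics" are read off as the vertices of the enclosing polytope.
topics = convex_hull(docs)

# Each document is represented by its non-negative weights over the topics.
# (The three-vertex case is assumed here so barycentric coordinates apply.)
representations = {d: convex_weights(d, *topics) for d in docs}

for d, w in representations.items():
    print(d, [round(x, 3) for x in w])
```

In this toy setting the weights are non-negative and sum to one for every document, which is what makes them readable as topic proportions. A polytope with more vertices than the embedding dimension plus one would need a constrained least-squares step instead of barycentric coordinates.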
Pages: 367-372
Number of pages: 6