Topic Discovery via Convex Polytopic Model: A Case Study with Small Corpora

Cited by: 0
Authors
Wu, King Keung [1]
Meng, Helen [1]
Yam, Yeung [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Mech & Automat Engn, Hong Kong, Peoples R China
Keywords
Topic discovery; document categorization; text representation; convex polytope; ALGORITHM;
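DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809
Abstract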
Topic discovery is an important problem in text processing. Topic modeling approaches such as latent Dirichlet allocation (LDA) have been applied quite successfully to extracting topics. However, there still exist several directions for further improvement. First, short texts (e.g. tweets and news titles) present the problem of data sparsity for LDA. Second, there needs to be greater transparency in the process of topic discovery in order to enhance interpretability for humans. Third, the robustness of the model needs to be further enhanced to avoid sensitivity to the choice of hyper-parameters. In this paper, we propose a novel geometric approach based on the convex polytopic model (CPM), which can discover representative and interpretable topical features from a given corpus. By embedding all documents into a low-dimensional affine subspace, we show that the topics can be obtained geometrically as the vertices of a compact polytope that encloses all the embedded documents. We further interpret the acquired features as topics and use them to obtain a convex polytopic representation of every document. We study the properties of CPM using two small corpora of short texts. The results reveal that the proposed CPM can discover interpretable topics even for short texts. We also find that the geometric nature of CPM enhances model transparency and topic interpretability, as well as robustness to hyper-parameter selection.
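The geometric idea in the abstract (embed documents in a low-dimensional affine subspace, treat polytope vertices as topics, express each document as a combination of those vertices) can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not spell out. It assumes a PCA-style embedding via SVD and swaps in a greedy farthest-point heuristic that picks existing documents as vertices, so the resulting simplex does not necessarily enclose every document. The function names `embed`, `pick_vertices`, and `barycentric`, and the toy count matrix, are hypothetical.

```python
import numpy as np

def embed(X, k):
    """Project rows of X onto a (k-1)-dimensional affine subspace (PCA via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k - 1].T

def pick_vertices(Y, k):
    """Greedy heuristic (an assumption, not the paper's method): start from the
    point farthest from the centroid, then repeatedly add the point farthest
    from the affine hull of the points chosen so far."""
    idx = [int(np.argmax(np.linalg.norm(Y, axis=1)))]
    for _ in range(k - 1):
        base = Y[idx[0]]
        R = Y - base                          # points relative to the first vertex
        if len(idx) > 1:
            D = Y[idx[1:]] - base             # directions spanning the current hull
            Q, _ = np.linalg.qr(D.T)          # orthonormal basis of those directions
            R = R - (R @ Q) @ Q.T             # keep only the part outside the hull
        idx.append(int(np.argmax(np.linalg.norm(R, axis=1))))
    return idx

def barycentric(Y, V):
    """Affine weights W with W @ V == Y and each row of W summing to 1.
    A negative weight means the document lies outside the simplex."""
    k = V.shape[0]
    A = np.vstack([V.T, np.ones(k)])          # (k, k) system matrix
    B = np.vstack([Y.T, np.ones(len(Y))])     # (k, n) right-hand sides
    return np.linalg.solve(A, B).T

# Toy term-count matrix: 6 short documents over 5 terms (made-up data).
counts = np.array([
    [4, 0, 0, 1, 0],
    [0, 4, 0, 0, 1],
    [0, 0, 4, 1, 1],
    [2, 2, 0, 0, 1],
    [1, 1, 2, 1, 0],
    [0, 2, 2, 0, 1],
], dtype=float)
X = counts / counts.sum(axis=1, keepdims=True)   # term frequencies per document

K = 3                                            # number of topics (vertices)
Y = embed(X, K)                                  # documents in a 2-D subspace
idx = pick_vertices(Y, K)                        # indices of vertex documents
V = Y[idx]                                       # topic vertices
W = barycentric(Y, V)                            # per-document topic weights
```

Each row of `W` plays the role of a document representation over the `K` vertices. In a true enclosing polytope all weights would be non-negative; with this heuristic they may not be.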
Pages: 367-372
Page count: 6
Related papers
50 in total
  • [21] Construction of a performance evaluation model: case study in a small business company of civil construction
    de Archer de Arruda Borges, Ana Paula
    Coelho, Gabriel Nilson
    Petri, Sergio Murilo
    REVISTA DE GESTAO E SECRETARIADO-GESEC, 2018, 9 (03): : 21 - 45
  • [22] USING A METHOD OF SMALL PARAMETER FOR STUDY OF MODEL OF THE DIFFUSION METASOMATISM ON THE CASE OF REVERSAL REACTIONS
    LEBEDEVA, MI
    ZARAYSKY, GP
    BALASHOV, VN
    GEOKHIMIYA, 1987, (03): : 459 - 464
  • [23] Electronic business model for small and medium sized manufacturing enterprises (SME): a case study
    Yuen, K
    Chung, W
    INTERNET-BASED ENTERPRISE INTEGRATION AND MANAGEMENT, 2001, 4566 : 159 - 164
  • [24] Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model
    Wang, Zhizheng
    Liu, Xiao Fan
    Du, Zhanwei
    Wang, Lin
    Wu, Ye
    Holme, Petter
    Lachmann, Michael
    Lin, Hongfei
    Wong, Zoie S. Y.
    Xu, Xiao-Ke
    Sun, Yuanyuan
    ISCIENCE, 2022, 25 (10)
  • [25] Using copulas to model repeat purchase behaviour - An exploratory analysis via a case study
    Meade, Nigel
    Islam, Towhidul
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2010, 200 (03) : 908 - 917
  • [26] Photovoltaic generation network design via a mixed integer programming model: A case study
    Asiedu, Y
    Chen, MY
    INFOR, 1997, 35 (03) : 225 - 238
  • [27] Optimal Spare Management via Statistical Model Checking: A Case Study in Research Reactors
    Soltani, Reza
    Volk, Matthias
    Diamonte, Leonardo
    Lopuhaa-Zwakenberg, Milan
    Stoelinga, Marielle
    FORMAL METHODS FOR INDUSTRIAL CRITICAL SYSTEMS, FMICS 2023, 2023, 14290 : 205 - 223
  • [28] Multi-level Association Rule Mining for the Discovery of Strong Underrepresented Patterns The Case Study of Small Dairy Farms in Tanzania
    Malamsha, Glory C.
    Nyambo, Devotha G.
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2023, 13 (02) : 10377 - 10383
  • [29] A model of scientific attitudes assessment by observation in physics learning based scientific approach: case study of dynamic fluid topic in high school
    Ekawati, Elvin Yusliana
    INTERNATIONAL CONFERENCE ON SCIENCE AND APPLIED SCIENCE (ENGINEERING AND EDUCATIONAL SCIENCE) 2016, 2017, 795
  • [30] Using the HTRC Data Capsule Model to Promote Reuse and Evolution of Experimental Analysis of Digital Library Data: A Case Study of Topic Modeling
    Bainbridge, David
    Nichols, David M.
    Hinze, Annika
    Downie, J. Stephen
    2019 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2019), 2019, : 463 - 464