Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation

Cited by: 6
Authors
Proto, Stefano [1]
Di Corso, Evelina [1]
Ventura, Francesco [1]
Cerquitelli, Tania [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
Keywords
Text mining; Parameter-free technique; Topic detection; LDA; Data weighting function; Big Data framework
DOI
10.1109/BigDataCongress.2018.00012
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine whose aim is to cluster collections of textual data into correlated groups of documents through a topic modeling methodology (i.e., LDA). ToPIC includes automatic strategies to relieve the end-user of the burden of selecting proper values for the overall analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorithm parameters and in discovering cohesive and well-separated groups of documents with a similar topic.
Pages: 33-40
Page count: 8