Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation

Cited by: 6
Authors
Proto, Stefano [1]
Di Corso, Evelina [1]
Ventura, Francesco [1]
Cerquitelli, Tania [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
Keywords
Text mining; Parameter-free technique; Topic detection; LDA; Data weighting function; Big Data framework
DOI
10.1109/BigDataCongress.2018.00012
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine whose aim is to cluster collections of textual data into correlated groups of documents through a topic modeling methodology (i.e., LDA). ToPIC includes automatic strategies to relieve the end-user of the burden of selecting proper values for the overall analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorithm parameters and in discovering cohesive and well-separated groups of documents with a similar topic.
Pages: 33-40
Page count: 8