Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation

Cited by: 6
Authors
Proto, Stefano [1]
Di Corso, Evelina [1]
Ventura, Francesco [1]
Cerquitelli, Tania [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
Keywords
Text mining; Parameter-free technique; Topic detection; LDA; Data weighting function; Big Data framework
DOI
10.1109/BigDataCongress.2018.00012
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine whose aim is to cluster collections of textual data into correlated groups of documents through a topic modeling methodology (i.e., LDA). ToPIC includes automatic strategies to relieve the end-user of the burden of selecting proper values for the overall analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorithm parameters and in discovering cohesive and well-separated groups of documents with a similar topic.
Pages: 33-40
Number of pages: 8
Related papers
50 in total
  • [31] Research Topic Analysis in Engineering Management Using a Latent Dirichlet Allocation Model
    Kim, Jin Ho
    Chen, Weiru
    JOURNAL OF INDUSTRIAL INTEGRATION AND MANAGEMENT-INNOVATION AND ENTREPRENEURSHIP, 2018, 3 (04):
  • [32] Local–class–shared–topic latent Dirichlet allocation based scene classification
    Chao Huang
    Wang Luo
    Yurui Xie
    Multimedia Tools and Applications, 2017, 76 : 15661 - 15679
  • [33] Topic Analysis of the Research Domain in Knowledge Organization: A Latent Dirichlet Allocation Approach
    Joo, Soohyung
    Choi, Inkyung
    Choi, Namjoo
    KNOWLEDGE ORGANIZATION, 2018, 45 (02): : 170 - 183
  • [34] Indonesian's Song Lyrics Topic Modelling using Latent Dirichlet Allocation
    Laoh, Enrico
    Surjandari, Isti
    Febirautami, Limisgy Ramadhina
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 270 - 274
  • [35] Topic Extraction and Sentiment Classification by using Latent Dirichlet Markov Allocation and SentiWordNet
    Kaur, Preet Chandan
    Ghorpade, Tushar
    Mane, Vanita
    INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016
  • [36] Sentiment Analysis Using Latent Dirichlet Allocation and Topic Polarity Wordcloud Visualization
    Bashri, Mohammad F. A.
    Kusumaningrum, Retno
    2017 5TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOIC7), 2017,
  • [37] Topic-Based User Segmentation for Online Advertising with Latent Dirichlet Allocation
    Tu, Songgao
    Lu, Chaojun
    ADVANCED DATA MINING AND APPLICATIONS (ADMA 2010), PT II, 2010, 6441 : 259 - 269
  • [38] HDPauthor: A New Hybrid Author-Topic Model using Latent Dirichlet Allocation and Hierarchical Dirichlet Processes
    Yang, Ming
    Hsu, Willian H.
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16 COMPANION), 2016, : 619 - 624
  • [39] Language Model Adaptation Using Latent Dirichlet Allocation and an Efficient Topic Inference Algorithm
    Heidel, Aaron
    Chang, Hung-an
    Lee, Lin-shan
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1145 - +
  • [40] Topic Model Tutorial A basic introduction on latent Dirichlet allocation and extensions for web scientists
    Kling, Christoph Carl
    Posch, Lisa
    Bleier, Arnim
    Dietz, Laura
    PROCEEDINGS OF THE 2016 ACM WEB SCIENCE CONFERENCE (WEBSCI'16), 2016, : 10 - 10