Automatic Topic Modeling for Single Document Short Texts

Cited by: 1
Authors
Sajid, Anamta [1]
Jan, Sadaqat [1]
Shah, Ibrar A. [1]
Affiliation
[1] Univ Engn & Technol, Dept Comp Software Engn, Mardan Campus, Mardan, Pakistan
Source
2017 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT), 2017
Keywords
Data mining; text mining; topic modeling
DOI
10.1109/FIT.2017.00020
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper presents a novel approach for automatically extracting the topic and main title of a single-document short text. The approach uses online text mining and Natural Language Processing techniques. The title of a text offers a concise overview of its contents at a glance, which is quicker than reading a summary. Three different mechanisms are proposed, implemented, and compared to find the best approach for automatically extracting the topic most relevant to the overall event described in the text. The proposed system is evaluated on fifteen news articles from the New York Times. The significance of the paper is twofold. First, these automatic topic extraction techniques can be applied to document classification, document relevance and similarity, summarization, comprehensive understanding of an event, and detecting novelty in large, scattered text collections by scanning titles. Second, its detailed analysis of various data mining techniques can serve as a roadmap for new researchers. The experimental results show that nouns are more relevant, reliable, and suitable words for finding the topic of a text.
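The abstract's finding that nouns are the most suitable topic words suggests a simple frequency baseline. The sketch below is not the authors' method (the abstract does not detail their three mechanisms); it assumes the text has already been tokenized and POS-tagged with Penn Treebank tags, for example by an off-the-shelf tagger that is not shown, and it ranks nouns by how often they occur. The names `top_nouns` and `NOUN_TAGS` are hypothetical and chosen for illustration.

```python
from collections import Counter

# Assumption: input tokens carry Penn Treebank tags.
# NN/NNS are common nouns, NNP/NNPS are proper nouns.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def top_nouns(tagged_tokens, k=3):
    """Return the k most frequent nouns (lowercased) from (word, tag) pairs.

    Hypothetical helper for illustration only. Ties are broken by first
    occurrence, because Counter.most_common orders equal counts in the
    order the elements were first encountered.
    """
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if tag in NOUN_TAGS and word.isalpha()  # skip punctuation/numbers
    )
    return [word for word, _ in counts.most_common(k)]


# Toy pre-tagged sentence (made-up example, not from the paper's dataset).
tagged = [
    ("The", "DT"), ("storm", "NN"), ("hit", "VBD"), ("the", "DT"),
    ("coast", "NN"), ("and", "CC"), ("the", "DT"), ("storm", "NN"),
    ("flooded", "VBD"), ("Miami", "NNP"), ("streets", "NNS"), (".", "."),
]
print(top_nouns(tagged, k=2))  # ['storm', 'coast']
```

Taking already-tagged tokens as input keeps the sketch free of tagger dependencies; plugging in a real tagger would only change how `tagged` is produced.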
Pages: 70-75
Number of pages: 6
Related papers (50 in total)
  • [11] Targeted aspects oriented topic modeling for short texts
    He, Jin
    Li, Lei
    Wang, Yan
    Wu, Xindong
    APPLIED INTELLIGENCE, 2020, 50 (08) : 2384 - 2399
  • [12] Multiple Relational Topic Modeling for Noisy Short Texts
    Liu, Zheng
    Liu, Chiyu
    Xia, Bin
    Li, Tao
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2018, 28 (11-12) : 1559 - 1574
  • [13] Topic Modeling for Short Texts with Auxiliary Word Embeddings
    Li, Chenliang
    Wang, Haoran
    Zhang, Zhiqian
    Sun, Aixin
    Ma, Zongyang
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016 : 165 - 174
  • [14] Modeling Topic Evolution in Social Media Short Texts
    Zhang, Yuhao
    Mao, Wenji
    Lin, Junjie
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017 : 315 - 319
  • [15] A Nested Chinese Restaurant Topic Model for Short Texts with Document Embeddings
    Niu, Yue
    Zhang, Hongjie
    Li, Jing
    APPLIED SCIENCES-BASEL, 2021, 11 (18)
  • [16] Authorship Attribution for Short Texts with Author-Document Topic Model
    Zhang, Haowen
    Nie, Peng
    Wen, Yanlong
    Yuan, Xiaojie
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2018), PT I, 2018, 11061 : 29 - 41
  • [17] A topic modeling based approach to novel document automatic summarization
    Wu, Zongda
    Lei, Li
    Li, Guiling
    Huang, Hui
    Zheng, Chengren
    Chen, Enhong
    Xu, Guandong
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 84 : 12 - 23
  • [18] Topic Modeling over Short Texts by Incorporating Word Embeddings
    Qiang, Jipeng
    Chen, Ping
    Wang, Tong
    Wu, Xindong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT II, 2017, 10235 : 363 - 374
  • [19] Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings
    Li, Chenliang
    Duan, Yu
    Wang, Haoran
    Zhang, Zhiqian
    Sun, Aixin
    Ma, Zongyang
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2017, 36 (02)
  • [20] Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts
    Zhang, Kai
    Zhou, Yuan
    Chen, Zheng
    Liu, Yufei
    Tang, Zhuo
    Yin, Li
    Chen, Jihong
    COMPUTER JOURNAL, 2022, 65 (03) : 537 - 553