Automatic Topic Modeling for Single Document Short Texts

Cited by: 1
Authors
Sajid, Anamta [1]
Jan, Sadaqat [1]
Shah, Ibrar A. [1]
Affiliations
[1] Univ Engn & Technol, Dept Comp Software Engn, Mardan Campus, Mardan, Pakistan
Source
2017 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT) | 2017
Keywords
Data mining; text mining; topic modeling
DOI
10.1109/FIT.2017.00020
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a novel approach to automatically extracting the topic and main title from a single-document short text. The proposed approach uses online text mining and Natural Language Processing techniques. A title lets a reader grasp the contents of a text at a glance, which is quicker than reading a summary. Three different mechanisms are proposed, implemented, and compared to find the best approach for automatically extracting the topic most relevant to the overall event described in the text. The proposed system is evaluated on fifteen news articles from The New York Times. The significance of the paper is twofold. First, these automatic topic extraction techniques can be used for document classification, document relevance and similarity, summarization, gaining a comprehensive grasp of an event, and finding novelty in large, scattered text data by scanning titles. Second, the detailed analysis of various data mining techniques can serve as a roadmap for new researchers. The experimental results show that nouns are more related, reliable, and suitable words for finding the topic of a text.
Pages: 70-75
Page count: 6
Related papers
50 in total
  • [21] Context reinforced neural topic modeling over short texts
    Feng, Jiachun
    Zhang, Zusheng
    Ding, Cheng
    Rao, Yanghui
    Xie, Haoran
    Wang, Fu Lee
    INFORMATION SCIENCES, 2022, 607 : 79 - 91
  • [22] Topic segmentation for short texts
    Chang, TH
    Lee, CH
    PACLIC 17: LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2003, : 159 - 165
  • [23] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao Sun
    Xiaowei Zhao
    Jingjing Lin
    Jian Jing
    Deqing Wang
    Guozhu Jia
    Frontiers of Computer Science, 2022, 16
  • [24] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao SUN
    Xiaowei ZHAO
    Jingjing LIN
    Jian JING
    Deqing WANG
    Guozhu JIA
    Frontiers of Computer Science, 2022, 16 (06) : 72 - 81
  • [25] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Sun, Mingtao
    Zhao, Xiaowei
    Lin, Jingjing
    Jing, Jian
    Wang, Deqing
    Jia, Guozhu
    FRONTIERS OF COMPUTER SCIENCE, 2022, 16 (06)
  • [26] Single-Document Automatic Abstracting System Based on Topic Partition
    Zhang, Yuanhong
    Guo, Jianyi
    Gong, Huaming
    Xue, Zhengshan
    Zhang, Yanmei
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION WORKSHOP: IITA 2008 WORKSHOPS, PROCEEDINGS, 2008, : 280 - 283
  • [27] TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement
    Mai, Chengcheng
    Qiu, Xueming
    Luo, Kaiwen
    Chen, Min
    Zhao, Bo
    Huang, Yihua
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 640 - 651
  • [28] Topic Modeling on Second Texts
    Maryl, Maciej
    Eder, Maciej
    TEKSTY DRUGIE, 2023, (01):
  • [29] A New Automatic Multi-document Text Summarization using Topic Modeling
    Roul, Rajendra Kumar
    Mehrotra, Samarth
    Pungaliya, Yash
    Sahoo, Jajati Keshari
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, ICDCIT 2019, 2019, 11319 : 212 - 221
  • [30] Tracking Topic Trends for Short Texts
    He, Liyan
    Du, Yajun
    Ye, Yongtao
    KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: LANGUAGE, KNOWLEDGE, AND INTELLIGENCE, CCKS 2017, 2017, 784 : 117 - 128