Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Funding
U.S. National Science Foundation;
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, or document categorization, is one of the most studied areas in computer science due to its importance. The problem is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach for fast text classification using randomized explicit semantic analysis (RS-ESA). It builds on a state-of-the-art approach to word sense disambiguation based on Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository through random sampling, yielding a throughput that is an order of magnitude higher than that of the original explicit semantic analysis. The RS-ESA approach has been implemented as part of the LITMUS project, motivated by the need to classify social media data into relevant and irrelevant items with respect to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying social media landslide data collected in December 2014. We also demonstrate the generality of the proposed approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
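The idea summarized in the abstract (map a text onto Wikipedia concepts, but only over a random sample of articles) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the TF-IDF weighting, and the toy articles are all assumptions made for illustration, and the record does not state how RS-ESA samples or weights articles.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def sample_concepts(articles, k, seed=0):
    """Keep a uniform random sample of k articles (the assumed 'randomized' step)."""
    rng = random.Random(seed)
    titles = rng.sample(sorted(articles), k)
    return {title: articles[title] for title in titles}


def build_index(concepts):
    """Build an inverted index: word -> {concept title: TF-IDF weight}."""
    tf = {title: Counter(tokenize(body)) for title, body in concepts.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    n = len(concepts)
    index = {}
    for title, counts in tf.items():
        for w, c in counts.items():
            # Assumed weighting: raw term frequency times a smoothed IDF.
            index.setdefault(w, {})[title] = c * math.log(1 + n / df[w])
    return index


def esa_vector(text, index):
    """Represent a text as a sparse vector of weights over concept titles."""
    vec = Counter()
    for w, c in Counter(tokenize(text)).items():
        for title, weight in index.get(w, {}).items():
            vec[title] += c * weight
    return dict(vec)


# Hypothetical toy "Wikipedia" with three concept articles.
articles = {
    "Landslide": "landslide mud slope rock debris",
    "Election": "landslide victory vote election",
    "Music": "song album band guitar",
}
reduced = sample_concepts(articles, k=2, seed=1)  # smaller repository
full_index = build_index(articles)
vector = esa_vector("mud and rock on the slope", full_index)
```

A smaller sampled repository means a smaller inverted index and fewer concept weights to accumulate per document, which is the intuition behind the throughput gain the abstract reports. The resulting concept vectors would then be passed to an ordinary classifier, which this sketch leaves out.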
Pages: 364-371
Number of pages: 8