Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA);
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, also called document categorization, is one of the most studied problems in computer science. The task is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach for fast text classification using randomized explicit semantic analysis (RS-ESA). It builds on a state-of-the-art, Wikipedia-based approach to word sense disambiguation; Wikipedia is the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, which yields a throughput an order of magnitude higher than that of the original explicit semantic analysis. RS-ESA has been implemented as part of the LITMUS project, which needs to classify social media data as relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying social media landslide data collected in December 2014. We also demonstrate the generality of the approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
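The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the six toy "articles" stand in for Wikipedia, the TF-IDF weighting and the nearest-centroid classifier are assumptions chosen for brevity, and every name here (`ARTICLES`, `sample_concepts`, `build_index`, `esa_vector`, `classify`) is hypothetical. It shows only the core step the abstract names: keep a random sample of the concept repository, then represent texts as weighted vectors over the sampled concepts.

```python
import math
import random
from collections import Counter

# Toy stand-in for Wikipedia: concept title -> article text (illustrative only).
ARTICLES = {
    "Landslide": "landslide mud rock slope collapse rain debris",
    "Earthquake": "earthquake seismic fault tremor ground damage",
    "Flood": "flood rain river water damage evacuation",
    "Election": "election vote landslide victory candidate party",
    "Song": "song album band music landslide lyrics",
    "Football": "football match team goal victory score",
}

def tokenize(text):
    return [w for w in text.lower().split() if w.isalpha()]

def sample_concepts(articles, k, seed=0):
    """Randomized reduction: keep k randomly chosen articles (concepts)."""
    rng = random.Random(seed)
    return {title: articles[title] for title in rng.sample(sorted(articles), k)}

def build_index(concepts):
    """Inverted index: word -> {concept title: TF-IDF weight}."""
    n = len(concepts)
    tf = {title: Counter(tokenize(text)) for title, text in concepts.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    index = {}
    for title, counts in tf.items():
        for w, c in counts.items():
            idf = math.log((1 + n) / (1 + df[w])) + 1.0  # smoothed IDF
            index.setdefault(w, {})[title] = c * idf
    return index

def esa_vector(text, index):
    """Concept vector of a text: sum of the concept weights of its words."""
    vec = Counter()
    for w in tokenize(text):
        for title, weight in index.get(w, {}).items():
            vec[title] += weight
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled, index):
    """Assumed classifier: one summed concept vector per label."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(esa_vector(text, index))
    return centroids

def classify(text, centroids, index):
    vec = esa_vector(text, index)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

TRAIN = [
    ("landslide blocks road after heavy rain", "relevant"),
    ("mud and debris collapse on slope", "relevant"),
    ("candidate wins election by landslide victory", "irrelevant"),
    ("band plays landslide song on new album", "irrelevant"),
]
```

A smaller `k` shrinks the index and so the work per document, which is the source of the speed-up the abstract describes; the cost is that a sample may miss discriminative concepts, so the sample size would need to be tuned against accuracy on real data.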
Pages: 364 - 371
Page count: 8
Related papers
50 in total
  • [1] Fast text categorization using concise semantic analysis
    Li Zhixing
    Xiong Zhongyang
    Zhang Yufang
    Liu Chunyong
    Li Kuan
    PATTERN RECOGNITION LETTERS, 2011, 32 (03) : 441 - 448
  • [2] Explicit Semantic Analysis for Computing Semantic Relatedness of Biomedical Text
    Jaiswal, Ayush
    Bhargava, Anunay
    2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 929 - 934
  • [3] Automatic text classification using neuronets algorithms and semantic analysis
    Andreev, A
    Berezkin, D
    Morozov, V
    Simakov, K
    DIGITAL LIBRARIES: ADVANCED METHODS AND TECHNOLOGIES, DIGITAL COLLECTIONS, 2003, : 140 - 149
  • [4] Refined semantic kernel matching pursuit for fast text classification
    Zhang, Qing
    Zhang, Zhiping
    Wang, Li
    Journal of Computational Information Systems, 2012, 8 (20) : 8569 - 8579
  • [5] Short Text Classification Based on Explicit and Implicit Multiscale Weighted Semantic Information
    Gong, Jun
    Zhang, Juling
    Guo, Wenqiang
    Ma, Zhilong
    Lv, Xiaoyi
    SYMMETRY-BASEL, 2023, 15 (11):
  • [6] Improving Short Text Classification Using Fast Semantic Expansion on Multichannel Convolutional Neural Network
    Sotthisopha, Natthapat
    Vateekul, Peerapon
    2018 19TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2018, : 182 - 187
  • [7] Semantic Text Encoding for Text Classification using Convolutional Neural Networks
    Gallo, Ignazio
    Nawaz, Shah
    Calefati, Alessandro
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2017), VOL 5, 2017, : 16 - 21
  • [8] Using Semantic Correlation of HowNet for Short Text Classification
    Ning, Yahui
    Zhang, Li
    Ju, Yarong
    Wang, Weijia
    Li, Shunqin
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1931 - 1934
  • [9] Transductive learning for text classification using explicit knowledge models
    Ifrim, Georgiana
    Weikum, Gerhard
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS, 2006, 4213 : 223 - 234
  • [10] A Distributed Arabic Text Classification Approach Using Latent Semantic Analysis for Big data
    Alazzam, Hadeel
    Alsmady, Abdulsalam
    PROCEEDINGS OF THE 2017 12TH INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE ON COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES (CSIT 2017), VOL. 1, 2017, : 58 - 61