Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Funding
National Science Foundation (USA)
Keywords
text classification; explicit semantic analysis; social media; event detection
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document classification, also called document categorization, is one of the most studied areas in computer science. The problem is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach for fast text classification using randomized explicit semantic analysis (RS-ESA). It builds on a state-of-the-art approach to word sense disambiguation based on Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, resulting in a throughput that is an order of magnitude higher than that of the original explicit semantic analysis. RS-ESA has been implemented as part of the LITMUS project, motivated by the need to classify Social Media data into items that are relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying Social Media landslide data collected in December 2014. We also demonstrate the generality of the approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
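The general idea in the abstract (represent a text as a weighted vector over a randomly sampled subset of encyclopedia articles) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the TF-IDF weighting, the smoothing in the IDF term, and all function names (`sample_concepts`, `build_index`, `esa_vector`, `cosine`) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def sample_concepts(articles, k, seed=0):
    """Randomized step (assumed form): keep k articles chosen uniformly at random."""
    rng = random.Random(seed)
    titles = rng.sample(sorted(articles), k)
    return {t: articles[t] for t in titles}


def build_index(articles):
    """Inverted index: word -> {article title: TF-IDF weight}."""
    n = len(articles)
    tf = {title: Counter(tokenize(body)) for title, body in articles.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    index = {}
    for title, counts in tf.items():
        for w, c in counts.items():
            # Smoothed IDF (an assumption) so that no weight is exactly zero.
            index.setdefault(w, {})[title] = c * math.log(1 + n / df[w])
    return index


def esa_vector(text, index):
    """Map a text to a sparse vector over concepts (article titles)."""
    vec = Counter()
    for w, c in Counter(tokenize(text)).items():
        for title, weight in index.get(w, {}).items():
            vec[title] += c * weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy "encyclopedia"; a real system would use Wikipedia articles.
articles = {
    "Landslide": "landslide mud slope rain debris slope",
    "Election": "election vote landslide victory candidate",
    "Music": "album song band landslide guitar",
    "Geology": "rock slope soil erosion debris",
}

full_index = build_index(articles)
reduced_index = build_index(sample_concepts(articles, k=2, seed=1))
vec = esa_vector("mud slope debris rain", full_index)
```

A smaller sample means a smaller index and fewer weights to accumulate per word, which is the intuition behind the throughput gain; how the sample size affects classification quality is a question the paper itself addresses, not this sketch.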
Pages: 364-371
Page count: 8
Related papers
50 in total
  • [41] Properties and Structure of Fast Text Search Engine in Context of Semantic Image Analysis
    Rygal, Janusz
    Najgebauer, Patryk
    Nowak, Tomasz
    Romanowski, Jakub
    Gabryel, Marcin
    Scherer, Rafal
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2012, 7267 : 592 - 599
  • [42] Short Text Classification Using Contextual Analysis
    Sulaimani, Sami Al
    Starkey, Andrew
    IEEE ACCESS, 2021, 9 : 149619 - 149629
  • [43] Semantic Conceptual Primitives Computing in Text Classification
    Zhang, Quan
    Yuan, Yi
    Wei, Xiangfeng
    Chi, Zhejie
    Cong, Peimin
    Du, Yihua
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014 : 215 - 218
  • [44] Text Classification Based on Title Semantic Information
    Liu, YunXiang
    Xu, Qi
    Wang, ChunYa
    2020 5TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS 2020), 2020 : 29 - 33
  • [45] Semantic text classification of emergent disease reports
    Zhang, Yi
    Liu, Bing
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2007, PROCEEDINGS, 2007, 4702 : 629 - +
  • [46] Combined syntactic and semantic kernels for text classification
    Bloehdorn, Stephan
    Moschitti, Alessandro
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 307 - +
  • [47] Semantic Clustering for a Functional Text Classification Task
    Lippincott, Thomas
    Passonneau, Rebecca
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2009, 5449 : 509 - +
  • [48] Arabic Text Classification based on Semantic Relations
    Hijazi, Musab
    Zeki, Akram
    Ismail, Amelia
    INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE, 2022, 17 (02) : 937 - 946
  • [49] Text classification for cognitive domains: A case using lexical, syntactic and semantic features
    Qiao, Chen
    Hu, Xiao
    JOURNAL OF INFORMATION SCIENCE, 2019, 45 (04) : 516 - 528
  • [50] A neuro-SVM model for text classification using latent semantic indexing
    Mitra, V
    Wang, CJ
    Banerjee, S
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005 : 564 - 569