Fast Text Classification Using Randomized Explicit Semantic Analysis

Cited by: 9
Authors
Musaev, Aibek [1]
Wang, De [1]
Shridhar, Saajan [1]
Pu, Calton [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Funding
National Science Foundation (USA);
Keywords
text classification; explicit semantic analysis; social media; event detection;
DOI
10.1109/IRI.2015.62
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document classification, also called document categorization, is one of the most studied problems in computer science. The task is to assign a document, based on its text, to one or more classes or categories from a predefined set. We propose a new approach to fast text classification, randomized explicit semantic analysis (RS-ESA). It is based on a state-of-the-art approach to word sense disambiguation that uses Wikipedia, the largest encyclopedia in existence. Our method reduces the Wikipedia repository by random sampling, which yields a throughput an order of magnitude higher than that of the original explicit semantic analysis. RS-ESA has been implemented as part of the LITMUS project, which needs to classify social media data as relevant or irrelevant to landslides as a natural disaster. We demonstrate that our approach achieves 96% precision when classifying social media landslide data collected in December 2014. We also demonstrate the generality of the approach by using it to separate factual texts from fictional ones, based on Wikipedia articles and fan fiction stories, where we achieve 97% precision.
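The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform sampling, the smoothed TF-IDF weighting, the cosine comparison, and every function name below are assumptions chosen for illustration, and the downstream classifier that would consume these concept vectors is left out.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def sample_concepts(articles, k, seed=0):
    # Hypothetical "randomized" step: keep a uniform random sample of k
    # concept articles instead of the full repository.
    rng = random.Random(seed)
    titles = rng.sample(sorted(articles), k)
    return {title: articles[title] for title in titles}


def build_index(concepts):
    # Inverted index: word -> {concept title: TF-IDF weight}.
    n = len(concepts)
    tf = {title: Counter(tokenize(body)) for title, body in concepts.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for title, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word]) + 1.0  # smoothed so common words keep some weight
            index.setdefault(word, {})[title] = count * idf
    return index


def esa_vector(text, index):
    # Map a text to a weighted vector over the sampled concepts.
    vec = Counter()
    for word, count in Counter(tokenize(text)).items():
        for title, weight in index.get(word, {}).items():
            vec[title] += count * weight
    return vec


def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy stand-in for the Wikipedia repository (invented text, not real data).
articles = {
    "Landslide": "a landslide is the movement of rock earth or debris down a slope",
    "Mudflow": "a mudflow is a flow of mud and debris down a slope after heavy rain",
    "Election": "an election is a vote where a candidate can win by a landslide margin",
    "Album": "an album is a collection of songs released by a band",
}
```

In this sketch a smaller sample means a smaller inverted index and shorter concept vectors, which is where a speedup would come from; how the sample size trades off against precision is a question the sketch does not answer.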
Pages: 364-371
Page count: 8
Related papers
50 in total
  • [31] WordNet-based lexical semantic classification for text corpus analysis
    Jun Long
    Lu-da Wang
    Zu-de Li
    Zu-ping Zhang
    Liu Yang
    Journal of Central South University, 2015, 22 : 1833 - 1840
  • [32] Allerdictor: fast allergen prediction using text classification techniques
    Dang, Ha X.
    Lawrence, Christopher B.
    BIOINFORMATICS, 2014, 30 (08) : 1120 - 1128
  • [33] Automatic Text Summarization Using Latent Semantic Analysis
    Mashechkin, I. V.
    Petrovskiy, M. I.
    Popov, D. S.
    Tsarev, D. V.
    PROGRAMMING AND COMPUTER SOFTWARE, 2011, 37 (06) : 299 - 305
  • [34] Affect analysis of text using fuzzy semantic typing
    Subasic, P
    Huettner, A
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2001, 9 (04) : 483 - 496
  • [35] Affect analysis of text using fuzzy semantic typing
    Subasic, P
    Huettner, A
    NINTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2000), VOLS 1 AND 2, 2000, : 647 - 652
  • [36] TEXT CONTENT ANALYSIS USING ONTOLOGY AND SEMANTIC SIMILARITY
    Prodanovic, Dejan
    Furlan, Bojan
    Nikolic, Bosko
    2014 22ND TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2014, : 1126 - 1129
  • [37] A Comprehensive Analysis of using Semantic Information in Text Categorization
    Celik, Kerem
    Gungor, Tunga
    2013 IEEE INTERNATIONAL SYMPOSIUM ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (IEEE INISTA), 2013,
  • [38] Automatic text summarization using latent semantic analysis
    I. V. Mashechkin
    M. I. Petrovskiy
    D. S. Popov
    D. V. Tsarev
    Programming and Computer Software, 2011, 37 : 299 - 305
  • [39] KANNADA TEXT SUMMARIZATION USING LATENT SEMANTIC ANALYSIS
    Geetha, J. K.
    Deepamala, N.
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 1508 - 1512
  • [40] IMAGE SEGMENTATION AND CLASSIFICATION USING SEMANTIC ANALYSIS
    Kollias, Stefanos
    ECS10: THE 10TH EUROPEAN CONGRESS OF STEREOLOGY AND IMAGE ANALYSIS, 2009, : 285 - 290