Power Law for Text Categorization

Citations: 0
Authors
Liu, Wuying [1]
Wang, Lin [2]
Yi, Mianzhu [1]
Affiliations
[1] PLA Univ Foreign Languages, Luoyang 471003, Henan, Peoples R China
[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
Keywords
Text Categorization; Power Law; Online Binary TC; Batch Multi-Category TC; TREC;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Text categorization (TC) is a challenging problem whose algorithms are used in many applications. This paper addresses the online multi-category TC problem, abstracted from applications of online binary TC and batch multi-category TC. Most applications are concerned with the space-time performance of TC algorithms. By investigating the token frequency distribution in an email collection and a Chinese web document collection, this paper re-examines the power law and proposes a random sampling ensemble Bayesian (RSEB) TC algorithm. Supported by a token-level memory that stores labeled documents, the RSEB algorithm uses a text retrieval approach to solve text categorization problems. The experimental results show that the RSEB algorithm achieves state-of-the-art performance at greatly reduced space-time requirements in both the TREC email spam filtering task and the Chinese web document classification task.
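To make the power-law observation in the abstract concrete, the following is a minimal sketch of how a token rank-frequency distribution might be checked against a power law. It is not the RSEB algorithm and is not taken from the paper: the function names `rank_frequency` and `fit_power_law_exponent` are hypothetical, the token stream is synthetic, and a plain log-log least-squares fit is assumed as the estimator.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_power_law_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    For an ideal power law f(r) = C * r**(-a), the slope equals -a.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data (an assumption for illustration, not the paper's corpora):
# the token "t<r>" occurs round(1000 / r) times, so frequency ~ 1 / rank.
tokens = [f"t{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
freqs = rank_frequency(tokens)
slope = fit_power_law_exponent(freqs)
print(f"estimated log-log slope: {slope:.3f}")
```

On real collections, a slope that stays roughly constant over a wide range of ranks is the usual informal sign of power-law behavior; the abstract does not state which estimator the authors used.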
Pages: 131-143
Page count: 13