Exploiting semantic resources for large scale text categorization

Cited by: 0
Authors
Jian Qiang Li
Yu Zhao
Bo Liu
Affiliation
[1] NEC Laboratories China,
Keywords
Web-scale text categorization; Semantic analysis; Semantic information processing;
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Traditional supervised classifiers for Text Categorization (TC) are learned from a set of hand-labeled documents. However, manual labeling is labor-intensive and time-consuming, especially for a complex TC task with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been proposed that use both labeled and unlabeled documents for TC, but they still need a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT) that requires no manual labeling effort. In FACT, lexical databases serve as semantic resources for understanding category names. The approach combines semantic analysis of the category names with statistical analysis of the unlabeled document set to construct training data fully automatically. With the support of the lexical databases, we first use each category name to generate a set of features that serves as a representative profile for the corresponding category. Then, a set of documents is labeled according to the representative profile. To reduce the possible bias originating from the category name and the representative profile, document clustering is used to refine the initial labeling. The training data are subsequently constructed and used to train a discriminative classifier. Experiments show that one variant of our FACT approach significantly outperforms the state-of-the-art unsupervised TC approach. It achieves more than 90% of the F1 performance of the baseline SVM methods, which demonstrates the effectiveness of the proposed approach.
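The pipeline described in the abstract (category name, representative profile, initial labeling, refinement, classifier training) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: `TOY_LEXICON` is a hypothetical stand-in for a real lexical database, a single nearest-centroid pass replaces the document clustering step, and a nearest-centroid model replaces the discriminative classifier (the abstract names SVM only as a baseline).

```python
from collections import Counter
import math

# Hypothetical stand-in for a lexical database (assumption, not from the paper).
TOY_LEXICON = {
    "sports": {"game", "team", "match", "player"},
    "finance": {"bank", "stock", "market", "money"},
}

def tokenize(text):
    return text.lower().split()

def build_profile(category_name, lexicon):
    """Expand a category name into a set of representative features."""
    return {category_name} | lexicon.get(category_name, set())

def initial_labels(docs, profiles, min_hits=1):
    """Label a document with the single category whose profile it overlaps most."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = {cat: len(tokens & prof) for cat, prof in profiles.items()}
        best = max(hits, key=hits.get)
        unique_best = list(hits.values()).count(hits[best]) == 1
        if hits[best] >= min_hits and unique_best:
            labels[i] = best
    return labels

def train_centroids(docs, labels):
    """Sum term counts of the labeled documents per category."""
    cents = {}
    for i, cat in labels.items():
        cents.setdefault(cat, Counter()).update(tokenize(docs[i]))
    return cents

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def refine_labels(docs, labels):
    """Simplified refinement: relabel every document by its nearest centroid."""
    cents = train_centroids(docs, labels)
    refined = {}
    for i, doc in enumerate(docs):
        vec = Counter(tokenize(doc))
        sims = {cat: cosine(vec, c) for cat, c in cents.items()}
        best = max(sims, key=sims.get)
        if sims[best] > 0:
            refined[i] = best
    return refined

def classify(text, cents):
    vec = Counter(tokenize(text))
    return max(cents, key=lambda cat: cosine(vec, cents[cat]))

docs = [
    "the team won the match",
    "the bank raised money on the stock market",
    "the striker scored for the team",
    "shares fell as the bank cut rates",
    "the striker scored twice",  # contains no profile word
]
profiles = {name: build_profile(name, TOY_LEXICON) for name in TOY_LEXICON}
initial = initial_labels(docs, profiles)   # profile-based labeling
refined = refine_labels(docs, initial)     # refinement over all documents
model = train_centroids(docs, refined)     # final training data -> model
```

In this toy data the last document contains no profile word, so the profile-based step leaves it unlabeled; the refinement pass assigns it a category from the statistics of the already-labeled documents, which is the role the abstract gives to the clustering step.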
Pages: 763-788
Page count: 25