Effectiveness of document representation for classification

Cited by: 0
Authors
Chen, DY [1 ]
Li, X [1 ]
Dong, ZY [1 ]
Chen, X [1 ]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventionally, research on document classification focuses on improving the learning capabilities of classifiers. Nevertheless, according to our observations, the effectiveness of classification is limited by the suitability of the document representation. Intuitively, the more features a representation uses, the more comprehensively it represents documents. However, if a representation contains too many irrelevant features, the classifier suffers not only from the curse of dimensionality but also from overfitting. To address this problem of the suitability of document representations, we present a classifier-independent approach to measuring their effectiveness. Our approach utilises a labelled document corpus to estimate the distribution of documents in the feature space. By examining documents in this way, we can clearly identify the contributions that different features make to document classification. Experiments have been performed to show how the effectiveness is evaluated. Our approach can be used as a tool to assist feature selection, dimensionality reduction and document classification.
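The abstract does not state which measure the authors use, so the following is only a minimal sketch of the general idea, not their method. It scores each feature from a labelled corpus without training any classifier, using the ratio of between-class to within-class scatter (a Fisher-style criterion chosen here as a stand-in). The function name `feature_separation_scores` and the toy term counts are hypothetical.

```python
from collections import defaultdict


def feature_separation_scores(vectors, labels):
    """Score each feature by between-class scatter / within-class scatter.

    Assumption: this Fisher-style ratio is an illustrative stand-in for a
    classifier-independent measure; the abstract does not specify one.
    """
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)

    n_total = len(vectors)
    n_features = len(vectors[0])
    scores = []
    for j in range(n_features):
        overall_mean = sum(v[j] for v in vectors) / n_total
        between = 0.0
        within = 0.0
        for docs in by_class.values():
            class_mean = sum(v[j] for v in docs) / len(docs)
            # Spread of the class means around the overall mean.
            between += len(docs) * (class_mean - overall_mean) ** 2
            # Spread of documents around their own class mean.
            within += sum((v[j] - class_mean) ** 2 for v in docs)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


# Hypothetical toy corpus: term counts for ["goal", "election", "the"].
vectors = [[3, 0, 5], [4, 1, 6], [0, 3, 5], [1, 4, 6]]
labels = ["sport", "sport", "politics", "politics"]
scores = feature_separation_scores(vectors, labels)
print(scores)
```

In this toy corpus the two topical terms receive high scores while the function word "the" scores zero, which is the kind of per-feature contribution that could guide feature selection or dimensionality reduction.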
Pages: 368 - 377
Page count: 10
Related papers
50 in total
  • [1] Distributed Document Representation for Document Classification
    Li, Rumeng
    Shindo, Hiroyuki
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART I, 2015, 9077 : 212 - 225
  • [2] Vietnamese Document Representation and Classification
    Nguyen, Giang-Son
    Gao, Xiaoying
    Andreae, Peter
    AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, 5866 : 577 - 586
  • [3] Rich document representation and classification: An analysis
    Keikha, Mostafa
    Khonsari, Ahmad
    Oroumchian, Farhad
    KNOWLEDGE-BASED SYSTEMS, 2009, 22 (01) : 67 - 71
  • [4] Hierarchical Neural Representation for Document Classification
    Jianming Zheng
    Fei Cai
    Wanyu Chen
    Chong Feng
    Honghui Chen
    Cognitive Computation, 2019, 11 : 317 - 327
  • [5] Hierarchical Neural Representation for Document Classification
    Zheng, Jianming
    Cai, Fei
    Chen, Wanyu
    Feng, Chong
    Chen, Honghui
    COGNITIVE COMPUTATION, 2019, 11 (02) : 317 - 327
  • [6] Effectiveness of syntactic information for document classification
    Min, K
    Wilson, WH
    AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2003, 2903 : 992 - 1002
  • [7] Hybred: An OCR document representation for classification tasks
    Laroum, Sami
    Béchet, Nicolas
    Hamza, Hatem
    Roche, Mathieu
    International Journal of Computer Science Issues, 2011, 8 (3 3-2): : 1 - 8
  • [8] The hybrid representation model for web document classification
    Markov, A.
    Last, M.
    Kandel, A.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2008, 23 (06) : 654 - 679
  • [9] The Influence of Feature Representation of Text on the Performance of Document Classification
    Martincic-Ipsic, Sanda
    Milicic, Tanja
    Todorovski, Ljupco
    APPLIED SCIENCES-BASEL, 2019, 9 (04):
  • [10] USING CONCEPTUAL DOCUMENT REPRESENTATION FOR MULTILINGUAL TEXT CLASSIFICATION
    Borges Garcia, A.
    Castro Castro, D.
    Ortega-Bueno, R.
    HOLOS, 2018, 34 (02) : 386 - 396