Exploiting category information and document information to improve term weighting for text categorization

Authors
Li, Jingyang [1]
Sun, Maosong [1]
Affiliations
[1] Tsinghua University, National Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Beijing 100084, People's Republic of China
Funding
National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional tfidf-like term weighting schemes use a rough statistic, idf, as the term weighting factor. For a text categorization task, idf exploits neither the category information (category labels on documents) nor the intra-document information (the relative importance of a given term to a given document that contains it) available in the training data. We present a more elaborate nonparametric probabilistic model that makes use of this information in the term weighting phase, and we prove theoretically that idf is a rough approximation of the new term weighting factor. This work is preliminary and aims mainly to inspire further study on exploiting this information, but it already provides a moderate performance boost on three popular document collections.
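For context, the baseline that the abstract argues against can be sketched as follows. This is a generic tf-idf variant (raw term frequency times log(N/df)), not the authors' proposed model, whose details are not given in this record. The function name `tfidf_weights` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # idf depends only on collection-wide document frequency. It uses
    # neither category labels nor any per-document importance signal
    # beyond the raw count multiplied in below.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


# Toy collection (hypothetical); no category labels are passed in at all.
docs = [
    ["stock", "market", "stock"],
    ["market", "game"],
    ["game", "team", "game"],
]
weights = tfidf_weights(docs)
```

Note that the function never receives category labels, which is the limitation the abstract points to: two terms with the same document frequency get the same idf regardless of how they are distributed across categories.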
Pages: 587 / +
Page count: 2