Exploiting category information and document information to improve term weighting for text categorization

Cited by: 0
Authors
Li, Jingyang [1 ]
Sun, Maosong [1 ]
Affiliations
[1] Tsinghua Univ, Natl Lab Intelligent Technol & Syst, Dept Comp Sci & Tech, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional tf-idf-like term weighting schemes use a rough statistic, idf, as the term weighting factor. For a text categorization task, idf exploits neither the category information (category labels on documents) nor the intra-document information (the relative importance of a given term to a given document that contains it) available in the training data. We present a more elaborate nonparametric probabilistic model that makes use of this information in the term weighting phase. We prove theoretically that idf is a rough approximation of the new term weighting factor. This work is preliminary and aims mainly to inspire further study on exploiting this information, but it already provides a moderate performance boost on three popular document collections.
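As a point of reference for the limitation the abstract describes, the following is a minimal sketch of the classic tf-idf baseline, not the authors' probabilistic model. The toy corpus, its category labels, and the function names `idf_weights` and `tfidf` are hypothetical and chosen only for illustration. It shows that idf depends solely on document frequency, so a term concentrated in one category and a term spread evenly across categories can receive the same weight.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return idf(t) = log(N / df(t)) for every term in the tokenized docs."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        # Count each term once per document (document frequency).
        df.update(set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Weight each term in one document by raw term frequency times idf."""
    tf = Counter(tokens)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


# Hypothetical labeled training set: (tokens, category label).
corpus = [
    (["ball", "goal", "team"], "sports"),
    (["ball", "match", "team"], "sports"),
    (["stock", "market", "team"], "finance"),
    (["stock", "bank", "goal"], "finance"),
]

# idf never looks at the labels, only at the token lists.
idf = idf_weights([tokens for tokens, _ in corpus])

# "ball" occurs only in sports documents, "goal" once in each category,
# yet both appear in 2 of 4 documents and so get the same idf.
print(idf["ball"] == idf["goal"])
print(tfidf(["ball", "ball", "team"], idf))
```

In this sketch the labels are carried along but never used, which is the gap that supervised term weighting schemes aim to close.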
Pages: 587 / +
Page count: 2