Supervised latent semantic indexing for document categorization

Cited by: 20
Authors
Sun, JT [1]
Chen, Z [1]
Zeng, HJ [1]
Lu, YC [1]
Shi, CY [1]
Ma, WY [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
DOI
10.1109/ICDM.2004.10004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is a successful technology in information retrieval (IR) that attempts to explore the latent semantics implied by a query or a document by representing them in a dimension-reduced space. However, LSI is not optimal for document categorization tasks because it aims to find the most representative features for document representation rather than the most discriminative ones. In this paper, we propose Supervised LSI (SLSI), which iteratively selects the most discriminative basis vectors using the training data. The extracted vectors are then used to project the documents into a reduced-dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results.
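The abstract contrasts LSI, which keeps the directions that best represent the documents, with SLSI, which keeps the directions that best discriminate between categories. The sketch below illustrates that contrast under stated assumptions: LSI basis vectors are taken as the leading left singular vectors of a term-document matrix (computed by power iteration so the example has no dependencies), and the supervised step simply ranks those vectors by a Fisher score on labelled training documents. The function names `top_left_singular_vectors`, `fisher_score` and `select_discriminative`, and the toy data, are hypothetical; the one-shot Fisher ranking is a stand-in chosen for illustration, not the iterative selection procedure described in the paper.

```python
import math
import random


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def top_left_singular_vectors(A, k, iters=200, seed=0):
    # A is a term-document matrix given as a list of rows (terms x docs).
    # Its left singular vectors are the eigenvectors of A A^T; the top k are
    # found by power iteration, projecting out vectors already found (deflation).
    # Assumes k <= rank(A); otherwise the deflated iterate can vanish.
    S = [[dot(ri, rj) for rj in A] for ri in A]  # A A^T, terms x terms
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in S]
        for _ in range(iters):
            w = [dot(row, v) for row in S]
            for u in basis:  # remove components along earlier basis vectors
                c = dot(w, u)
                w = [a - c * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [a / norm for a in w]
        basis.append(v)
    return basis


def project(basis, doc):
    """Project a term-weight vector onto the given basis vectors."""
    return [dot(u, doc) for u in basis]


def fisher_score(values, labels, eps=1e-9):
    """Between-class scatter over within-class scatter of 1-D projections."""
    mean = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        xs = [x for x, y in zip(values, labels) if y == c]
        mc = sum(xs) / len(xs)
        between += len(xs) * (mc - mean) ** 2
        within += sum((x - mc) ** 2 for x in xs)
    return between / (within + eps)


def select_discriminative(basis, docs, labels, k):
    # Hypothetical supervised step: keep the indices of the k basis vectors
    # whose projections best separate the labelled training documents.
    scores = [fisher_score([dot(u, d) for d in docs], labels) for u in basis]
    ranked = sorted(range(len(basis)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example (made-up data): term 0 is frequent in every document, so it is
# representative but not discriminative; terms 1 and 2 each mark one class.
docs = [[10, 1, 0], [2, 1, 0], [10, 0, 1], [2, 0, 1]]  # one row per document
labels = ["a", "a", "b", "b"]
A = [list(col) for col in zip(*docs)]  # transpose to terms x docs
basis = top_left_singular_vectors(A, k=3)
chosen = select_discriminative(basis, docs, labels, k=1)
print(chosen)  # [1]: the second LSI direction, not the first
reduced = [project([basis[i] for i in chosen], d) for d in docs]
```

On this toy data the first LSI direction is dominated by the frequent term 0, which occurs equally in both classes, so its Fisher score is close to zero; the second direction contrasts terms 1 and 2 and is the one kept, giving a one-dimensional representation that still separates the two classes.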
Pages: 535 - 538
Page count: 4
Related papers (50 in total)
  • [31] LSISOM - A latent semantic indexing approach to Self-Organizing Maps of document collections
    Ampazis, N
    Perantonis, SJ
    NEURAL PROCESSING LETTERS, 2004, 19 (02) : 157 - 173
  • [33] Information retrieval and text categorization with semantic indexing
    Rosso, P
    Molina, A
    Pla, F
    Jiménez, D
    Vidal, V
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2004, 2945 : 596 - 600
  • [34] A Boosted Supervised Semantic Indexing for Reranking
    Makino, Takuya
    Iwakura, Tomoya
    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2017, 2017, 10648 : 16 - 28
  • [35] Latent semantic indexing: A probabilistic analysis
    Papadimitriou, CH
    Raghavan, P
    Tamaki, H
    Vempala, S
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2000, 61 (02) : 217 - 235
  • [36] Matrix Factorization in Latent Semantic Indexing
    Ng, Wei Shean
    Tang, Wen Kai Adrian
    2ND SEA-STEM INTERNATIONAL CONFERENCE 2021, 2021 : 136 - 139
  • [37] Text segmentation by latent semantic indexing
    Ishioka, T
    NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003 : 689 - 696
  • [38] On updating problems in latent semantic indexing
    Zha, HY
    Simon, HD
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1999, 21 (02) : 782 - 791
  • [39] A probabilistic model for Latent Semantic Indexing
    Ding, CHQ
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (06) : 597 - 608
  • [40] Latent Semantic Indexing for Patent Documents
    Moldovan, Andreea
    Bot, Radu Ioan
    Wanka, Gert
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2005, 15 (04) : 551 - 560