Improving information retrieval through correspondence analysis instead of latent semantic analysis

Cited by: 0
Authors
Qi, Qianqian [1]
Hessen, David J. [1]
van der Heijden, Peter G. M. [1, 2]
Affiliations
[1] Univ Utrecht, Fac Social Sci, Dept Methodol & Stat, Utrecht, Netherlands
[2] Univ Southampton, Southampton Stat Sci Res Inst, Southampton, England
Keywords
Singular value decomposition; Singular value weighting exponent; Initial dimensions; Information retrieval
DOI
10.1007/s10844-023-00815-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
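The contrast drawn in the abstract can be illustrated with a minimal sketch, assuming NumPy and a toy document-term matrix with invented counts. This is not the authors' implementation: the function names `lsa_doc_coords`, `ca_doc_coords`, and `cosine_similarity` are hypothetical, and the matrix element weighting step is left out. LSA is taken here as the SVD of the raw matrix, CA as the SVD of the standardized residuals from the product of the row and column margins, and `alpha` plays the role of the singular value weighting exponent.

```python
import numpy as np


def lsa_doc_coords(F, k, alpha=1.0):
    """Rank-k document coordinates from the SVD of the raw matrix F.

    alpha is the singular value weighting exponent: U_k * Sigma_k**alpha.
    """
    U, s, _ = np.linalg.svd(np.asarray(F, dtype=float), full_matrices=False)
    return U[:, :k] * s[:k] ** alpha


def ca_doc_coords(F, k, alpha=1.0):
    """Rank-k document coordinates from correspondence analysis.

    Assumes F has no all-zero rows or columns. The SVD is applied to the
    standardized residuals, so the product of the margins is removed first.
    """
    F = np.asarray(F, dtype=float)
    P = F / F.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row (document) masses
    c = P.sum(axis=0)            # column (term) masses
    E = np.outer(r, c)           # expected proportions under independence
    S = (P - E) / np.sqrt(E)     # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # alpha = 1 gives principal coordinates, alpha = 0 standard coordinates
    return (U[:, :k] / np.sqrt(r)[:, None]) * s[:k] ** alpha


def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T


# Toy 4-document x 5-term count matrix (made-up numbers for illustration)
F = np.array([
    [2, 1, 0, 0, 1],
    [3, 2, 0, 1, 0],
    [0, 0, 2, 3, 1],
    [0, 1, 1, 2, 2],
])

lsa = lsa_doc_coords(F, k=2, alpha=1.0)
ca = ca_doc_coords(F, k=2, alpha=0.5)
print(cosine_similarity(lsa).round(2))
print(cosine_similarity(ca).round(2))
```

For a matrix whose rows are exactly proportional to one another, the standardized residuals vanish, so the CA coordinates are numerically zero. The first LSA dimension of the same matrix simply reproduces the row totals up to scale, which is one way to see the marginal effects mentioned above.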
Pages: 209-230
Page count: 22
Related papers
50 in total
  • [1] Improving information retrieval through correspondence analysis instead of latent semantic analysis
    Qianqian Qi
    David J. Hessen
    Peter G. M. van der Heijden
    Journal of Intelligent Information Systems, 2024, 62 : 209 - 230
  • [2] Boosting Novelty for Biomedical Information Retrieval through Probabilistic Latent Semantic Analysis
    An, Xiangdong
    Huang, Jimmy Xiangji
    SIGIR'13: THE PROCEEDINGS OF THE 36TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, 2013, : 829 - 832
  • [3] Application Research on Latent Semantic Analysis for Information Retrieval
    Chen Wenli
    PROCEEDINGS 2016 EIGHTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION ICMTMA 2016, 2016, : 118 - 121
  • [4] IMPROVING INFORMATION-RETRIEVAL WITH LATENT SEMANTIC INDEXING
    DEERWESTER, S
    DUMAIS, S
    LANDAUER, T
    FURNAS, G
    BECK, L
    PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1988, 25 : 36 - 40
  • [5] Analysis and development of latent semantic indexing techniques for information retrieval
    Bottello, M.
    Data Mining VII: Data, Text and Web Mining and Their Business Applications, 2006, 37 : 193 - 202
  • [6] Latent semantic analysis and Fiedler retrieval
    Hendrickson, Bruce
    LINEAR ALGEBRA AND ITS APPLICATIONS, 2007, 421 (2-3) : 345 - 355
  • [7] Trajectory Retrieval with Latent Semantic Analysis
    Papadopoulos, Apostolos N.
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 1089 - 1094
  • [8] Taking a new look at the Latent Semantic Analysis approach to information retrieval
    Jessup, ER
    Martin, JH
    COMPUTATIONAL INFORMATION RETRIEVAL, 2001, : 121 - 144
  • [9] Enhancing latent semantic analysis video object retrieval with structural information
    Hohl, L
    Souvannavong, F
    Merialdo, B
    Huet, B
    ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1- 5, 2004, : 1609 - 1612
  • [10] Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval
    School of Computing Sciences, Vellore Institute of Technology, Deemed University, Vellore - 632014, India
    Unknown
    Int. J. Appl. Math. Comput. Sci., 2006, 4 (551-558):