Improving information retrieval through correspondence analysis instead of latent semantic analysis

Cited by: 0
Authors
Qi, Qianqian [1]
Hessen, David J. [1]
van der Heijden, Peter G. M. [1,2]
Affiliations
[1] Univ Utrecht, Fac Social Sci, Dept Methodol & Stat, Utrecht, Netherlands
[2] Univ Southampton, Southampton Stat Sci Res Inst, Southampton, England
Keywords
Singular value decomposition; Singular value weighting exponent; Initial dimensions; Information retrieval
DOI
10.1007/s10844-023-00815-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, the elements of the raw document-term matrix are usually weighted, and the weighting exponent of the singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is compared empirically. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, this effect is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
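The contrast drawn in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: LSA document coordinates are taken as U_k Σ_k^α from the SVD of the (possibly weighted) document-term matrix, and CA document coordinates as D_r^{-1/2} U_k Σ_k^α from the SVD of the standardized residuals, where α is the singular value weighting exponent (in CA, α = 1 gives principal and α = 0 standard coordinates). Documents are ranked by cosine similarity. The `log_weight` helper is a hypothetical element weighting chosen only for illustration, since the abstract does not name the weightings compared, and `rank_by_cosine` assumes the query is already expressed in the same reduced coordinates (folding in a new query is not shown).

```python
import numpy as np


def log_weight(F):
    # Hypothetical element weighting, log(1 + f); an illustrative assumption,
    # not necessarily one of the weightings studied in the paper.
    return np.log1p(F)


def lsa_coordinates(F, k, alpha=1.0):
    """LSA document coordinates U_k * s_k**alpha from a truncated SVD of F."""
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    return U[:, :k] * s[:k] ** alpha


def ca_coordinates(F, k, alpha=1.0):
    """CA document coordinates D_r^{-1/2} U_k * s_k**alpha.

    The SVD is taken of the standardized residuals, i.e. after the expected
    proportions under row-column independence (the marginal effects) have
    been subtracted. Assumes every document and term has a nonzero total.
    """
    P = F / F.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row (document) masses
    c = P.sum(axis=0)                # column (term) masses
    E = np.outer(r, c)               # expected proportions under independence
    S = (P - E) / np.sqrt(E)         # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :k] / np.sqrt(r)[:, None] * s[:k] ** alpha


def rank_by_cosine(doc_coords, query_coords, eps=1e-12):
    """Return document indices sorted by decreasing cosine similarity."""
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(query_coords)
    sims = doc_coords @ query_coords / (norms + eps)
    return np.argsort(-sims)
```

For a matrix that contains nothing but marginal effects (an outer product of row and column totals), LSA still returns a large first dimension, whereas every CA coordinate is zero up to rounding:

```python
F = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(np.abs(lsa_coordinates(F, 1)).max() > 1)          # True
print(np.allclose(ca_coordinates(F, 2), 0, atol=1e-8))  # True
```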
Pages: 209–230
Number of pages: 22
Related papers (50 in total)
  • [21] Improving information retrieval in functional analysis
    Rodriguez, Juan C.
    Gonzalez, German A.
    Fresno, Cristobal
    Llera, Andrea S.
    Fernandez, Elmer A.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2016, 79: 10–20
  • [22] Exploiting salient semantic analysis for information retrieval
    Luo, Jing
    Meng, Bo
    Quan, Changqin
    Tu, Xinhui
    ENTERPRISE INFORMATION SYSTEMS, 2016, 10 (09): 959–969
  • [23] Structurally enhanced latent semantic analysis for video object retrieval
    Souvannavong, F
    Hohl, L
    Merialdo, B
    Huet, B
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 2005, 152 (06): 859–867
  • [24] Learning Similarity with Probabilistic Latent Semantic Analysis for Image Retrieval
    Li, Xiong
    Lv, Qi
    Huang, Wenting
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2015, 9 (04): 1424–1440
  • [25] Personal information retrieval based on latent semantic indexing
    Yang, Z
    Deng, GS
    PROCEEDINGS OF 2002 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2002: 287–291
  • [26] Using latent semantic indexing for multilanguage information retrieval
    Berry, MW
    Young, PG
    COMPUTERS AND THE HUMANITIES, 1995, 29 (06): 413–429
  • [27] Information retrieval based upon latent class analysis
    Baker, FB
    JOURNAL OF THE ACM, 1962, 9 (04): 512–&
  • [28] Latent class analysis as an association model for information retrieval
    Baker, FB
    STATISTICAL ASSOCIATION METHODS FOR MECHANIZED DOCUMENTATION SYMPOSIUM PROCEEDINGS, 1965, 1964 (NBS26): 149–&
  • [29] Extracting marketing information from product reviews: a comparative study of latent semantic analysis and probabilistic latent semantic analysis
    Ahmad, Shimi Naurin
    Laroche, Michel
    JOURNAL OF MARKETING ANALYTICS, 2023, 11 (04): 662–676