Improving information retrieval through correspondence analysis instead of latent semantic analysis

Authors
Qi, Qianqian [1]
Hessen, David J. [1]
van der Heijden, Peter G. M. [1, 2]
Affiliations
[1] Univ Utrecht, Fac Social Sci, Dept Methodol & Stat, Utrecht, Netherlands
[2] Univ Southampton, Southampton Stat Sci Res Inst, Southampton, England
Keywords
Singular value decomposition; Singular value weighting exponent; Initial dimensions; Information retrieval
DOI
10.1007/s10844-023-00815-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
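The contrast drawn in the abstract can be illustrated with a minimal sketch. It assumes that LSA takes the SVD of the (possibly weighted) document-term matrix directly, while CA takes the SVD of the matrix of standardized residuals, so the row and column margins are removed before decomposition. The function names, the toy matrix `X`, and the parameter `alpha` (standing in for the singular value weighting exponent) are illustrative assumptions and are not taken from the paper, whose exact weighting and query-projection steps may differ.

```python
import numpy as np


def lsa_doc_coords(X, k, alpha=1.0):
    """Document coordinates from a plain SVD of X (LSA-style sketch).

    alpha is an assumed name for the singular value weighting exponent.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k] ** alpha


def ca_doc_coords(X, k, alpha=1.0):
    """Document coordinates from an SVD of standardized residuals (CA-style sketch)."""
    P = X / X.sum()                 # correspondence matrix (proportions)
    r = P.sum(axis=1)               # row (document) masses
    c = P.sum(axis=0)               # column (term) masses
    expected = np.outer(r, c)       # values expected from the margins alone
    S = (P - expected) / np.sqrt(expected)  # margins are removed here
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] / np.sqrt(r)[:, None]) * s[:k] ** alpha


# Toy document-term counts (4 documents x 5 terms), purely illustrative.
X = np.array(
    [
        [2, 1, 0, 0, 1],
        [1, 2, 0, 1, 0],
        [0, 0, 3, 1, 1],
        [0, 1, 1, 2, 2],
    ],
    dtype=float,
)

lsa_coords = lsa_doc_coords(X, k=2, alpha=1.0)
ca_coords = ca_doc_coords(X, k=2, alpha=1.0)
```

In this sketch, setting `alpha=0` drops the singular value weighting entirely, and `alpha=1` applies the singular values unchanged. Because the residual matrix has the margins subtracted, the mass-weighted mean of the CA document coordinates is zero, which is one way to see that the leading CA dimensions do not encode marginal effects.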
Pages: 209-230
Number of pages: 22