Improving information retrieval through correspondence analysis instead of latent semantic analysis

Cited by: 0

Authors:

Qi, Qianqian [1]

Hessen, David J. [1]

van der Heijden, Peter G. M. [1,2]

Affiliations:

[1] Univ Utrecht, Fac Social Sci, Dept Methodol & Stat, Utrecht, Netherlands

[2] Univ Southampton, Southampton Stat Sci Res Inst, Southampton, England

Source:

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS | 2024 / Vol. 62 / Issue 01

Keywords:

Singular value decomposition; Singular value weighting exponent; Initial dimensions; Information retrieval;

DOI:

10.1007/s10844-023-00815-y

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
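The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes LSA is a truncated SVD of the raw document-term matrix, that CA is an SVD of the matrix of standardized residuals (which removes the row and column margins before decomposition), and that the singular value weighting exponent enters as a power `alpha` on the singular values. The function names, the parameter `alpha`, and the toy matrix `X` are all hypothetical.

```python
import numpy as np

def lsa_coords(X, k, alpha=1.0):
    """Document coordinates from a rank-k SVD of the raw matrix X.

    alpha is the (assumed) singular value weighting exponent.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k] ** alpha

def ca_coords(X, k, alpha=1.0):
    """Document coordinates from a rank-k SVD of standardized residuals.

    Assumes X has no all-zero rows or columns. With alpha=1 these are
    the usual CA row principal coordinates.
    """
    P = X / X.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row (document) masses
    c = P.sum(axis=0)               # column (term) masses
    expected = np.outer(r, c)       # independence model, i.e. the margins
    S = (P - expected) / np.sqrt(expected)
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] / np.sqrt(r)[:, None]) * s[:k] ** alpha

# Made-up toy document-term counts (4 documents, 5 terms), for illustration only.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 3, 1, 1],
    [0, 1, 2, 2, 0],
], dtype=float)

docs_lsa = lsa_coords(X, k=2, alpha=1.0)
docs_ca = ca_coords(X, k=2, alpha=1.0)
print(docs_lsa.shape, docs_ca.shape)
```

Because the independence model is subtracted before the SVD, the mass-weighted mean of the CA document coordinates is zero on every retained dimension, which is one way to see that the margins no longer occupy the leading dimensions. In the LSA sketch the raw counts are decomposed directly, so nothing prevents the first dimensions from reflecting document length and term frequency.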


Pages: 209-230

Page count: 22

