An index-based algorithm for fast on-line query processing of latent semantic analysis

Cited by: 2
Authors
Zhang, Mingxi [1, 2]
Li, Pohan [2 ]
Wang, Wei [2 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Coll Commun & Art Design, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
Source
PLOS ONE | 2017 / Vol. 12 / Issue 05
Funding
Natural Science Foundation of Shanghai;
Keywords
SINGULAR-VALUE DECOMPOSITION; JACOBI SVD ALGORITHM; SIMILARITY SEARCH; RECOMMENDER; NETWORK;
DOI
10.1371/journal.pone.0177523
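Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes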
07 ; 0710 ; 09 ;
Abstract
Latent Semantic Analysis (LSA) is widely used for finding documents that are semantically similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms perform many unnecessary operations in similarity computation and candidate checking during on-line query processing. This is expensive in time and cannot respond to query requests efficiently, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, with the goal of efficiently finding the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold. Based on the partial index, we develop an efficient algorithm called ILSA to support fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning unpromising candidate documents and skipping operations that contribute little to the similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of the proposed algorithm.
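The idea of accumulating thresholded partial similarities can be illustrated with a minimal Python sketch. It is not the paper's ILSA implementation: it assumes an unnormalized dot-product similarity, assumes the truncated SVD has already been computed off-line, and uses hypothetical toy latent vectors. The names `build_partial_index`, `answer_query`, `term_vecs`, and `doc_vecs` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_partial_index(term_vecs, doc_vecs, threshold):
    """Off-line step (assumed form): for each term, keep only the
    (doc_id, partial similarity) pairs that reach the threshold."""
    index = {}
    for t, tv in enumerate(term_vecs):
        postings = []
        for j, dv in enumerate(doc_vecs):
            p = dot(tv, dv)  # partial similarity of term t and document j
            if p >= threshold:  # skip entries lower than the threshold
                postings.append((j, p))
        index[t] = postings
    return index


def answer_query(index, query, top_n):
    """On-line step: visit only the index nodes of the query's non-zero
    entries and accumulate their partial similarities per document."""
    scores = defaultdict(float)
    for t, weight in query.items():
        for j, p in index.get(t, []):
            scores[j] += weight * p
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical toy data: 3 terms and 3 documents in a 2-dimensional
# latent space (stand-ins for the factors of a truncated SVD).
term_vecs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

index = build_partial_index(term_vecs, doc_vecs, threshold=0.3)

# Sparse query: terms 0 and 1 each occur once.
query = {0: 1.0, 1: 1.0}
result = answer_query(index, query, top_n=2)
print(result)
```

In this toy run, document 1 has partial similarities below the threshold for both query terms, so it never appears in the visited index nodes and is pruned without any on-line computation. The trade-off is that the accumulated scores are approximate, because the skipped entries are treated as zero.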
Pages: 23
Related papers
50 in total
  • [1] Index-Based Batch Query Processing Revisited
    Mackenzie, Joel
    Moffat, Alistair
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III, 2023, 13982 : 86 - 100
  • [2] Index-Based Semantic Tagging for Efficient Query Interpretation
    Devezas, Jose
    Nunes, Sergio
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2016, 2016, 9822 : 208 - 213
  • [3] Index-based query processing on distributed multidimensional data
    Tsatsanifos, George
    Sacharidis, Dimitris
    Sellis, Timos
    GEOINFORMATICA, 2013, 17 (03) : 489 - 519
  • [4] Index-based query processing on distributed multidimensional data
    George Tsatsanifos
    Dimitris Sacharidis
    Timos Sellis
    GeoInformatica, 2013, 17 : 489 - 519
  • [5] An Index-based Secure Query Processing Scheme for Outsourced Databases
    Akiyama, Kento
    Shinozuka, Chisato
    Watanabe, Chiemi
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    19TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2017), 2017, : 215 - 223
  • [6] HI-Sky: Hash Index-Based Skyline Query Processing
    Choi, Jong-Hyeok
    Hao, Fei
    Nasridinov, Aziz
    APPLIED SCIENCES-BASEL, 2020, 10 (05):
  • [7] On-Line Grid Monitoring Based on Distributed Query Processing
    Balis, Bartosz
    Dyk, Grzegorz
    Bubak, Marian
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT II, 2012, 7204 : 131 - 140
  • [8] FAST LATENT SEMANTIC INDEX USING RANDOM MAPPING IN TEXT PROCESSING
    Qian, Xiao-Dong
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON WAVELET ANALYSIS AND PATTERN RECOGNITION, VOLS 1 AND 2, 2008, : 788 - 792
  • [9] Index-based fast search algorithm of image database on internet
    Yeh, CH
    Kuo, CJ
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1195 - 1198
  • [10] AN INDEX-BASED APPROACH TO QUERY MAMMOGRAPHIC DATABASES
    Valente, Frederico
    Bastiao, Luis
    Silva, Augusto
    ICEM15: 15TH INTERNATIONAL CONFERENCE ON EXPERIMENTAL MECHANICS, 2012,