A method to improve full-text search performance of MongoDB

Cited by: 1
Authors
Mesut, Altan [1]
Ozturk, Emir [1]
Affiliation
[1] Trakya Univ, Engn Fac, Dept Comp Engn, Edirne, Turkey
Keywords
NoSQL; MongoDB; Text index; Full-Text search; MWCA
DOI
10.5505/pajes.2021.89590
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The B-tree-based text indexes used in MongoDB are slow compared with alternative structures such as inverted indexes. This study shows that full-text search speed can be increased significantly by indexing a structure in which each distinct word in the text appears only once. The Multi-Stream Word-Based Compression Algorithm (MWCA), developed in our previous work, stores word dictionaries and data in separate streams. As documents were added to a MongoDB collection, they were encoded with MWCA and separated into six streams. Each stream was stored in its own field, and the three streams containing unique words were used to create the text index. As a result, the index was created in less time and took up less space. The Snappy and Zlib block compression methods used by MongoDB also reached higher compression ratios on MWCA-encoded data. Search tests on text indexes created on collections with different compression options show that our method provides a 19- to 146-fold speed increase and 34% to 40% lower memory usage. Tests on regex searches, which do not use the text index, also show that the MWCA model provides a 7- to 13-fold speed increase and 29% to 34% lower memory usage.
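The core idea in the abstract, indexing a field in which each distinct word appears only once, can be illustrated with a minimal sketch. This is not MWCA: the abstract does not specify the six streams, so the example below assumes a simplified two-stream split. The function names `encode_document` and `decode_document` and the field names `unique_words` and `word_order` are hypothetical and chosen only for illustration.

```python
import re


def encode_document(text: str) -> dict:
    """Split text into a unique-word field and a word-order stream.

    Simplified illustration only (assumption): real MWCA uses six
    streams whose layout is not described in the abstract.
    """
    tokens = re.findall(r"\w+", text.lower())
    dictionary = {}   # word -> position in the unique-word list
    word_order = []   # one dictionary index per token in the text
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
        word_order.append(dictionary[token])
    # Dicts preserve insertion order, so keys are in first-seen order.
    return {"unique_words": " ".join(dictionary), "word_order": word_order}


def decode_document(doc: dict) -> str:
    """Rebuild the lowercased, punctuation-free token sequence."""
    words = doc["unique_words"].split(" ")
    return " ".join(words[i] for i in doc["word_order"])


doc = encode_document("The cat saw the other cat")
print(doc["unique_words"])
print(decode_document(doc))
```

Under this assumption, a text index would be built only on the `unique_words` field, for example with PyMongo's `collection.create_index([("unique_words", "text")])` on a hypothetical collection, so the index sees each word of a document once instead of once per occurrence.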
Pages: 720-729
Page count: 10