A method to improve full-text search performance of MongoDB

Cited by: 1
Authors
Mesut, Altan [1]
Ozturk, Emir [1]
Affiliations
[1] Trakya Univ, Engn Fac, Dept Comp Engn, Edirne, Turkey
Keywords
NoSQL; MongoDB; Text index; Full-text search; MWCA
DOI
10.5505/pajes.2021.89590
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The B-Tree based text indexes used in MongoDB are slower than alternative structures such as inverted indexes. In this study, we show that full-text search speed can be increased significantly by indexing a structure in which each distinct word in the text appears only once. The Multi-Stream Word-Based Compression Algorithm (MWCA), developed in our previous work, stores word dictionaries and data in separate streams. As documents were added to a MongoDB collection, they were encoded with MWCA and separated into six streams. Each stream was stored in its own field, and the three streams containing unique words were used to create the text index. As a result, the index was created in less time and occupied less space. The Snappy and Zlib block compression methods used by MongoDB also reached higher compression ratios on MWCA-encoded data. Search tests on text indexes created on collections with different compression options show that our method provides a 19- to 146-fold speed increase and 34% to 40% lower memory usage. Tests on regex searches, which do not use the text index, show that the MWCA model provides a 7- to 13-fold speed increase and 29% to 34% lower memory usage.
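The core idea in the abstract, indexing a field that holds each distinct word only once, can be illustrated with a small sketch. This is not MWCA itself: the record does not describe the six-stream layout, so the code below assumes a simplified two-stream split (one dictionary of unique words, one stream of positions), and the names `split_streams`, `to_document`, `restore_words`, `dictionary`, and `stream` are hypothetical. Unlike a real compression scheme, it also discards case and punctuation.

```python
import re


def split_streams(text):
    """Split text into a unique-word dictionary and a stream of word ids.

    Simplified two-stream illustration (an assumption); the actual MWCA
    encoding uses six streams and is not reproduced here.
    """
    words = re.findall(r"\w+", text.lower())
    dictionary = []  # each distinct word, stored once, in first-seen order
    ids = {}         # word -> its position in the dictionary
    stream = []      # the original word order, expressed as positions
    for word in words:
        if word not in ids:
            ids[word] = len(dictionary)
            dictionary.append(word)
        stream.append(ids[word])
    return dictionary, stream


def to_document(text):
    """Build a document whose 'dictionary' field is the only one to index."""
    dictionary, stream = split_streams(text)
    return {"dictionary": " ".join(dictionary), "stream": stream}


def restore_words(doc):
    """Rebuild the (lowercased, punctuation-free) word sequence."""
    dictionary = doc["dictionary"].split(" ")
    return [dictionary[i] for i in doc["stream"]]


doc = to_document("the cat saw the dog and the dog saw the cat")
print(doc["dictionary"])  # the cat saw dog and
print(doc["stream"])      # [0, 1, 2, 0, 3, 4, 0, 3, 2, 0, 1]
```

In this sketch an 11-word text yields a 5-word `dictionary` field, so a text index built only on that field has fewer terms per document to process. With PyMongo, such an index could be requested with `collection.create_index([("dictionary", "text")])`, where `collection` is assumed to be an existing collection holding documents of this shape.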
Pages: 720-729
Number of pages: 10