A method to improve full-text search performance of MongoDB

Cited by: 1
Authors
Mesut, Altan [1]
Ozturk, Emir [1]
Affiliations
[1] Trakya Univ, Engn Fac, Dept Comp Engn, Edirne, Turkey
Keywords
NoSQL; MongoDB; Text index; Full-text search; MWCA
DOI
10.5505/pajes.2021.89590
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The B-Tree based text indexes used in MongoDB are slow compared to other structures such as inverted indexes. This study shows that full-text search speed can be increased significantly by indexing a structure in which each distinct word in the text appears only once. The Multi-Stream Word-Based Compression Algorithm (MWCA), developed in our previous work, stores word dictionaries and data in different streams. As documents were added to a MongoDB collection, they were encoded with MWCA and separated into six streams. Each stream was stored in its own field, and the three streams containing unique words were used to create a text index. As a result, the index was created in less time and took up less space. The Snappy and Zlib block compression methods used by MongoDB also reached higher compression ratios on data encoded with MWCA. Search tests on text indexes created on collections using different compression options show that our method provides a 19 to 146 times speed increase and 34% to 40% less memory usage. Tests on regex searches that do not use the text index also show that the MWCA model provides a 7 to 13 times speed increase and 29% to 34% less memory usage.
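The core idea in the abstract, indexing a field in which each distinct word appears only once, can be illustrated with a minimal Python sketch. This is not MWCA: the actual algorithm splits a document into six streams whose layout is not described here. The tokenizer and the field names `body` and `unique_words` are assumptions made purely for illustration.

```python
import re

# Assumed tokenizer: runs of ASCII letters and digits count as words.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def unique_words(text: str) -> list[str]:
    """Return each distinct lowercase word once, in first-seen order."""
    seen: dict[str, None] = {}
    for word in WORD_RE.findall(text.lower()):
        seen.setdefault(word, None)
    return list(seen)


def to_document(text: str) -> dict[str, str]:
    """Build a document with the full text plus a deduplicated word field.

    "body" and "unique_words" are hypothetical field names; only the
    deduplicated field would be covered by the text index.
    """
    return {"body": text, "unique_words": " ".join(unique_words(text))}


doc = to_document("the cat saw the other cat")
print(doc["unique_words"])  # prints "the cat saw other"
```

With PyMongo, such a document could then be inserted and indexed on the deduplicated field only, for example with `collection.create_index([("unique_words", "text")])`, and queried with `collection.find({"$text": {"$search": "cat"}})`. Because repeated words never reach the index, it has fewer entries to build and store, which matches the direction of the time and space savings the abstract reports.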
Pages: 720-729
Page count: 10