Figure search by text in large scale digital document collections

Cited by: 3
Authors
Yurtsever, M. Mucahit Enes [1 ]
Ozcan, Muhammet [2 ]
Taruz, Zubeyir [2 ]
Eken, Suleyman [1 ]
Sayar, Ahmet [2 ]
Affiliations
[1] Kocaeli Univ, Dept Informat Syst Engn, Umuttepe Campus, TR-41001 Kocaeli, Turkey
[2] Kocaeli Univ, Dept Comp Engn, Kocaeli, Turkey
Source
Concurrency and Computation: Practice and Experience, 2022, 34 (01)
Keywords
Apache Solr; document digitization; Elasticsearch; figure search; full-text search; regular expressions; RETRIEVAL;
DOI
10.1002/cpe.6529
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Digital document collections have been created by transferring large numbers of documents to digital media, and these digital archives have provided many benefits to users. As the diversity and size of digital image collections have grown exponentially, obtaining the desired image from them has become both increasingly important and increasingly difficult. The images in a document may contain critical information about its subject. In this study, an architecture that can work on large-scale data is developed by combining regular expressions with full-text search approaches. The performance of the system was tested on different academic documents, and the insert times of Elasticsearch and Apache Solr were compared. Compared to Elasticsearch, Apache Solr achieved faster and more successful results.
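The abstract mentions combining regular expressions with full-text search but does not spell out the mechanics. The following is only a minimal sketch of that general idea, not the authors' implementation: the caption pattern, the function names `extract_captions` and `search_captions`, and the sample text are all hypothetical, and a plain in-memory keyword match stands in for an Elasticsearch or Apache Solr index.

```python
import re

# Hypothetical caption pattern (an assumption, not taken from the paper):
# matches lines that start with "Figure 3: ..." or "Fig. 2. ...".
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.)[ \t]*(\d+)[ \t]*[.:][ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_captions(page_text):
    """Return (figure_number, caption_text) pairs found in page_text."""
    return [
        (int(m.group(1)), m.group(2).strip())
        for m in CAPTION_RE.finditer(page_text)
    ]


def search_captions(captions, query):
    """Naive keyword match standing in for a full-text engine query."""
    terms = query.lower().split()
    return [c for c in captions if all(t in c[1].lower() for t in terms)]


sample = """Introduction text.
Figure 1: System architecture overview
Some body text mentioning Figure 1 inline.
Fig. 2. Insert times for both search engines
"""

captions = extract_captions(sample)
hits = search_captions(captions, "insert times")
print(captions)
print(hits)
```

In a real deployment the extracted captions would presumably be indexed into a search engine rather than scanned in memory; the sketch only shows how a regular expression can isolate figure captions so that text queries can be run against them.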
Pages: 11
Related papers
50 in total
  • [1] Figure search by text in large scale digital document collections
    Yurtsever, M. Mücahit Enes
    Özcan, Muhammet
    Taruz, Zübeyir
    Eken, Süleyman
    Sayar, Ahmet
    Concurrency and Computation: Practice and Experience, 2022, 34 (01)
  • [2] Efficient Fuzzy Search in Large Text Collections
    Bast, Hannah
    Celikik, Marjan
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2013, 31 (02)
  • [3] Detecting short passages of similar text in large document collections
    Lyon, C
    Malcolm, J
    Dickerson, B
    PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2001, : 118 - 125
  • [4] RANKING LARGE DOCUMENT COLLECTIONS BY A STATE-SPACE SEARCH
    GORDON, MD
    INFORMATION PROCESSING & MANAGEMENT, 1991, 27 (01) : 27 - 41
  • [5] Entropy-based authorship search in large document collections
    Zhao, Ying
    Zobel, Justin
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 381 - +
  • [6] Feature Extraction for Large-Scale Text Collections
    Gallagher, Luke
    Mallia, Antonio
    Culpepper, J. Shane
    Suel, Torsten
    Cambazoglu, B. Barla
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 3015 - 3022
  • [7] Extending Full Text Search for Legal Document Collections Using Word Embeddings
    Landthaler, Joerg
    Waltl, Bernhard
    Holl, Patrick
    Matthes, Florian
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 294 : 73 - 82
  • [8] Design considerations for a large-scale image-based text search engine in historical manuscript collections
    Schomaker, Lambert
    IT-INFORMATION TECHNOLOGY, 2016, 58 (02) : 80 - 88
  • [9] Interactive search of adipocytes in large collections of digital cellular images
    Goode, Adam
    Chen, Mei
    Tarachandani, Anil
    Mummert, Lily
    Sukthankar, Rahul
    Helfrich, Casey
    Stefanni, Alice
    Fix, Limor
    Saltzman, Jeffrey
    Satyanarayanan, M.
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 695 - 698
  • [10] SwiftLink: Serendipitous Navigation Strategy for Large-scale Document Collections
    von Wyl, Marc
    Marchand-Maillet, Stephane
    2012 23RD INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2012, : 83 - 87