Figure search by text in large scale digital document collections

Cited by: 3
Authors
Yurtsever, M. Mucahit Enes [1 ]
Ozcan, Muhammet [2 ]
Taruz, Zubeyir [2 ]
Eken, Suleyman [1 ]
Sayar, Ahmet [2 ]
Affiliations
[1] Kocaeli Univ, Dept Informat Syst Engn, Umuttepe Campus, TR-41001 Kocaeli, Turkey
[2] Kocaeli Univ, Dept Comp Engn, Kocaeli, Turkey
Source
Keywords
Apache Solr; document digitization; Elasticsearch; figure search; full-text search; regular expressions; RETRIEVAL;
DOI
10.1002/cpe.6529
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Transferring large numbers of documents to digital media has created digital document collections, and these archives offer users many benefits. As digital image collections have grown exponentially in diversity and size, retrieving a desired image from them has become both more important and more difficult. The images in a document may carry critical information about its subject. This study develops an architecture that works on large-scale data by combining regular expressions with full-text search approaches. The system's performance was tested on a variety of academic documents, and the insert times of Elasticsearch and Apache Solr were compared. Apache Solr achieved faster and more successful results than Elasticsearch.
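The abstract does not give the paper's actual patterns or index schema, so the following is only a minimal sketch of the general idea: a regular expression pulls figure captions out of document text, and a small in-memory inverted index stands in for a full-text engine such as Elasticsearch or Apache Solr. The caption pattern, the function names, and the sample text are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed caption pattern (not from the paper): a line that starts with
# "Figure 3: ..." or "Fig. 3. ...".
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.)\s*(\d+)\s*[:.]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_captions(doc_id, text):
    """Return (doc_id, figure_number, caption) tuples found in the text."""
    return [
        (doc_id, int(m.group(1)), m.group(2).strip())
        for m in CAPTION_RE.finditer(text)
    ]


def build_index(captions):
    """Map each lowercase caption token to the figures that contain it."""
    index = defaultdict(set)
    for doc_id, number, caption in captions:
        for token in TOKEN_RE.findall(caption.lower()):
            index[token].add((doc_id, number))
    return index


def search(index, query):
    """Return the figures whose captions contain every query token."""
    tokens = TOKEN_RE.findall(query.lower())
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Hypothetical sample document, used only to exercise the sketch.
sample = """Introduction text.
Figure 1: Indexing pipeline with Solr
Some body text mentioning Figure 1 inline.
Fig. 2. Query latency comparison
"""

captions = extract_captions("doc-a", sample)
index = build_index(captions)
print(search(index, "query latency"))  # {('doc-a', 2)}
```

In a real deployment the extracted captions would be sent to the search engine as documents rather than indexed in memory. The line-anchored pattern is a deliberate choice here so that inline mentions such as "see Figure 1" are not mistaken for captions.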
Pages: 11
Related papers
50 in total
  • [11] Efficient search in document image collections
    Kumar, Anand
    Jawahar, C. V.
    Manmatha, R.
    COMPUTER VISION - ACCV 2007, PT I, PROCEEDINGS, 2007, 4843 : 586 - +
  • [12] Structured Search in Annotated Document Collections
    Gupta, Dhruv
    Berberich, Klaus
    PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 794 - 797
  • [13] Image-Text Matching for Large-Scale Book Collections
    Llabres, Artemis
    Ujjal Dey, Arka
    Karatzas, Dimosthenis
    Valveny, Ernest
    DOCUMENT ANALYSIS SYSTEMS, DAS 2024, 2024, 14994 : 89 - 102
  • [14] Probabilistic Indexing for Information Search and Retrieval in Large Collections of Handwritten Text Images
    Wardhana, Arya Wijaya Pramodha
    Toselli, Alejandro Hector
    Puigcerver, Joan
    Vidal, Enrique
    JOURNAL OF LIBRARIANSHIP AND INFORMATION SCIENCE, 2025,
  • [15] Multimodal search in collections of images and text
    Santini, S
    JOURNAL OF ELECTRONIC IMAGING, 2002, 11 (04) : 455 - 468
  • [16] Efficient Search and Browsing of Large-Scale Video Collections with Vibro
    Hezel, Nico
    Schall, Konstantin
    Jung, Klaus
    Barthel, Kai Uwe
    MULTIMEDIA MODELING, MMM 2022, PT II, 2022, 13142 : 487 - 492
  • [17] A Comparison of Search Functionalities in Several Tools Used for Searching within Digital Text Collections
    Ball, Liezl H.
    Bothma, Theo J.D.
    Proceedings of the Association for Information Science and Technology, 2021, 58 (01) : 679 - 681
  • [18] Search and Navigation in Semantically Integrated Document Collections
    Nesic, Sasa
    Crestani, Fabio
    Jazayeri, Mehdi
    Gasevic, Dragan
    SEMAPRO 2010: THE FOURTH INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING, 2010, : 55 - 60
  • [19] Facilitating Understanding of Large Document Collections
    Bae, Jae Hyeon
    Xu, Weijia
    Esteva, Maria
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 1334 - 1338
  • [20] Fast categorisation of large document collections
    Shanks, V
    Williams, HE
    EIGHTH SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2001, : 194 - 204