Figure search by text in large scale digital document collections

Cited by: 3
Authors
Yurtsever, M. Mucahit Enes [1]
Ozcan, Muhammet [2]
Taruz, Zubeyir [2]
Eken, Suleyman [1]
Sayar, Ahmet [2]
Affiliations
[1] Kocaeli Univ, Dept Informat Syst Engn, Umuttepe Campus, TR-41001 Kocaeli, Turkey
[2] Kocaeli Univ, Dept Comp Engn, Kocaeli, Turkey
Source
Keywords
Apache Solr; document digitization; Elasticsearch; figure search; full-text search; regular expressions; retrieval
DOI
10.1002/cpe.6529
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Digital document collections have been created by transferring large numbers of documents to digital media. These digital archives offer many benefits to users. As the diversity and size of digital image collections have grown exponentially, obtaining the desired image from them has become both more important and more difficult. The images in a document may contain critical information about its subject. In this study, an architecture is developed that can work on large-scale data by combining regular expressions with full-text search approaches. The performance of the system was tested on different academic documents, and the insert times of Elasticsearch and Apache Solr were compared. Compared to Elasticsearch, Apache Solr achieved faster and more successful results.
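The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes figure captions appear on their own lines in the forms "Figure N: ..." or "Fig. N. ...", and it substitutes a small in-memory inverted index for Elasticsearch or Apache Solr. The pattern `CAPTION_RE` and the functions `extract_captions`, `build_index`, and `search` are hypothetical names introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed caption layout (not taken from the paper): a line that starts with
# "Figure N:" or "Fig. N." followed by the caption text.
CAPTION_RE = re.compile(
    r"^[ \t]*(?:Figure|Fig\.?)[ \t]+(\d+)[ \t]*[.:][ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_captions(doc_id, text):
    """Return one record per caption line found by the regular expression."""
    return [
        {"doc_id": doc_id, "figure": int(m.group(1)), "caption": m.group(2).strip()}
        for m in CAPTION_RE.finditer(text)
    ]


def build_index(records):
    """Map each lowercase caption token to the set of record positions containing it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for token in TOKEN_RE.findall(rec["caption"].lower()):
            index[token].add(pos)
    return index


def search(index, records, query):
    """Return the records whose captions contain every token of the query."""
    tokens = TOKEN_RE.findall(query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[pos] for pos in sorted(hits)]


sample = (
    "Intro text.\n"
    "Figure 1: System architecture for indexing\n"
    "More text mentions Figure 1 inline.\n"
    "Fig. 2. Insert time comparison\n"
)
records = extract_captions("doc-1", sample)
index = build_index(records)
results = search(index, records, "insert time")
```

In a large-scale setting, the extracted caption records would be sent to a full-text engine such as the two compared in the study instead of the in-memory dictionary used here; the regular expression step would stay the same in spirit, though real documents would likely need more caption patterns.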
Pages: 11
Related papers
50 in total
  • [21] A fast text similarity measure for large document collections using multireference cosine and genetic algorithm
    Mohammadi, Hamid
    Khasteh, Seyed Hossein
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2020, 28 (02) : 999 - 1013
  • [22] The capability of search tools to retrieve words with specific properties from large text collections
    Ball, Liezl
    Bothma, Theo
    INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL, 2020, 25 (04):
  • [23] Embedding Metadata in Large-Scale Legacy Digital Audio Collections
    Edge, Ryan
    ARCHIVING 2016: FINAL PROGRAM AND PROCEEDINGS, 2016, : 156 - 160
  • [24] Can document-genre metadata improve information access to large digital collections?
    Crowston, K
    Kwasnik, BH
    LIBRARY TRENDS, 2003, 52 (02) : 345 - 361
  • [25] Clustering of document collections to support interactive text exploration
    Nürnberger, A
    Klose, A
    Kruse, R
    Hartmann, G
    Richards, M
    EXPLORATORY DATA ANALYSIS IN EMPIRICAL RESEARCH, PROCEEDINGS, 2003, : 257 - 265
  • [26] Enabling search over large collections of Telugu document images - An automatic annotation based approach
    Sankar K, Pramod
    Jawahar, C. V.
    COMPUTER VISION, GRAPHICS AND IMAGE PROCESSING, PROCEEDINGS, 2006, 4338 : 837 - +
  • [27] A Digital Text Watermarking for Word Document
    Zhang, Shi-ru
    Meng, Xiao-chun
    Liu, Xin-fu
    Chen, Wen-Yuan
    INTERNATIONAL CONFERENCE MACHINERY, ELECTRONICS AND CONTROL SIMULATION, 2014, 614 : 347 - 351
  • [28] Feature selection for the classification of large document collections
    Brank, Janez
    Mladenic, Dunja
    Grobelnik, Marko
    Milic-Frayling, Natasa
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2008, 14 (10) : 1562 - 1596
  • [29] Efficient clustering of very large document collections
    Dhillon, IS
    Fan, J
    Guan, YQ
    DATA MINING FOR SCIENTIFIC AND ENGINEERING APPLICATIONS, 2001, 2 : 357 - 381
  • [30] An efficient clustering approach for large document collections
    Han, B
    Kang, LS
    Song, HZ
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 240 - 247