Figure search by text in large scale digital document collections

Cited by: 3
Authors
Yurtsever, M. Mucahit Enes [1]
Ozcan, Muhammet [2]
Taruz, Zubeyir [2]
Eken, Suleyman [1]
Sayar, Ahmet [2]
Affiliations
[1] Kocaeli Univ, Dept Informat Syst Engn, Umuttepe Campus, TR-41001 Kocaeli, Turkey
[2] Kocaeli Univ, Dept Comp Engn, Kocaeli, Turkey
Source
Keywords
Apache Solr; document digitization; Elasticsearch; figure search; full-text search; regular expressions; retrieval
DOI
10.1002/cpe.6529
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Digital document collections have been created by transferring large numbers of documents to digital media, and these digital archives have provided many benefits to users. As the diversity and size of digital image collections have grown exponentially, obtaining the desired image from them has become both more important and more difficult. The images in a document might contain critical information about its subject. In this study, an architecture is developed that can work on large-scale data by combining regular expressions with full-text search approaches. The performance of the system has been tested on different academic documents, and the insert times of Elasticsearch and Apache Solr are compared. Compared to Elasticsearch, Apache Solr achieved faster and more successful results.
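The abstract does not give the regular expressions or the index schema, so the following is only a minimal sketch of the general idea: pull figure captions out of extracted document text with a regular expression, then make them searchable by keyword. The caption pattern, the function names (extract_captions, build_index, search), and the small in-memory inverted index are all assumptions made for illustration. The in-memory index stands in for a full-text engine such as Elasticsearch or Apache Solr and is not the authors' implementation.

```python
import re
from collections import defaultdict

# Assumed caption convention: a line starting with "Figure 3." or "Fig. 3:".
# Real documents vary; the paper's actual patterns are not given in the abstract.
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.?)\s*(\d+)\s*[.:]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_captions(doc_id, text):
    """Return one record per caption line found in the text."""
    return [
        {"doc_id": doc_id, "figure": int(num), "caption": caption.strip()}
        for num, caption in CAPTION_RE.findall(text)
    ]


def build_index(captions):
    """Map each lowercase token to the positions of the captions containing it."""
    index = defaultdict(set)
    for pos, entry in enumerate(captions):
        for token in TOKEN_RE.findall(entry["caption"].lower()):
            index[token].add(pos)
    return index


def search(index, captions, query):
    """Return the captions that contain every token of the query."""
    tokens = TOKEN_RE.findall(query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [captions[i] for i in sorted(hits)]


if __name__ == "__main__":
    sample = (
        "Intro text.\n"
        "Figure 1. System architecture overview\n"
        "See Figure 1 for details.\n"
        "Fig. 2: Insert times for Solr and Elasticsearch\n"
    )
    found = extract_captions("paper-1", sample)
    for hit in search(build_index(found), found, "insert times"):
        print(hit["doc_id"], hit["figure"], hit["caption"])
```

In a deployment along the lines the abstract describes, the records returned by extract_captions would presumably be sent to the search engine's indexing interface instead of build_index, and that indexing step is what the reported insert-time comparison would measure.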
Pages: 11