Automated document content characterization for a multimedia document retrieval system

Authors
Koivusaari, M
Sauvola, J
Pietikainen, M
Keywords
document layout analysis; predictive coding; document database; retrieval; document content characterization; object-oriented database
DOI
10.1117/12.290337
Chinese Library Classification number
O43 [Optics]
Subject classification code
070207; 0803
Abstract
We propose a new approach to automating document image layout extraction, in order to populate the features of an object-oriented database, using rapid low-level feature analysis, preclassification and predictive coding. The layout information, which comprises region location and classification data, is transformed into 'feature object(s)'. This information is then fed into an intelligent document image retrieval (IDIR) system for use in document retrieval schemes. The IDIR system consists of a user interface, an object-oriented database and a variety of document image analysis algorithms. In this paper, the object-oriented storage model and the database system are presented in formal and functional domains. The graphical user interface and a visual document image browser are also described, as are the document analysis techniques used for document characterization. In this context, documents consist of text, picture and other media (possibly embedded) data, and are stored in the database as document, page and region objects. Our test system has been implemented and tested on a database of 10 000 documents.
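The abstract states that documents are stored as document, page and region objects, and that layout information consists of region location and classification data. A minimal sketch of such a hierarchy might look as follows. This is an illustration only, not the authors' storage model: every class, field and function name here (`Region`, `Page`, `Document`, `bbox`, `label`, `documents_with_label`) is hypothetical, and the bounding-box representation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """One layout region: where it is and what it was classified as."""
    bbox: Tuple[int, int, int, int]  # assumed (x, y, width, height) in pixels
    label: str                       # assumed class name, e.g. "text" or "picture"


@dataclass
class Page:
    """A page holds the regions found by layout analysis."""
    number: int
    regions: List[Region] = field(default_factory=list)


@dataclass
class Document:
    """A document is an ordered collection of pages."""
    doc_id: str
    pages: List[Page] = field(default_factory=list)


def documents_with_label(documents: List[Document], label: str) -> List[Document]:
    """Return the documents that contain at least one region with the given label."""
    return [
        doc for doc in documents
        if any(r.label == label for p in doc.pages for r in p.regions)
    ]


# Two toy documents: one with text and a picture, one with text only.
doc_a = Document("a", [Page(1, [Region((0, 0, 100, 20), "text"),
                                Region((0, 30, 100, 80), "picture")])])
doc_b = Document("b", [Page(1, [Region((0, 0, 100, 200), "text")])])

picture_docs = documents_with_label([doc_a, doc_b], "picture")
text_docs = documents_with_label([doc_a, doc_b], "text")
```

A query by region class, as in `documents_with_label`, is one simple way such feature objects could support retrieval; the actual IDIR query mechanisms are not described in the abstract.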
Pages: 148-159
Page count: 12