Automated document content characterization for a multimedia document retrieval system

Cited by: 0
Authors
Koivusaari, M
Sauvola, J
Pietikainen, M
Keywords
document layout analysis; predictive coding; document database; retrieval; document content characterization; object-oriented database
DOI
10.1117/12.290337
Chinese Library Classification (CLC) number
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
We propose a new approach to automating document image layout extraction for populating an object-oriented database with features, using rapid low-level feature analysis, preclassification and predictive coding. The layout information, comprising region location and classification data, is transformed into 'feature object(s)'. This information is then fed into an intelligent document image retrieval (IDIR) system for use in document retrieval schemes. The IDIR system consists of a user interface, an object-oriented database and a variety of document image analysis algorithms. In this paper the object-oriented storage model and the database system are presented in formal and functional domains. Moreover, the graphical user interface and a visual document image browser are described. The document analysis techniques used for document characterization are also presented. In this context the documents consist of text, picture and other media (possibly embedded) data. Documents are stored in the database as document, page and region objects. Our test system has been implemented and tested using a document database of 10 000 documents.
Pages: 148-159
Page count: 12
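
The abstract states that documents are stored as document, page and region objects, and that layout information consists of region location and classification data. A minimal sketch of such a hierarchy might look like the following. Every class, field and function name here (`Region`, `Page`, `Document`, `RegionClass`, `regions_of`, `find_documents`) is a hypothetical illustration, not the IDIR system's actual schema, and the bounding-box representation of region location is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RegionClass(Enum):
    """Content classes named in the abstract (labels are assumed)."""
    TEXT = "text"
    PICTURE = "picture"
    OTHER = "other"


@dataclass
class Region:
    """One layout region: a location (assumed bounding box) plus a class."""
    x: int
    y: int
    width: int
    height: int
    region_class: RegionClass


@dataclass
class Page:
    """A page owns the regions found by layout analysis."""
    number: int
    regions: List[Region] = field(default_factory=list)


@dataclass
class Document:
    """A document owns its pages."""
    doc_id: str
    pages: List[Page] = field(default_factory=list)

    def regions_of(self, region_class: RegionClass) -> List[Region]:
        # Collect every region of the requested class across all pages.
        return [
            region
            for page in self.pages
            for region in page.regions
            if region.region_class is region_class
        ]


def find_documents(documents: List[Document],
                   region_class: RegionClass) -> List[Document]:
    """Return documents that contain at least one region of the class."""
    return [doc for doc in documents if doc.regions_of(region_class)]
```

With this layout, a retrieval query such as "documents containing a picture" reduces to walking the document, page and region levels, which is one plausible reason to store layout results as nested objects rather than flat records.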