Recognition techniques for extracting information from semi-structured documents

Authors
Della Ventura, A [1]
Gagliardi, I [1]
Zonta, B [1]
Affiliations
[1] CNR, ITIM, I-20131 Milan, Italy
Keywords
OCR; automatic indexing; information retrieval; document analysis; image analysis; pattern matching; linguistic analysis
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Archives of optical documents are increasingly widely used, with demand also driven by new norms that sanction the legal value of digital documents, provided they are stored on physically unalterable media. On the supply side there is now a vast and technologically advanced market, in which optical memories have solved the problem of data duration and permanence at costs comparable to those of magnetic memories. The remaining bottleneck in these systems is indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine-supported to a large degree, with evident advantages both in the organization of the work and in the extraction of information, since it provides data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields sender, addressee, subject, date, and body of the document.
Pages: 130-137
Number of pages: 8