Extracting Relevant Named Entities for Automated Expense Reimbursement

Authors
Zhu, Guangyu [1]
Bethea, Timothy J. [1]
Krishna, Vikas [1]
Affiliation
[1] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
Keywords
Named entity extraction; learning; document layout analysis; conditional random fields
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport, and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; and (3) automation of auditing procedures that enables automatic expense validation with minimal human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge for image-based data mining, question answering, and other information retrieval tasks. In many applications that require this capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results because linguistic context is absent. We present an approach for extracting relevant named entities from document images that combines rich page layout features in the image space with language content in the OCR text, using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate the system performance on large collections of real-world receipt images provided by the IBM World Wide Reimbursement Center.
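
To make the idea of combining layout features with OCR text concrete, the following is a minimal sketch of per-token feature extraction of the kind a CRF sequence labeler could consume. It is not the authors' feature set: the `Token` fields, the feature names, the four-bin position grid, and the same-line threshold are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR word with a hypothetical normalized bounding box."""
    text: str
    x: float       # left edge, as a fraction of page width (0..1)
    y: float       # top edge, as a fraction of page height (0..1)
    height: float  # box height, as a fraction of page height


def looks_like_amount(text: str) -> bool:
    """True for strings shaped like a currency amount, e.g. 23.45 or $1,234.00."""
    return bool(re.fullmatch(r"\$?\d+(?:,\d{3})*\.\d{2}", text))


def token_features(tokens: list[Token], i: int) -> dict:
    """Build a feature dict for token i from its text and its page position."""
    tok = tokens[i]
    feats = {
        # Text features from the OCR stream.
        "lower": tok.text.lower(),
        "is_amount": looks_like_amount(tok.text),
        # Layout features: coarse position on the page (assumed 4x4 grid).
        "x_bin": min(int(tok.x * 4), 3),
        "y_bin": min(int(tok.y * 4), 3),
        "prev_on_line": "<none>",
    }
    # Treat the previous token as context only if it sits on the same text line
    # (assumed threshold: vertical offset under half the token height).
    if i > 0 and abs(tokens[i - 1].y - tok.y) < 0.5 * tok.height:
        feats["prev_on_line"] = tokens[i - 1].text.lower()
    return feats


if __name__ == "__main__":
    line = [Token("TOTAL", 0.10, 0.80, 0.02), Token("$23.45", 0.70, 0.80, 0.02)]
    for idx in range(len(line)):
        print(token_features(line, idx))
```

In this sketch the amount token carries both a textual cue (its numeric shape) and layout cues (its horizontal bin and the label to its left on the same line), which is the kind of joint evidence the abstract describes when linguistic context alone is too sparse.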
Pages: 1004-1012
Number of pages: 9