Extracting Relevant Named Entities for Automated Expense Reimbursement

Authors
Zhu, Guangyu [1]
Bethea, Timothy J. [1]
Krishna, Vikas [1]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
Keywords
Named entity extraction; learning; document layout analysis; conditional random fields
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; (3) automation of auditing procedures that enables automatic expense validation with minimal human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge for image-based data mining, question answering, and other information retrieval tasks. In many applications that require such capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results due to the absence of linguistic context. We present an approach for extracting relevant named entities from document images by combining rich page layout features in the image space with language content in the OCR text using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate the system performance on large collections of real-world receipt images provided by the IBM World Wide Reimbursement Center.
Pages: 1004-1012
Page count: 9