Extracting Relevant Named Entities for Automated Expense Reimbursement

Authors
Zhu, Guangyu [1]
Bethea, Timothy J. [1]
Krishna, Vikas [1]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
Keywords
Named entity extraction; learning; document layout analysis; conditional random fields
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport, and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; and (3) automation of auditing procedures that enables automatic expense validation with minimal human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge for image-based data mining, question answering, and other information retrieval tasks. In many applications that require such capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results due to the absence of linguistic context. We present an approach for extracting relevant named entities from document images by combining rich page layout features in the image space with language content in the OCR text using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate the system performance on large collections of real-world receipt images provided by the IBM World Wide Reimbursement Center.
Pages: 1004-1012
Number of pages: 9