Extracting Relevant Named Entities for Automated Expense Reimbursement

Times cited: 0

Authors:

Zhu, Guangyu [1]

Bethea, Timothy J. [1]

Krishna, Vikas [1]

Affiliations:

[1] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA

Source:

KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007

Keywords:

Named entity extraction; learning; document layout analysis; conditional random fields

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Expense reimbursement is a time-consuming and labor-intensive process across organizations. In this paper, we present a prototype expense reimbursement system that dramatically reduces the elapsed time and costs involved by eliminating paper from the process life cycle. Our complete solution involves (1) an electronic submission infrastructure that provides multi-channel image capture, secure transport and centralized storage of paper documents; (2) an unconstrained data mining approach to extracting relevant named entities from unstructured document images; and (3) automation of auditing procedures that enables automatic expense validation with minimal human interaction. Extracting relevant named entities robustly from document images with unconstrained layouts and diverse formatting is a fundamental technical challenge for image-based data mining, question answering, and other information retrieval tasks. In many applications that require such capability, applying traditional language modeling techniques to the stream of OCR text does not give satisfactory results due to the absence of linguistic context. We present an approach for extracting relevant named entities from document images by combining rich page layout features in the image space with language content in the OCR text using a discriminative conditional random field (CRF) framework. We integrate this named entity extraction engine into our expense reimbursement solution and evaluate the system performance on large collections of real-world receipt images provided by the IBM World Wide Reimbursement Center.

Citation

Pages: 1004-1012

Number of pages: 9
