TIRec: Transformer-based Invoice Text Recognition

Times cited: 0

Author:

Chen, Yanlan [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023 | 2023

Keywords:

Text recognition; Invoice; Convolutional Vision Transformer

DOI:

10.1145/3590003.3590034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

A novel invoice text recognition model is proposed. In recent years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have clear drawbacks, such as step-by-step decoding and the one-way, serial propagation of semantic information, which limit both how effectively semantic information is used and computational efficiency. Invoice text, by contrast, has strong contextual relationships because it follows fixed text patterns; its fonts are also more uniform, and its backgrounds are much less complex than those of natural scenes. To exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by the Transformer [1]. Self-attention-based architectures, in particular the Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply the Transformer to invoice text recognition. Unlike RNN-based approaches, our framework reduces the parameters of the vision network used to extract image features, uses a Convolutional Vision Transformer (CvT) attention module to capture semantic information, and uses a Transformer decoding module to decode all characters in parallel. This Transformer-based architecture is intended to model the semantic information in invoices better while remaining lightweight. In addition, we collected text images from more than 40,000 train, VAT, rolled, and cab invoices. Experiments on this invoice text recognition dataset show that our approach outperforms previous methods in both accuracy and speed.
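The abstract contrasts step-by-step RNN decoding with decoding all characters in parallel. The following is a minimal pure-Python sketch of that general idea, not the TIRec implementation. It assumes a single attention head, uses the encoder features directly as keys and values (no learned projections, feed-forward layers, or CvT convolutional projections), and the per-slot `position_queries`, the `classifier` weights, and the function names are all hypothetical. Because each character slot's prediction depends only on its own query and the shared image features, not on previously decoded characters, every slot can be predicted at once.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each output row depends only on its own query and the shared keys/values,
    so rows could be computed in parallel; this sketch simply loops over them.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def parallel_decode(position_queries: Matrix, features: Matrix,
                    classifier: Matrix, charset: str) -> str:
    """Predict every character slot independently from the image features.

    position_queries: one hypothetical query vector per output character slot.
    features: encoder feature vectors (used as both keys and values here).
    classifier: one hypothetical weight vector per character in charset.
    """
    context = attention(position_queries, features, features)
    chars = []
    for c in context:
        # Argmax over character logits; no slot looks at another slot's output.
        logits = [dot(c, w) for w in classifier]
        chars.append(charset[logits.index(max(logits))])
    return "".join(chars)


if __name__ == "__main__":
    # Toy encoder features: each vector "points" at one character.
    features = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
    # One query per output slot; each slot attends mostly to one feature vector.
    queries = [[0.0, 0.0, 5.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
    classifier = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(parallel_decode(queries, features, classifier, "ABC"))  # prints "CAB"
```

An RNN decoder would instead need the hidden state from step t-1 before it could emit character t; here the loop over slots has no such dependency, which is what allows a real implementation to batch all slots into one matrix operation.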


Pages: 175-180

Number of pages: 6
