GraphRevisedIE: Multimodal information extraction with graph-revised network

Cited by: 6
Authors
Cao, Panfeng [1]
Wu, Jian [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
Keywords
Document information extraction; Graph convolutional network; Transformer; IMAGES
DOI
10.1016/j.patcog.2023.109542
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Key information extraction (KIE) from visually rich documents (VRD) has been a challenging task in document intelligence because of not only the complicated and diverse layouts of VRD that make the model hard to generalize but also the lack of methods to exploit the multimodal features in VRD. In this paper, we propose a light-weight model named GraphRevisedIE that effectively embeds multimodal features such as textual, visual, and layout features from VRD and leverages graph revision and graph convolution to enrich the multimodal embedding with global context. Extensive experiments on multiple real-world datasets show that GraphRevisedIE generalizes to documents of varied layouts and achieves comparable or better performance compared to previous KIE methods. We also publish a business license dataset that contains both real-life and synthesized documents to facilitate research of document KIE. (c) 2023 Elsevier Ltd. All rights reserved.
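The abstract names two operations, graph revision and graph convolution, without giving their form. The following is a minimal sketch of the general idea only, not the paper's architecture: it assumes node embeddings are plain lists of floats, that "revision" means mixing an initial adjacency matrix with a softmax-normalized similarity matrix, and that the convolution is a single ReLU(A H W) step. The function names, the `mix` parameter, and the toy inputs are all hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product a @ b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def revise_adjacency(h, a_init, mix=0.5):
    # Assumption: the revised graph is a convex mix of a similarity-based
    # adjacency (softmax of h @ h^T) and the row-normalized initial one.
    h_t = [list(col) for col in zip(*h)]
    learned = [softmax(row) for row in matmul(h, h_t)]
    a_norm = [[v / (sum(row) or 1.0) for v in row] for row in a_init]
    return [
        [mix * l + (1.0 - mix) * a for l, a in zip(l_row, a_row)]
        for l_row, a_row in zip(learned, a_norm)
    ]

def graph_conv(h, adj, w):
    # One propagation step: ReLU(adj @ h @ w).
    out = matmul(matmul(adj, h), w)
    return [[max(0.0, v) for v in row] for row in out]

# Toy example: three text segments with 2-dimensional embeddings.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_init = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]  # includes self-loops
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only

adj = revise_adjacency(h, a_init)
out = graph_conv(h, adj, w)
```

Because each row of the revised adjacency sums to 1 (when every row of `a_init` is non-empty), each output embedding here is a weighted average of the embeddings in the document graph, which is one way to read "enriching the embedding with global context".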
Pages: 9
Related papers
50 in total
  • [31] Multimodal heterogeneous graph convolutional network for image recommendation
    Wei, Weiyi
    Wang, Jian
    Xu, Mengyu
    Zhang, Futong
    MULTIMEDIA SYSTEMS, 2023, 29 (5) : 2747 - 2760
  • [32] An Iterative Graph Learning Convolution Network for Key Information Extraction Based on the Document Inductive Bias
    Deng, Jiyao
    Zhang, Yi
    Zhang, Xinpeng
    Tang, Zhi
    Gao, Liangcai
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, 14189 LNCS : 84 - 97
  • [33] Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety
    Krieger, Felix
    Drews, Paul
    Funk, Burkhardt
    Wobbe, Till
    INNOVATION THROUGH INFORMATION SYSTEMS, VOL II: A COLLECTION OF LATEST RESEARCH ON TECHNOLOGY ISSUES, 2021, 47 : 5 - 20
  • [34] Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network
    Gbada, Hamza
    Kalti, Karim
    Mahjoub, Mohamed Ali
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 248 - 263
  • [35] Formal Languages in Information Extraction and Graph Databases
    Martens, Wim
    BEYOND THE HORIZON OF COMPUTABILITY, CIE 2020, 2020, 12098 : 306 - 309
  • [36] Dual-VIE: Dual-Level Graph Attention Network for Visual Information Extraction
    Zhang, Junwei
    Wang, Hao
    Luo, Xiangfeng
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2022, 13629 : 422 - 434
  • [37] Multimodal Information Fusion Approach for Noncontact Heart Rate Estimation Using Facial Videos and Graph Convolutional Network
    Yue, Zijie
    Ding, Shuai
    Yang, Shanlin
    Wang, Linjie
    Li, Yinghui
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [38] ESGNet: A multimodal network model incorporating entity semantic graphs for information extraction from Chinese resumes
    Luo, Shun
    Yu, Juan
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [39] Pretraining graph transformer for molecular representation with fusion of multimodal information
    Chen, Ruizhe
    Li, Chunyan
    Wang, Longyue
    Liu, Mingquan
    Chen, Shugao
    Yang, Jiahao
    Zeng, Xiangxiang
    INFORMATION FUSION, 2025, 115
  • [40] MUSTIE: Multimodal Structural Transformer for Web Information Extraction
    Wang, Qifan
    Wang, Jingang
    Quan, Xiaojun
    Feng, Fuli
    Xu, Zenglin
    Nie, Shaoliang
    Wang, Sinong
    Khabsa, Madian
    Firooz, Hamed
    Liu, Dongfang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2405 - 2420