GraphRevisedIE: Multimodal information extraction with graph-revised network

Cited by: 6
Authors
Cao, Panfeng [1]
Wu, Jian [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
Keywords
Document information extraction; Graph convolutional network; Transformer; Images
DOI
10.1016/j.patcog.2023.109542
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Key information extraction (KIE) from visually rich documents (VRD) remains a challenging task in document intelligence for two reasons: the complicated and diverse layouts of VRD make it hard for models to generalize, and methods for exploiting the multimodal features in VRD are lacking. In this paper, we propose a lightweight model named GraphRevisedIE that effectively embeds multimodal features such as textual, visual, and layout features from VRD and leverages graph revision and graph convolution to enrich the multimodal embedding with global context. Extensive experiments on multiple real-world datasets show that GraphRevisedIE generalizes to documents of varied layouts and achieves performance comparable to or better than previous KIE methods. We also publish a business license dataset that contains both real-life and synthesized documents to facilitate research on document KIE. (c) 2023 Elsevier Ltd. All rights reserved.
Pages: 9
Related papers
50 in total
  • [41] UMIE: Unified Multimodal Information Extraction with Instruction Tuning
    Sun, Lin
    Zhang, Kai
    Li, Qingyuan
    Lou, Renze
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19062 - 19070
  • [42] Visualization of information network in human multimodal communication
    School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi-shi, 923-1292, Japan
    Hidaka, S., 1600, Institute of Electronics Information Communication Engineers (96):
  • [43] Learnable Graph Convolutional Network With Semisupervised Graph Information Bottleneck
    Zhong, Luying
    Chen, Zhaoliang
    Wu, Zhihao
    Du, Shide
    Chen, Zheyi
    Wang, Shiping
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 433 - 446
  • [45] BUILDING FOOTPRINT EXTRACTION WITH GRAPH CONVOLUTIONAL NETWORK
    Shi, Yilei
    Li, Qinyu
    Zhu, Xiaoxiang
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 5136 - 5139
  • [46] MULTIMODAL GRAPH COARSENING FOR INTERPRETABLE, MRI-BASED BRAIN GRAPH NEURAL NETWORK
    Sebenius, Isaac
    Campbell, Alexander
    Morgan, Sarah E.
    Bullmore, Edward T.
    Lio, Pietro
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [47] A GRAPH-BASED APPROACH FOR FEATURE EXTRACTION AND SEGMENTATION OF MULTIMODAL IMAGES
    Iyer, Geoffrey
    Chanussot, Jocelyn
    Bertozzi, Andrea L.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3320 - 3324
  • [48] Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction
    Pingali, Sriram
    Yadav, Shweta
    Dutta, Pratik
    Saha, Sriparna
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3741 - 3747
  • [49] Preference-corrected multimodal graph convolutional recommendation network
    Xiangen Jia
    Yihong Dong
    Feng Zhu
    Yu Xin
    Jiangbo Qian
    Applied Intelligence, 2023, 53 : 3947 - 3962
  • [50] MGMP: Multimodal Graph Message Propagation Network for Event Detection
    Li, Jiankai
    Wang, Yunhong
    Li, Weixin
    MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 141 - 153