GraphRevisedIE: Multimodal information extraction with graph-revised network

Cited by: 6
Authors
Cao, Panfeng [1]
Wu, Jian [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
Keywords
Document information extraction; Graph convolutional network; Transformer; Images
DOI
10.1016/j.patcog.2023.109542
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Key information extraction (KIE) from visually rich documents (VRD) has been a challenging task in document intelligence because of not only the complicated and diverse layouts of VRD that make the model hard to generalize but also the lack of methods to exploit the multimodal features in VRD. In this paper, we propose a light-weight model named GraphRevisedIE that effectively embeds multimodal features such as textual, visual, and layout features from VRD and leverages graph revision and graph convolution to enrich the multimodal embedding with global context. Extensive experiments on multiple real-world datasets show that GraphRevisedIE generalizes to documents of varied layouts and achieves comparable or better performance compared to previous KIE methods. We also publish a business license dataset that contains both real-life and synthesized documents to facilitate research of document KIE. (c) 2023 Elsevier Ltd. All rights reserved.
Pages: 9