SGFNet: A semantic graph-based multimodal network for financial invoice information extraction

Authors
Luo, Shun [1]
Yu, Juan [1]
Affiliations
[1] Fuzhou Univ, Sch Econ & Management, 2 Wulongjiang North Ave, Fuzhou 350108, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Invoice information extraction; Semantic graph; Multimodal modeling
DOI
10.1016/j.eswa.2024.125156
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To meet the financial industry's demand for high-volume invoice entry and to improve on the low accuracy of traditional manual entry, we construct SGFNet, a financial invoice information extraction network that integrates semantic graph associations and multimodal modeling. First, we construct a graph of strong and weak semantic associations between data within each modality, based on the correlation of text content. Subsequently, we model the multimodal data in a unified structure, extract the text modal information of invoices along with the corresponding image and layout modal information, and guide the fusion and embedding of multimodal data through the semantic associations in the graph to produce a richer feature representation. Furthermore, the semantically linked multimodal information is fed into an aggregated multimodal self-attention mechanism to establish effective connections between modalities. Finally, the loss function combines supervised contrastive learning with a smoothed Kullback-Leibler divergence, which reduces the accuracy degradation caused by sample imbalance and convergence instability. In our experiments, SGFNet achieves F1 scores of 93.71% on the English financial invoice dataset and 96.27% on the Chinese dataset, indicating that the proposed method successfully extracts feature information from different data modalities and thereby achieves satisfactory information extraction results.
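The abstract names the two loss terms but does not give their formulas, so the following is only an illustrative sketch of how a supervised contrastive term and a smoothed Kullback-Leibler term might be combined. It is not the authors' implementation. The label-smoothing form of the "smoothed" target, the function names, the `temperature` and `smoothing` values, and the mixing weight `alpha` are all assumptions introduced here.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Samples sharing a label are pulled together; all others are pushed apart.
    """
    # L2-normalise each embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def smoothed_kl_divergence(logits, target, smoothing=0.1):
    """KL(q || p), with q a label-smoothed one-hot target and p = softmax(logits).

    ASSUMPTION: "smoothed" is read here as standard label smoothing.
    """
    k = len(logits)
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_p = [x - log_z for x in logits]
    q = [smoothing / k + (1.0 - smoothing) * (1.0 if c == target else 0.0)
         for c in range(k)]
    return sum(qc * (math.log(qc) - lp) for qc, lp in zip(q, log_p) if qc > 0)


def combined_loss(embeddings, logits_batch, labels, alpha=0.5):
    """Hypothetical weighted sum of the two terms; alpha is an assumed weight."""
    contrastive = supervised_contrastive_loss(embeddings, labels)
    kl = sum(smoothed_kl_divergence(lg, y)
             for lg, y in zip(logits_batch, labels)) / len(labels)
    return alpha * contrastive + (1.0 - alpha) * kl
```

In this sketch the contrastive term acts on the fused feature embeddings and the KL term acts on per-token class logits. Softening the targets and grouping samples by class are common ways to address class imbalance and unstable convergence, which are the problems the abstract attributes to this loss design.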
Pages: 9