QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document

Cited by: 2
Authors
Mahamoud, Ibrahim Souleiman [1, 2]
Coustaty, Mickael [1]
Joseph, Aurelie [2]
d'Andecy, Vincent Poulain [2]
Ogier, Jean-Marc [1]
Affiliations
[1] La Rochelle Univ, L3i Ave Michel Crepeau, F-17042 La Rochelle, France
[2] Yooz, 1 Rue Fleming, F-17000 La Rochelle, France
Keywords
Visual question answering; Multimodality; Attention mechanism;
DOI
10.1007/978-3-031-06555-2_44
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information extraction from corporate documents is attracting growing research interest, both for its economic value and as a scientific challenge. Extracting this information requires both the textual and the visual content of the document image. The information to be extracted is most often fixed beforehand (e.g., classifying words as date, total amount, etc.). However, the information to be extracted evolves, so we do not want to be restricted to predefined word classes. Instead, we would like to query a document with a question such as "which is the invoicing address?", since an invoice can contain several addresses. We formulate the request as a question and our model tries to answer it. Our model achieves a score of 77.65% on the DocVQA dataset while drastically reducing the number of model parameters, which allows us to use it in an industrial context. It relies on an attention model over several modalities, which helps in interpreting the results obtained. Our other contribution in this paper is a new dataset for visual question answering on corporate documents, built from invoices in RVL-CDIP [8]. Public data on corporate documents are scarce in the state of the art; this contribution allows us to test our models on invoice data with VQA methods.
Pages: 659-673
Page count: 15
Related papers
50 in total
  • [21] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Yang, Jufeng
    Yuan, Xiaojie
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6730 - 6743
  • [22] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Wu, Xiaoping
    Yang, Jufeng
    Cai, Xiangrui
    Yuan, Xiaojie
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 91 - 98
  • [23] Feature Enhancement in Attention for Visual Question Answering
    Lin, Yuetan
    Pang, Zhangyang
    Wang, Donghui
    Zhuang, Yueting
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4216 - 4222
  • [24] Feature Fusion Attention Visual Question Answering
    Wang, Chunlin
    Sun, Jianyong
    Chen, Xiaolin
    ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019, : 412 - 416
  • [25] Dynamic Capsule Attention for Visual Question Answering
    Zhou, Yiyi
    Ji, Rongrong
    Su, Jinsong
    Sun, Xiaoshuai
    Chen, Weiqiu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9324 - 9331
  • [26] Visual Question Answering based on multimodal triplet knowledge accumuation
    Wang, Fengjuan
    An, Gaoyun
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 81 - 84
  • [27] Visual Question Answering using Explicit Visual Attention
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [28] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024, : 6 - 10
  • [29] Stacked Attention based Textbook Visual Question Answering with BERT
    Aishwarya, R.
    Sarath, P.
    Rahman, Shibil P.
    Sneha, U.
    Manmadhan, Sruthy
    2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON, 2022,
  • [30] Multi-stage Attention based Visual Question Answering
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9407 - 9414