QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document

Cited by: 2
Authors
Mahamoud, Ibrahim Souleiman [1 ,2 ]
Coustaty, Mickael [1 ]
Joseph, Aurelie [2 ]
d'Andecy, Vincent Poulain [2 ]
Ogier, Jean-Marc [1 ]
Affiliations
[1] La Rochelle Univ, L3i Ave Michel Crepeau, F-17042 La Rochelle, France
[2] Yooz, 1 Rue Fleming, F-17000 La Rochelle, France
Source
Document Analysis Systems (DAS 2022), Lecture Notes in Computer Science, vol. 13237, Springer
Keywords
Visual question answering; Multimodality; Attention mechanism
DOI
10.1007/978-3-031-06555-2_44
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction from corporate documents is attracting growing interest, both for its economic impact and as a scientific challenge. To extract this information, the use of textual and visual content becomes unavoidable in order to understand the information inherent in the image. The information to be extracted is most often fixed beforehand (e.g., classifying words as date, total amount, etc.). Because the information to be extracted evolves, we do not want to be restricted to predefined word classes. Instead, we want to query a document directly, for example "Which is the billing address?", since an invoice can contain several addresses. We formulate the request as a question and our model tries to answer it. Our model reaches 77.65% on the DocVQA dataset while drastically reducing the number of parameters, which makes it usable in an industrial context, and it relies on an attention mechanism over several modalities that helps us interpret the results obtained. Our other contribution in this paper is a new dataset for visual question answering on corporate documents, built from invoices in RVL-CDIP [8]. Public corporate-document data are scarce in the state of the art, and this contribution allows us to test our models on invoice data with VQA methods.
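The abstract names an attention mechanism over several modalities but gives no architectural detail here. As a minimal sketch of what such multimodal question-document attention can look like, the PyTorch snippet below cross-attends question tokens to textual, layout, and visual features and fuses the results; all module names, dimensions, and the concat-then-project fusion are our own assumptions, not the authors' published architecture.

import torch
import torch.nn as nn

class MultimodalQAAttention(nn.Module):
    # Hypothetical illustration: one cross-attention block per document
    # modality, fused by concatenation and a linear projection. This is
    # NOT the QAlayout architecture, only a generic sketch of the idea.
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.layout_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(3 * dim, dim)  # concat-then-project fusion

    def forward(self, question, text, layout, visual):
        # question: (B, Lq, dim); each document modality: (B, Lm, dim)
        t, _ = self.text_attn(question, text, text)        # attend to OCR token features
        l, _ = self.layout_attn(question, layout, layout)  # attend to bounding-box/layout features
        v, _ = self.visual_attn(question, visual, visual)  # attend to image-patch features
        return self.fuse(torch.cat([t, l, v], dim=-1))

# Toy usage with random tensors standing in for encoder outputs.
model = MultimodalQAAttention()
q = torch.randn(1, 12, 256)  # an encoded question such as "Which is the billing address?"
out = model(q, torch.randn(1, 128, 256), torch.randn(1, 128, 256), torch.randn(1, 196, 256))
print(out.shape)  # torch.Size([1, 12, 256])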
Pages: 659-673
Page count: 15