Towards Complex Document Understanding By Discrete Reasoning

Cited by: 7
Authors
Zhu, Fengbin [1, 2]
Lei, Wenqiang [3]
Feng, Fuli [4]
Wang, Chao [2]
Zhang, Haozhou [3]
Chua, Tat-Seng [1]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] 6Estates Pte Ltd, Singapore, Singapore
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Univ Sci & Technol China, Hefei, Peoples R China
Keywords
Question Answering; Visually-rich Document Understanding; Discrete Reasoning
DOI
10.1145/3503161.3548422
Chinese Library Classification number
TP39 [Computer Applications]
Discipline classification code
081203; 0835
Abstract
Document Visual Question Answering (VQA) aims to answer questions over visually-rich documents. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text as well as 16,558 question-answer pairs. The documents are sampled from financial reports and contain many numbers, so discrete reasoning capability is required to answer the questions. Based on TAT-DQA, we further develop a novel model named MHST that takes into account information from multiple modalities to address different types of questions with corresponding strategies, i.e., extraction or reasoning. Experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, its performance still lags far behind that of human experts. We expect that our TAT-DQA dataset will facilitate research on understanding visually-rich documents, especially in scenarios that require discrete reasoning. We also hope the proposed model will inspire researchers to design more advanced Document VQA models in the future.
Pages: 4857-4866
Number of pages: 10