Towards Complex Document Understanding By Discrete Reasoning

Cited by: 7
Authors
Zhu, Fengbin [1, 2]
Lei, Wenqiang [3]
Feng, Fuli [4]
Wang, Chao [2]
Zhang, Haozhou [3]
Chua, Tat-Seng [1]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] 6Estates Pte Ltd, Singapore, Singapore
[3] Sichuan Univ, Chengdu, Peoples R China
[4] Univ Sci & Technol China, Hefei, Peoples R China
Keywords
Question Answering; Visually-rich Document Understanding; Discrete Reasoning
DOI
10.1145/3503161.3548422
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Document Visual Question Answering (VQA) aims to answer questions over visually-rich documents. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text as well as 16,558 question-answer pairs. The documents are sampled from financial reports and contain many numbers, so answering the questions demands discrete reasoning capability. Based on TAT-DQA, we further develop a novel model named MHST that takes into account information from multiple modalities to address different types of questions with corresponding strategies, i.e., extraction or reasoning. Experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, its performance still lags far behind that of human experts. We expect that our TAT-DQA dataset will facilitate research on the understanding of visually-rich documents, especially for scenarios that require discrete reasoning. We also hope the proposed model will inspire researchers to design more advanced Document VQA models in the future.
Pages: 4857-4866
Number of pages: 10