Towards Complex Document Understanding By Discrete Reasoning

Cited by: 7
Authors
Zhu, Fengbin [1, 2]
Lei, Wenqiang [3]
Feng, Fuli [4]
Wang, Chao [2]
Zhang, Haozhou [3]
Chua, Tat-Seng [1]
Affiliations
[1] National University of Singapore, Singapore, Singapore
[2] 6Estates Pte Ltd, Singapore, Singapore
[3] Sichuan University, Chengdu, People's Republic of China
[4] University of Science and Technology of China, Hefei, People's Republic of China
Keywords
Question Answering; Visually-rich Document Understanding; Discrete Reasoning
DOI
10.1145/3503161.3548422
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Document Visual Question Answering (VQA) aims to answer questions over visually-rich documents. In this work, we introduce a new Document VQA dataset, TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text, as well as 16,558 question-answer pairs. The documents are sampled from financial reports and contain many numbers, so answering the questions demands discrete reasoning capability. Based on TAT-DQA, we further develop a novel model, MHST, that takes multi-modal information into account to address different types of questions with corresponding strategies, i.e., extraction or reasoning. Experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, its performance still lags far behind that of human experts. We expect that the TAT-DQA dataset will facilitate research on understanding visually-rich documents, especially in scenarios that require discrete reasoning. We also hope the proposed model will inspire researchers to design more advanced Document VQA models in the future.
Pages: 4857-4866
Number of pages: 10