Document Visual Question Answering (VQA) aims to answer questions over visually-rich documents. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages, each comprising semi-structured table(s) and unstructured text, along with 16,558 question-answer pairs. The documents are sampled from financial reports and contain a large amount of numerical content, which means that discrete reasoning capability is required to answer many of the questions. Based on TAT-DQA, we further develop a novel model named MHST that takes into account information from multiple modalities to address different types of questions with corresponding strategies, i.e., extraction or reasoning. The experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, its performance still lags far behind that of human experts. We expect that our TAT-DQA dataset will facilitate research on the understanding of visually-rich documents, especially in scenarios that require discrete reasoning. We also hope that the proposed model will inspire researchers to design more advanced Document VQA models in the future.