Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

Cited by: 11

Authors:

Bogin, Ben [1]

Subramanian, Sanjay [2]

Gardner, Matt [2]

Berant, Jonathan [1,2]

Affiliations:

[1] Tel Aviv Univ, Tel Aviv, Israel

[2] Allen Inst AI, Seattle, WA USA

Source:

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS | 2021 / Vol. 9

Funding:

European Research Council;

Keywords:

46;

DOI:

10.1162/tacl_a_00361

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Answering questions that involve multi-step reasoning requires decomposing them and using the answers of intermediate steps to reach the final answer. However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples. In this work, we propose a model that computes a representation and denotation for all question spans in a bottom-up, compositional manner using a CKY-style parser. Our model induces latent trees, driven by end-to-end (the answer) supervision only. We show that this inductive bias towards tree structures dramatically improves systematic generalization to out-of-distribution examples, compared to strong baselines on an arithmetic expressions benchmark as well as on CLOSURE, a dataset that focuses on systematic generalization for grounded question answering. On this challenging dataset, our model reaches an accuracy of 96.1%, significantly higher than prior models that almost perfectly solve the task on a random, in-distribution split.
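The bottom-up, CKY-style computation over all spans mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the functions `compose` and `score` are hypothetical placeholders for learned modules, the vectors stand in for span representations only (no grounded denotations), and the softmax mixture over split points is an assumed way of keeping the choice of tree latent and differentiable.

```python
import math

def compose(left, right):
    # Hypothetical composition function (assumption): tanh of the elementwise sum.
    return [math.tanh(a + b) for a, b in zip(left, right)]

def score(vec):
    # Hypothetical split score (assumption): sum of the components.
    return sum(vec)

def cky_chart(token_vecs):
    """Fill a chart with one vector per span (i, j), shortest spans first."""
    n = len(token_vecs)
    chart = {}
    # Length-1 spans are the token vectors themselves.
    for i, vec in enumerate(token_vecs):
        chart[(i, i + 1)] = list(vec)
    # Longer spans combine every pair of adjacent sub-spans.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            candidates = [compose(chart[(i, k)], chart[(k, j)])
                          for k in range(i + 1, j)]
            scores = [score(c) for c in candidates]
            # Numerically stable softmax over split points.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            dim = len(candidates[0])
            chart[(i, j)] = [
                sum(w * c[d] for w, c in zip(weights, candidates)) / total
                for d in range(dim)
            ]
    return chart

chart = cky_chart([[1.0], [2.0], [0.5]])
print(len(chart))  # one cell per span of the 3-token input
```

A sequence of n tokens yields n(n+1)/2 chart cells, and the cell for the full span `(0, n)` plays the role of the representation for the whole question. In an end-to-end setting, only a loss on the output derived from that top cell would be needed, which is the sense in which the trees stay latent.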

Pages: 195-210

Page count: 16

