Reasoning Visual Dialogs with Structural and Partial Observations

Cited by: 74

Authors:

Zheng, Zilong [1]

Wang, Wenguan [1, 2]

Qi, Siyuan [1, 3]

Zhu, Song-Chun [1, 3]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA

[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

[3] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Keywords:

DOI:

10.1109/CVPR.2019.00683

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a novel model for the task of Visual Dialog, whose dialogs exhibit complex structures. To produce a reasonable answer from the current question and the dialog history, a model must capture the underlying semantic dependencies between dialog entities. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and an unknown graph structure (the relations in the dialog). The given dialog entities are treated as observed nodes, and the answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization (EM) algorithm that infers both the underlying dialog structure and the missing node values (the desired answers). Building on this, we propose a differentiable graph neural network (GNN) that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms competing methods. We also observe that our method can infer the underlying dialog structure, which supports better dialog reasoning.
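To make the abstract's two-part idea (infer the graph structure, then infer the missing node values) concrete, here is a toy sketch. It assumes that dialog entities are already embedded as fixed-length vectors, that soft edge weights come from a dot-product softmax, and that a missing node is re-estimated as the edge-weighted average of the other nodes. The function names (`infer_edges`, `fill_missing`) and both update rules are hypothetical, deterministic simplifications in an EM-like alternating style. They are not the authors' EM algorithm or their GNN.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer_edges(nodes: List[Vector], i: int, temperature: float = 1.0) -> List[float]:
    """Structure step (hypothetical): soft edge weights from node i to all other nodes.

    Weights over the other nodes sum to 1; the self-edge weight is 0.
    """
    others = [j for j in range(len(nodes)) if j != i]
    weights = softmax([dot(nodes[i], nodes[j]) / temperature for j in others])
    full = [0.0] * len(nodes)
    for j, w in zip(others, weights):
        full[j] = w
    return full


def fill_missing(observed: List[Optional[Vector]], num_iters: int = 10) -> List[Vector]:
    """Alternate structure and value updates; None marks a missing node (e.g. the answer)."""
    obs = [v for v in observed if v is not None]
    if not obs:
        raise ValueError("need at least one observed node")
    dim = len(obs[0])
    missing = [i for i, v in enumerate(observed) if v is None]
    # Initialise every missing node with the mean of the observed nodes.
    mean = [sum(col) / len(obs) for col in zip(*obs)]
    nodes = [list(v) if v is not None else list(mean) for v in observed]
    for _ in range(num_iters):
        for i in missing:
            # Infer edges given the current node values ...
            w = infer_edges(nodes, i)
            # ... then re-estimate the missing value given those edges.
            # Observed nodes are never modified.
            nodes[i] = [sum(w[j] * nodes[j][d] for j in range(len(nodes))) for d in range(dim)]
    return nodes
```

For example, `fill_missing([[1.0, 0.0], [0.0, 1.0], None])` leaves the two observed vectors untouched and fills the missing node with `[0.5, 0.5]`, because both neighbours receive equal edge weight. Roughly speaking, a differentiable network can approximate such a loop by unrolling a fixed number of iterations and replacing the hand-written affinity and update rules with learned functions.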


Pages: 3662–6671

Page count: 3010

