Reasoning Visual Dialogs with Structural and Partial Observations

Cited by: 74

Authors:

Zheng, Zilong [1]

Wang, Wenguan [1,2]

Qi, Siyuan [1,3]

Zhu, Song-Chun [1,3]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA

[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

[3] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

DOI:

10.1109/CVPR.2019.00683

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Building on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.

Pages: 3662 / 6671

Page count: 3010
