Reasoning Visual Dialogs with Structural and Partial Observations

Cited by: 74
Authors
Zheng, Zilong [1]
Wang, Wenguan [1, 2]
Qi, Siyuan [1, 3]
Zhu, Song-Chun [1, 3]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA
Keywords
DOI
10.1109/CVPR.2019.00683
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
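The alternation described in the abstract (estimate the unknown relations, then re-estimate the missing node) can be illustrated with a toy sketch. This is not the paper's model or its GNN: the function name `infer_missing_node`, the dot-product scoring, the softmax edge weights, and the two-dimensional node vectors are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def infer_missing_node(observed, init, num_iters=3):
    """Alternate between soft edge estimation and missing-value update.

    Hypothetical toy procedure: `observed` holds the known node vectors,
    `init` is the starting guess for the node with a missing value.
    """
    missing = list(init)
    weights = []
    for _ in range(num_iters):
        # Structure step: soft edge weights from the missing node
        # to every observed node, based on dot-product similarity.
        weights = softmax([dot(missing, node) for node in observed])
        # Value step: re-estimate the missing node as the
        # edge-weighted mean of the observed nodes.
        missing = [
            sum(w * node[d] for w, node in zip(weights, observed))
            for d in range(len(init))
        ]
    return missing, weights


# Three observed nodes; the first two are similar to the initial guess.
observed_nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
answer, edge_weights = infer_missing_node(observed_nodes, init=[1.0, 0.0])
print(answer, edge_weights)
```

In this sketch the inferred edge weights play the role of the dialog structure and `answer` plays the role of the missing node value; a learned model would replace both hand-written steps with trainable functions.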
Pages: 3662 / 6671
Page count: 3010
Related papers
50 in total
  • [1] OBSERVATIONS ON READING THE PLATONIC DIALOGS
    DALFEN, J
    ZEITSCHRIFT FUR PHILOSOPHISCHE FORSCHUNG, 1975, 29 (02): : 169 - 194
  • [2] A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts
    Ge, Yunhao
    Xiao, Yao
    Xu, Zhi
    Zheng, Meng
    Karanam, Srikrishna
    Chen, Terrence
    Itti, Laurent
    Wu, Ziyan
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2195 - 2204
  • [3] Coreference resolution helps visual dialogs to focus
    Yue, Tianwei
    Wang, Wenping
    Liang, Chen
    Chen, Dachi
    Hetang, Congrui
    Wang, Xuewei
    HIGH-CONFIDENCE COMPUTING, 2024, 4 (02):
  • [4] Partial and dynamic ontology mapping model in dialogs of agents
    Freddo, Ademir Roberto
    Brito, Robison Cris
    Gimenez-Lugo, Gustavo
    Tacla, Cesar Augusto
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 347 - 356
  • [5] Reasoning with partial knowledge
    Pólos, L
    Hannan, MT
    SOCIOLOGICAL METHODOLOGY 2002, VOL 32, 2002, 32 : 133 - 181
  • [6] Multimedia instructional software for visual reasoning: Visual reasoning tutor (VRT)
    Hubbard, C
    Mengshoel, OJ
    Moon, C
    Kim, YS
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, 1996, : 261 - 268
  • [7] Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
    Amizadeh, Saeed
    Palangi, Hamid
    Polozov, Oleksandr
    Huang, Yichen
    Koishida, Kazuhito
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [8] A multimodal analysis of vocal and visual backchannels in spontaneous dialogs
    Truong, Khiet P.
    Poppe, Ronald
    de Kok, Iwan
    Heylen, Dirk
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2984 - 2987
  • [9] Structural Reasoning
    Guershon Harel
    Osvaldo Soto
    International Journal of Research in Undergraduate Mathematics Education, 2017, 3 (1) : 225 - 242
  • [10] Visual Abductive Reasoning
    Liang, Chen
    Wang, Wenguan
    Zhou, Tianfei
    Yang, Yi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15544 - 15554