Reasoning Visual Dialogs with Structural and Partial Observations

Cited by: 74
Authors
Zheng, Zilong [1]
Wang, Wenguan [1, 2]
Qi, Siyuan [1, 3]
Zhu, Song-Chun [1, 3]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA
DOI
10.1109/CVPR.2019.00683
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
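The alternation described in the abstract (estimate the dialog structure, then estimate the missing node) can be illustrated with a toy sketch. This is not the authors' model: the dot-product edge scores, the 0.5 mixing factor, and the names `infer_missing_node`, `observed`, and `query` are all assumptions made for illustration, and plain vectors stand in for learned embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def infer_missing_node(observed, query, num_iters=5):
    """Toy EM-style loop (hypothetical, not the paper's algorithm).

    observed: embeddings of the known dialog entities (assumed given).
    query:    embedding of the current question; initialises the missing node.
    Returns the estimated missing-node vector and the soft edge weights.
    """
    missing = list(query)
    weights = [1.0 / len(observed)] * len(observed)
    for _ in range(num_iters):
        # Structure step: soft edges from the missing node to each observed node.
        weights = softmax([dot(missing, h) for h in observed])
        # Value step: aggregate neighbour messages using the current edges.
        message = [
            sum(w * h[d] for w, h in zip(weights, observed))
            for d in range(len(missing))
        ]
        # Assumed update rule: mix the question with the aggregated message.
        missing = [0.5 * q + 0.5 * m for q, m in zip(query, message)]
    return missing, weights


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0]]
    answer_vec, edges = infer_missing_node(history, query=[1.0, 0.0])
    print(answer_vec, edges)
```

In this sketch the edge weights play the role of the inferred dialog structure: the observed node most similar to the question receives the largest weight and so contributes most to the estimated answer node.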
Pages: 3662 / 6671
Page count: 3010