Reasoning Visual Dialogs with Structural and Partial Observations

Cited by: 74
Authors
Zheng, Zilong [1]
Wang, Wenguan [1,2]
Qi, Siyuan [1,3]
Zhu, Song-Chun [1,3]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Int Ctr AI & Robot Auton CARA, Los Angeles, CA USA
DOI
10.1109/CVPR.2019.00683
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. Obtaining a reasonable answer from the current question and the dialog history requires modeling the underlying semantic dependencies between dialog entities. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in the dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (the desired answers). Based on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
Pages: 3662 / 6671
Page count: 3010