Visual-Semantic Graph Matching for Visual Grounding

Cited by: 15
Authors
Jing, Chenchen [1]
Wu, Yuwei [1]
Pei, Mingtao [1]
Hu, Yao [2]
Jia, Yunde [1]
Wu, Qi [3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Alibaba Youku Cognit & Intelligent Lab, Beijing, Peoples R China
[3] Univ Adelaide, Adelaide, SA, Australia
Keywords
Visual Grounding; Graph Matching; Visual Scene Graph; Language Scene Graph; LANGUAGE
DOI
10.1145/3394171.3413902
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual grounding is the task of associating entities in a natural language sentence with objects in an image. In this paper, we formulate visual grounding as a graph matching problem to find node correspondences between a visual scene graph and a language scene graph. These two graphs are heterogeneous, representing the structural layouts of the image and the sentence, respectively. We learn unified contextual node representations of the two graphs using a cross-modal graph convolutional network to reduce their discrepancy. Because the learned node representations characterize both node information and structure information, the graph matching problem can be relaxed to a linear assignment problem. A permutation loss and a semantic cycle-consistency loss are further introduced to solve the linear assignment problem with or without ground-truth correspondences. Experimental results on two visual grounding tasks, i.e., referring expression comprehension and phrase localization, demonstrate the effectiveness of our method.
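The relaxation of graph matching to linear assignment can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the solver, or the exact form of the permutation loss. We assume cosine similarity between node embeddings, an exhaustive assignment search (practical only for tiny graphs; the Hungarian algorithm is the usual choice at scale), Sinkhorn normalization to obtain a soft assignment, and a binary cross-entropy permutation loss of the kind common in deep graph matching. The functions `affinity`, `linear_assignment`, `sinkhorn`, and `permutation_loss`, and the toy embeddings standing in for cross-modal GCN outputs, are all hypothetical. The semantic cycle-consistency loss is not modeled.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def affinity(lang_nodes, vis_nodes):
    """Affinity matrix: rows are language-graph nodes, columns are visual-graph nodes.
    Assumption: node embeddings already encode structure (e.g. from a cross-modal GCN),
    so matching reduces to comparing nodes pairwise."""
    return [[cosine(p, o) for o in vis_nodes] for p in lang_nodes]


def linear_assignment(scores):
    """Exhaustive linear assignment: map each row to a distinct column so that the
    total affinity is maximal. Assumes rows <= columns; tiny graphs only."""
    n_rows, n_cols = len(scores), len(scores[0])
    best, best_total = None, -math.inf
    for cols in itertools.permutations(range(n_cols), n_rows):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best, best_total = list(cols), total
    return best


def sinkhorn(scores, tau=0.1, iters=20):
    """Soft assignment (hypothetical choice): exponentiate temperature-scaled scores,
    then alternate row and column normalization toward a doubly stochastic matrix."""
    m = [[math.exp(s / tau) for s in row] for row in scores]
    for _ in range(iters):
        # Row normalization: each language node spreads unit mass over visual nodes.
        m = [[x / sum(row) for x in row] for row in m]
        # Column normalization: each visual node receives unit mass in total.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[x / c for x, c in zip(row, col_sums)] for row in m]
    return m


def permutation_loss(soft, gt):
    """Binary cross-entropy between a soft assignment and a 0/1 ground-truth matrix."""
    eps = 1e-9
    loss = 0.0
    for srow, grow in zip(soft, gt):
        for s, g in zip(srow, grow):
            loss -= g * math.log(s + eps) + (1 - g) * math.log(1 - s + eps)
    return loss


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for two phrases and two detected regions.
    lang_nodes = [[1.0, 0.1], [0.1, 1.0]]
    vis_nodes = [[0.0, 1.0], [1.0, 0.0]]
    scores = affinity(lang_nodes, vis_nodes)
    print(linear_assignment(scores))  # [1, 0]
    soft = sinkhorn(scores)
    print(permutation_loss(soft, [[0, 1], [1, 0]]))  # small: matches the hard assignment
```

In this sketch the hard assignment is used at inference, while the differentiable soft assignment is what a permutation loss would supervise during training when ground-truth correspondences are available.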
Pages: 4041-4050
Number of pages: 10