Improving Weakly Supervised Scene Graph Parsing through Object Grounding

Times cited: 0
Authors
Zhang, Yizhou [1]
Zheng, Zhaoheng [1]
Nevatia, Ram [1]
Liu, Yan [1]
Affiliation
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised scene graph parsing, which learns structured image representations without annotated correspondences between graph nodes and visual objects, has become prevalent in recent computer vision research. Existing methods mainly focus on designing task-specific loss functions, model architectures, or optimization algorithms. We argue that the correspondences between visual objects and graph nodes are crucial for weakly supervised scene graph parsing and are worth learning explicitly. We therefore propose GroParser, a framework that improves weakly supervised scene graph parsing models by grounding visual objects. The proposed weakly supervised grounding method learns a metric between visual objects and scene graph nodes that incorporates information from both object features and relational features. Specifically, we apply multi-instance learning to learn object category information and use a two-stream graph neural network to model the relational similarity metric. Extensive experiments on the scene graph parsing task verify that the grounding found by our model can improve the performance of existing weakly supervised scene graph parsing methods, including the current state of the art. Further experiments on the Visual Genome (VG) and Visual Relation Detection (VRD) datasets show that our model improves on existing approaches for the scene graph grounding task.
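The abstract names two ingredients, multi-instance learning (MIL) for object categories and a learned relational similarity, but gives no formulas. The following Python sketch is a hypothetical illustration under stated assumptions, not GroParser's implementation. Each image is treated as a bag of region proposals with per-class logits (`region_logits`, an assumed input). The categories of the scene graph nodes act as image-level labels for a max-pooling MIL loss. Grounding then assigns each node to a distinct region by maximizing a weighted sum of object similarity and an optional precomputed relational score. `relational_sim`, `alpha`, and all function names are hypothetical, and the paper's two-stream graph neural network is not reproduced here.

```python
import itertools
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mil_image_scores(region_logits):
    """Image-level score per class via max pooling over regions.

    Standard MIL assumption: a class is present in the image (the bag)
    if at least one region (an instance) contains it.
    """
    num_classes = len(region_logits[0])
    return [max(sigmoid(r[c]) for r in region_logits) for c in range(num_classes)]


def mil_loss(region_logits, present_classes, eps=1e-7):
    """Mean binary cross-entropy between pooled scores and image-level labels.

    `present_classes` is the set of categories named by the scene graph nodes;
    no node-to-region correspondence is required (weak supervision).
    """
    scores = mil_image_scores(region_logits)
    total = 0.0
    for c, s in enumerate(scores):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        y = 1.0 if c in present_classes else 0.0
        total -= y * math.log(s) + (1.0 - y) * math.log(1.0 - s)
    return total / len(scores)


def ground_nodes(node_classes, region_logits, relational_sim=None, alpha=0.5):
    """Assign each scene graph node to a distinct region (hypothetical step).

    Object similarity is the region's sigmoid score for the node's category.
    If relational_sim[n][r] is given (a stand-in for a learned relational
    metric), it is blended in with weight 1 - alpha.
    Brute-force search over injective assignments: toy sizes only.
    """
    num_nodes, num_regions = len(node_classes), len(region_logits)
    if num_nodes > num_regions:
        raise ValueError("need at least as many regions as nodes")

    def sim(n, r):
        obj = sigmoid(region_logits[r][node_classes[n]])
        if relational_sim is None:
            return obj
        return alpha * obj + (1.0 - alpha) * relational_sim[n][r]

    best, best_score = None, -math.inf
    for perm in itertools.permutations(range(num_regions), num_nodes):
        score = sum(sim(n, r) for n, r in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)


if __name__ == "__main__":
    # Toy example: classes 0 = person, 1 = horse, 2 = hat; three region proposals.
    logits = [[2.0, -1.0, -2.0], [-1.5, 3.0, -2.0], [0.5, -0.5, 1.0]]
    # A scene graph "person riding horse" has nodes with classes [0, 1].
    print(ground_nodes([0, 1], logits))  # [0, 1]
    print(mil_loss(logits, {0, 1}))
```

In this toy run, region 0 is grounded to the person node and region 1 to the horse node. Supplying a `relational_sim` matrix that favors region 2 for the person node can override the object score, which mirrors the abstract's point that relational information complements category information. For realistic numbers of nodes and regions, the exponential brute-force search would be replaced by the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`.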
Pages: 4058-4064
Number of pages: 7
Related papers
Showing 10 of 50
  • [1] vtGraphNet: Learning weakly-supervised scene graph for complex visual grounding
    Lyu, Fan
    Feng, Wei
    Wang, Song
    NEUROCOMPUTING, 2020, 413 : 51 - 60
  • [2] Weakly-Supervised Video Scene Co-parsing
    Zhong, Guangyu
    Tsai, Yi-Hsuan
    Yang, Ming-Hsuan
    COMPUTER VISION - ACCV 2016, PT I, 2017, 10111 : 20 - 36
  • [3] Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions
    Zhang, Ruimao
    Lin, Liang
    Wang, Guangrun
    Wang, Meng
    Zuo, Wangmeng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (03) : 596 - 610
  • [4] Weakly-supervised scene parsing with multiple contextual cues
    Li, Teng
    Wu, Xinyu
    Ni, Bingbing
    Lu, Ke
    Yan, Shuicheng
    INFORMATION SCIENCES, 2015, 323 : 59 - 72
  • [5] Semantic Graph Construction for Weakly-Supervised Image Parsing
    Xie, Wenxuan
    Peng, Yuxin
    Xiao, Jianguo
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014 : 2853 - 2859
  • [6] Weakly Supervised Graph Propagation Towards Collective Image Parsing
    Liu, Si
    Yan, Shuicheng
    Zhang, Tianzhu
    Xu, Changsheng
    Liu, Jing
    Lu, Hanqing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (02) : 361 - 373
  • [7] Weakly Supervised Image Parsing by Discriminatively Semantic Graph Propagation
    Xu, Xiaocheng
    Ma, Jun
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016
  • [8] Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
    Wang, Liwei
    Huang, Jing
    Li, Yin
    Xu, Kun
    Yang, Zhengyuan
    Yu, Dong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 14085 - 14095
  • [9] Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation
    Li, Xingchen
    Chen, Long
    Ma, Wenbo
    Yang, Yi
    Xiao, Jun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 4204 - 4213
  • [10] A discriminative graph inferring framework towards weakly supervised image parsing
    Yu, Lei
    Bao, Bing-Kun
    Xu, Changsheng
    MULTIMEDIA SYSTEMS, 2017, 23 (01) : 5 - 18