Improving Weakly Supervised Scene Graph Parsing through Object Grounding

Cited by: 0
Authors
Zhang, Yizhou [1]
Zheng, Zhaoheng [1]
Nevatia, Ram [1]
Liu, Yan [1]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
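Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract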
Weakly supervised scene graph parsing, which learns structured image representations without annotated correspondences between graph nodes and visual objects, has been prevalent in recent computer vision research. Existing methods mainly focus on designing task-specific loss functions, model architectures, or optimization algorithms. We argue that correspondences between objects and graph nodes are crucial for the weakly supervised scene graph parsing task and are worth learning explicitly. We therefore propose GroParser, a framework that improves weakly supervised scene graph parsing models by grounding visual objects. The proposed weakly supervised grounding method learns a metric between visual objects and scene graph nodes by incorporating information from both object features and relational features. Specifically, we apply multi-instance learning to learn object category information and use a two-stream graph neural network to model the relational similarity metric. Extensive experiments on the scene graph parsing task verify that the grounding found by our model can improve the performance of existing weakly supervised scene graph parsing methods, including the current state of the art. Further experiments on the Visual Genome (VG) and Visual Relation Detection (VRD) datasets verify that our model improves on existing approaches for the scene graph grounding task.
Pages: 4058 - 4064
Page count: 7
Related papers
50 in total
  • [21] IMPROVING CLASS ACTIVATION MAP FOR WEAKLY SUPERVISED OBJECT LOCALIZATION
    Zhang, Zhenfei
    Chang, Ming-Ching
    Bui, Tien D.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2624 - 2628
  • [22] Weakly supervised image parsing via label propagation over discriminatively semantic graph
    Xu, Xiaocheng
    Ma, Jun
    Nie, Liqiang
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 40 : 808 - 815
  • [23] Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization
    Dvornik, Nikita
    Hadji, Isma
    Pham, Hai
    Bhatt, Dhaivat
    Martinez, Brais
    Fazly, Afsaneh
    Jepson, Allan D.
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 319 - 335
  • [24] Weakly-Supervised Video Object Grounding by Exploring Spatio-Temporal Contexts
    Yang, Xun
    Liu, Xueliang
    Jian, Meng
    Gao, Xinjian
    Wang, Meng
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1939 - 1947
  • [25] Improving weakly supervised phrase grounding via visual representation contextualization with contrastive learning
    Xue Wang
    Youtian Du
    Suzan Verberne
    Fons J. Verbeek
    Applied Intelligence, 2023, 53 : 14690 - 14702
  • [26] Improving weakly supervised phrase grounding via visual representation contextualization with contrastive learning
    Wang, Xue
    Du, Youtian
    Verberne, Suzan
    Verbeek, Fons J.
    APPLIED INTELLIGENCE, 2023, 53 (11) : 14690 - 14702
  • [27] Keypoint based weakly supervised human parsing
    Wu, Zhonghua
    Lin, Guosheng
    Cai, Jianfei
    IMAGE AND VISION COMPUTING, 2019, 91
  • [28] Weakly Supervised Semantic Parsing with Abstract Examples
    Goldman, Omer
    Latcinnik, Veronica
    Naveh, Udi
    Globerson, Amir
    Berant, Jonathan
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1809 - 1819
  • [29] Iterative Search for Weakly Supervised Semantic Parsing
    Dasigi, Pradeep
    Gardner, Matt
    Murty, Shikhar
    Zettlemoyer, Luke
    Hovy, Eduard
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 2669 - 2680
  • [30] Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6329 - 6340