Weakly-supervised learning of visual relations

Cited by: 106
Authors
Peyre, Julia [1, 2]
Laptev, Ivan [1, 2]
Schmid, Cordelia [2, 4]
Sivic, Josef [1, 2, 3]
Affiliations
[1] PSL Res Univ, ENS, CNRS, Dept Informat, F-75005 Paris, France
[2] INRIA, Paris, France
[3] Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague, Czech Republic
[4] Univ Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, F-38000 Grenoble, France
Funding
European Research Council
DOI
10.1109/ICCV.2017.554
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel approach for modeling visual relations between pairs of objects. We define a relation as a triplet of the form (subject, predicate, object), where the predicate is typically a preposition (e.g. 'under', 'in front of') or a verb (e.g. 'hold', 'ride') that links a pair of objects (subject, object). Learning such relations is challenging because the objects have different spatial configurations and appearances depending on the relation in which they occur. Another major challenge is the difficulty of obtaining annotations, especially at box level, for all possible triplets, which makes both learning and evaluation difficult. The contributions of this paper are threefold. First, we design strong yet flexible visual features that encode the appearance and spatial configuration of pairs of objects. Second, we propose a weakly-supervised discriminative clustering model to learn relations from image-level labels only. Third, we introduce a new, challenging dataset of unusual relations (UnRel), together with an exhaustive annotation, that enables accurate evaluation of visual relation retrieval. We show experimentally that our model achieves state-of-the-art results on the Visual Relationship Dataset [32], significantly improving performance on previously unseen relations (zero-shot learning), and we confirm this observation on our newly introduced UnRel dataset.
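To make the triplet definition in the abstract concrete, the following is a minimal illustrative sketch of how a (subject, predicate, object) relation and a simple spatial descriptor for a pair of boxes might be represented. It is not the feature design of the paper, which the abstract does not specify. The `Relation` class, the `(x1, y1, x2, y2)` box convention, and the four quantities returned by `spatial_features` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Relation:
    """A relation triplet as described in the abstract."""
    subject: str
    predicate: str
    object: str


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_features(subj: Box, obj: Box) -> List[float]:
    """Illustrative spatial descriptor for a (subject, object) box pair.

    Returns [dx, dy, area_ratio, iou], where dx and dy are the offsets
    between box centers, normalized by the subject box scale.
    These quantities are an assumption, not the paper's features.
    """
    subj_area = box_area(subj)
    if subj_area <= 0:
        raise ValueError("subject box must have positive area")
    scale = subj_area ** 0.5
    dx = ((obj[0] + obj[2]) - (subj[0] + subj[2])) / 2.0 / scale
    dy = ((obj[1] + obj[3]) - (subj[1] + subj[3])) / 2.0 / scale
    return [dx, dy, box_area(obj) / subj_area, iou(subj, obj)]


relation = Relation(subject="person", predicate="ride", object="horse")
features = spatial_features((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A descriptor of this kind depends only on the relative geometry of the two boxes, so it could in principle be shared across object categories. Whether and how the paper combines such cues with appearance features is not stated in the abstract.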
Pages: 5189-5198
Page count: 10