Repeat and learn: Self-supervised visual representations learning by Scene Localization

Cited by: 1
Authors
Altabrawee, Hussein [1 ,2 ]
Noor, Mohd Halim Mohd [1 ]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Main Campus, Gelugor 11800, Penang, Malaysia
[2] Al Muthanna Univ, Comp Ctr, Main Campus, Samawah 66001, Al Muthanna, Iraq
Keywords
Visual representations learning; Action recognition; Self-supervised learning;
DOI
10.1016/j.patcog.2024.110804
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large labeled datasets are crucial for progress in video understanding. However, the labeling process is time-consuming, expensive, and tiresome. To overcome this impediment, various pretext tasks exploit the temporal coherence in videos to learn visual representations in a self-supervised manner. However, these pretexts (order verification and sequence sorting) struggle when encountering cyclic actions due to the label ambiguity problem. To overcome these limitations, we present a novel temporal pretext task for self-supervised learning of visual representations from unlabeled videos. Repeated Scene Localization (RSL) is a multi-class classification pretext that changes the temporal order of the frames in a video by repeating a scene. The network is then trained to identify the modified videos, localize the repeated scene within them, and recognize the unmodified original videos that contain no repeated scenes. We evaluated the proposed pretext on two benchmark datasets, UCF-101 and HMDB-51. The experimental results show that the proposed pretext achieves state-of-the-art results in action recognition and video retrieval tasks. In action recognition, our S3D model achieves 88.15% and 56.86% on UCF-101 and HMDB-51, respectively, outperforming the current state of the art by 1.05% and 3.26%. Our R(2+1)D-Adjacent model achieves 83.52% and 54.50% on UCF-101 and HMDB-51, respectively, outperforming the single pretext tasks by 8.7% and 13.9%. In video retrieval, our R(2+1)D-Offset model outperforms the single pretext tasks by 4.68% and 1.1% in Top-1 accuracy on UCF-101 and HMDB-51, respectively. The source code and the trained models are publicly available at https://github.com/Hussein-A-Hassan/RSL-Pretext.
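As a rough illustration of the pretext described in the abstract, the Python sketch below builds one RSL training sample by duplicating a randomly chosen segment of a clip and emitting a class label that encodes where the repetition occurs (class 0 for an unmodified clip). The function name, the segment count, the fixed-length trimming, and the label layout are illustrative assumptions, not the authors' exact pipeline; the official implementation is in the linked repository.

```python
import numpy as np

def make_rsl_sample(frames, num_segments=4, repeat=True, rng=None):
    """Build one Repeated Scene Localization (RSL) training sample (hypothetical sketch).

    frames: np.ndarray of shape (T, H, W, C), a sampled video clip.
    Returns (clip, label): label 0 means "unmodified", label k in 1..num_segments
    means "segment k was repeated".
    """
    rng = rng or np.random.default_rng()
    seg_len = frames.shape[0] // num_segments
    usable = seg_len * num_segments          # drop leftover frames so all clips match in length
    segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]

    if not repeat:
        # Original, temporally coherent clip: class 0.
        return np.concatenate(segments, axis=0), 0

    # Pick one segment, insert a copy of it right after itself,
    # then trim back to the usable length so modified and unmodified clips match.
    k = int(rng.integers(num_segments))
    modified = segments[:k + 1] + [segments[k]] + segments[k + 1:]
    clip = np.concatenate(modified, axis=0)[:usable]
    return clip, k + 1
```

A video backbone such as S3D or R(2+1)D could then be trained with a (num_segments + 1)-way classification head on (clip, label) pairs produced this way, which matches the multi-class formulation described in the abstract.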
Pages: 10