Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning

Cited by: 2
Authors
Zhu, Minghao [1]
Lin, Xiao [1]
Dang, Ronghao [1]
Liu, Chengju [1]
Chen, Qijun [1]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised Learning; Action Recognition
DOI
10.1145/3581783.3611932
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As the most essential property of a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive learning, considering the trade-off between quality and cost. However, existing works align motion features at the instance level, which leads to weak spatial and temporal alignment across modalities. In this paper, we present a Fine-grained Motion Alignment (FIMA) framework, capable of introducing well-aligned and significant motion information. Specifically, we first develop a dense contrastive learning framework in the spatiotemporal domain to generate pixel-level motion supervision. Then, we design a motion decoder and a foreground sampling strategy to eliminate the weak alignments in terms of time and space. Moreover, a frame-level motion contrastive loss is presented to improve the temporal diversity of the motion features. Extensive experiments demonstrate that the representations learned by FIMA possess great motion-awareness capabilities and achieve state-of-the-art or competitive results on downstream tasks across the UCF101, HMDB51, and Diving48 datasets. Code is available at https://github.com/ZMHH-H/FIMA.
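The abstract names two generic building blocks: frame difference as a cheap motion signal, and a contrastive loss that pulls matching features together. The sketch below illustrates both in plain Python, assuming tiny grayscale frames stored as nested lists and feature vectors stored as flat lists. The function names `frame_difference` and `info_nce` and the temperature of 0.1 are illustrative assumptions, not taken from the FIMA code. The paper's dense pixel-level alignment, motion decoder, and foreground sampling are not reproduced here.

```python
import math


def frame_difference(frames):
    """Subtract each frame from the next one.

    `frames` is a list of T frames, each a list of rows of pixel
    intensities. The result holds T - 1 difference frames.
    """
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one query, one positive, several negatives.

    This is the standard contrastive objective, used here as a stand-in;
    it is not claimed to match the exact losses defined in the paper.
    """
    q = _normalize(query)
    keys = [_normalize(positive)] + [_normalize(n) for n in negatives]
    # Cosine similarity between the query and every key, scaled by temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    # Negative log-probability assigned to the positive key (index 0).
    return log_sum_exp - logits[0]


# Hypothetical toy clip: three 2x2 frames.
clip = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 2]],
    [[1, 1], [0, 2]],
]
motion = frame_difference(clip)

# The loss is small when the query matches its positive and large otherwise.
aligned_loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned_loss = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a practical pipeline the same operations would run on tensors in a deep learning library, with the query taken from an RGB feature and the positive from the corresponding frame-difference feature.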
Pages: 4725-4736
Number of pages: 12
Related papers
50 in total
  • [1] GRAPH FINE-GRAINED CONTRASTIVE REPRESENTATION LEARNING
    Tang, Hui
    Liang, Xun
    Guo, Yuhui
    Zheng, Xiangping
    Wu, Bo
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3478 - 3482
  • [2] Modeling Video as Stochastic Processes for Fine-Grained Video Representation Learning
    Zhang, Heng
    Liu, Daqing
    Zheng, Qi
    Su, Bing
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2225 - 2234
  • [3] Spatiotemporal Contrastive Video Representation Learning
    Qian, Rui
    Meng, Tianjian
    Gong, Boqing
    Yang, Ming-Hsuan
    Wang, Huisheng
    Belongie, Serge
    Cui, Yin
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6960 - 6970
  • [4] MODERNN: TOWARDS FINE-GRAINED MOTION DETAILS FOR SPATIOTEMPORAL PREDICTIVE LEARNING
    Chai, Zenghao
    Xu, Zhengzhuo
    Yuan, Chun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4658 - 4662
  • [5] Fine-grained Angular Contrastive Learning with Coarse Labels
    Bukchin, Guy
    Schwartz, Eli
    Saenko, Kate
    Shahar, Ori
    Feris, Rogerio
    Giryes, Raja
    Karlinsky, Leonid
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8726 - 8736
  • [6] Fine-Grained Contrastive Learning for Pulmonary Nodule Classification
    Zheng, Yubin
    Tang, Peng
    Ju, Tianjie
    Qiu, Weidong
    Yan, Bo
    Proceedings of the International Joint Conference on Neural Networks, 2024,
  • [7] Fine-Grained Semantics Enhanced Contrastive Learning for Graphs
    Liu, Youming
    Shu, Lin
    Chen, Chuan
    Zheng, Zibin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (12) : 8238 - 8250
  • [8] Fine-grained and coarse-grained contrastive learning for text classification
    Zhang, Shaokang
    Ran, Ning
    NEUROCOMPUTING, 2024, 596
  • [9] Fine-grained representation learning in convolutional autoencoders
    Luo, Chang
    Wang, Jie
    JOURNAL OF ELECTRONIC IMAGING, 2016, 25 (02)
  • [10] Representation Learning for Fine-Grained Change Detection
    O'Mahony, Niall
    Campbell, Sean
    Krpalkova, Lenka
    Carvalho, Anderson
    Walsh, Joseph
    Riordan, Daniel
    SENSORS, 2021, 21 (13)