Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning

Cited by: 2
Authors
Zhu, Minghao [1]
Lin, Xiao [1]
Dang, Ronghao [1]
Liu, Chengju [1]
Chen, Qijun [1]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised Learning; Action Recognition
DOI
10.1145/3581783.3611932
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the most essential property in a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive learning, considering the trade-off between quality and cost. However, existing works align motion features at the instance level, which suffers from spatial and temporal weak alignment across modalities. In this paper, we present a Fine-grained Motion Alignment (FIMA) framework, capable of introducing well-aligned and significant motion information. Specifically, we first develop a dense contrastive learning framework in the spatiotemporal domain to generate pixel-level motion supervision. Then, we design a motion decoder and a foreground sampling strategy to eliminate the weak alignments in terms of time and space. Moreover, a frame-level motion contrastive loss is presented to improve the temporal diversity of the motion features. Extensive experiments demonstrate that the representations learned by FIMA possess great motion-awareness capabilities and achieve state-of-the-art or competitive results on downstream tasks across UCF101, HMDB51, and Diving48 datasets. Code is available at https://github.com/ZMHH-H/FIMA.
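The two generic building blocks named in the abstract, frame difference as a cheap motion signal and a contrastive objective over paired features, can be illustrated with a minimal sketch. This is not the FIMA implementation: the function names `frame_difference` and `info_nce`, the flat-list frame layout, and the temperature value are all assumptions made for illustration, and the loss shown is a plain InfoNCE rather than the paper's dense or frame-level losses.

```python
import math


def frame_difference(frames):
    """Subtract each frame from its successor.

    frames: list of T frames, each a flat list of pixel intensities
    (hypothetical layout). Returns T-1 difference frames.
    """
    return [
        [nxt_px - prev_px for prev_px, nxt_px in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.1):
    """Generic InfoNCE loss: queries[i] is the positive for keys[i],
    and every other key acts as a negative."""
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)
```

In this sketch a static region produces zero difference, and the loss is low when each query matches its own key and high when the pairing is shuffled, which is the behavior a contrastive alignment objective relies on.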
Pages: 4725-4736
Number of pages: 12