Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning

Cited by: 2

Authors:

Zhu, Minghao [1]

Lin, Xiao [1]

Dang, Ronghao [1]

Liu, Chengju [1]

Chen, Qijun [1]

Affiliations:

[1] Tongji University, Shanghai, People's Republic of China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Self-supervised Learning; Action Recognition;

DOI:

10.1145/3581783.3611932

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most essential property in a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive learning, considering the trade-off between quality and cost. However, existing works align motion features at the instance level, which suffers from spatial and temporal weak alignment across modalities. In this paper, we present a Fine-grained Motion Alignment (FIMA) framework, capable of introducing well-aligned and significant motion information. Specifically, we first develop a dense contrastive learning framework in the spatiotemporal domain to generate pixel-level motion supervision. Then, we design a motion decoder and a foreground sampling strategy to eliminate the weak alignments in terms of time and space. Moreover, a frame-level motion contrastive loss is presented to improve the temporal diversity of the motion features. Extensive experiments demonstrate that the representations learned by FIMA possess great motion-awareness capabilities and achieve state-of-the-art or competitive results on downstream tasks across UCF101, HMDB51, and Diving48 datasets. Code is available at https://github.com/ZMHH-H/FIMA.
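The abstract names frame difference as the low-cost source of motion information. The following is a minimal sketch of that single idea, not the FIMA implementation: the function name `frame_difference`, the `(T, H, W, C)` clip layout, and the random input are assumptions made for illustration only.

```python
import numpy as np


def frame_difference(clip: np.ndarray) -> np.ndarray:
    """Return the difference between consecutive frames of a clip.

    Assumes `clip` has shape (T, H, W, C); the result has shape
    (T - 1, H, W, C). This layout is an assumption for illustration.
    """
    # Cast to float so the subtraction cannot wrap around on uint8 input.
    clip = clip.astype(np.float32)
    # Frame t+1 minus frame t, for every pair of adjacent frames.
    return clip[1:] - clip[:-1]


# Hypothetical 8-frame RGB clip of size 16x16, filled with random values.
rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16, 3))
diff = frame_difference(clip)
print(diff.shape)
```

Static regions produce values near zero in `diff`, while moving regions produce large magnitudes, which is why frame difference can serve as a cheap proxy for motion compared with optical flow.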


Pages: 4725-4736

Page count: 12

50 items in total

[41] Partial-Label Contrastive Representation Learning for Fine-Grained Biomarkers Prediction From Histopathology Whole Slide Images
Zheng, Yushan
Wu, Kun
Li, Jun
Tang, Kunming
Shi, Jun
Wu, Haibo
Jiang, Zhiguo
Wang, Wei
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2025, 29 (01) : 396 - 408
[42] Fine-grained Audible Video Description
Shen, Xuyang
Li, Dong
Zhou, Jinxing
Qin, Zhen
He, Bowen
Han, Xiaodong
Li, Aixuan
Dai, Yuchao
Kong, Lingpeng
Wang, Meng
Qiao, Yu
Zhong, Yiran
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10585 - 10596
[43] Fine-Grained Scalable Video Caching
Gong, Qiushi
Woods, John W.
Kar, Koushik
Chakareski, Jacob
2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2015, : 101 - 106
[44] JOINT LEARNING ON THE HIERARCHY REPRESENTATION FOR FINE-GRAINED HUMAN ACTION RECOGNITION
Leong, Mei Chee
Tan, Hui Li
Zhang, Haosong
Li, Liyuan
Lin, Feng
Lim, Joo Hwee
2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1059 - 1063
[45] DeepFirearm: Learning Discriminative Feature Representation for Fine-grained Firearm Retrieval
Hao, Jiedong
Dong, Jing
Wang, Wei
Tan, Tieniu
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3335 - 3340
[46] Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding
Chen, Tianshui
Wu, Wenxi
Gao, Yuefang
Dong, Le
Luo, Xiaonan
Lin, Liang
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 2023 - 2031
[47] LEARNING DEEP AND SPARSE FEATURE REPRESENTATION FOR FINE-GRAINED OBJECT RECOGNITION
Srinivas, M.
Lin, Yen-Yu
Liao, Hong-Yuan Mark
2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1458 - 1463
[48] Fine-Grained Early Frequency Attention for Deep Speaker Representation Learning
Hajavi, A.
Etemad, A.
IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2023, 4 (06) : 1413 - 1425
[49] Attribute-Aware Attention Model for Fine-grained Representation Learning
Han, Kai
Guo, Jianyuan
Zhang, Chao
Zhu, Mingjian
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 2040 - 2048
[50] Fine-grained cybersecurity entity typing based on multimodal representation learning
Wang, Baolei
Zhang, Xuan
Wang, Jishu
Gao, Chen
Duan, Qing
Li, Linyu
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (10) : 30207 - 30232
