AMU-Net: an accurate motion perception and aggregation model for action recognition

Cited by: 0
Authors
Zhang, Haitao [1 ]
Xia, Ying [1 ]
Feng, Jiangfan [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
video action recognition; two-dimensional convolutional networks; temporal difference; motion noise; action graph; spatial sparsity of motion objects;
DOI
10.1117/1.JEI.33.2.023053
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Motion information plays a vital role in video action recognition, serving as a fundamental basis for the accurate interpretation of dynamic sequences. However, extracting accurate motion details remains a significant challenge for two-dimensional (2D) CNNs. To address this issue, we present an action recognition framework, named accurate motion understanding network (AMU-Net), designed to effectively perceive and aggregate valuable motion cues. Specifically, AMU-Net is a 2D CNN equipped with the proposed accurate motion perceptron (AMP) and action graph module (AGM). To capture finer local motion details, the AMP is introduced to handle motion noise in temporal differences. This module extracts critical local motion patterns from bidirectional temporal differences and enhances action-related features. Furthermore, to learn more precise global motion representations, the AGM addresses the spatial sparsity of motion objects by detecting them and selectively aggregating their features within a graph reasoning framework. Extensive experiments are conducted on three public benchmarks: ActivityNet-200, UCF-101, and Kinetics-400. The results demonstrate that the proposed AMU-Net (based on ResNet-50) outperforms recent 2D CNN-based methods with comparable computational overhead. In addition, the results show that the two modules transfer effectively to three popular lightweight convolutional architectures, underscoring their versatility. (c) 2024 SPIE and IS&T
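The abstract builds on bidirectional temporal differences between adjacent frame features as the raw local motion signal that the AMP is described as denoising and refining. As a point of reference only, the following is a minimal PyTorch sketch of that raw signal; it is not the authors' AMU-Net implementation, and the function name bidirectional_temporal_diff and its zero-padding scheme are assumptions made for illustration.

    # Illustrative sketch only (not the authors' AMU-Net code): bidirectional
    # temporal differences over a clip of 2D-CNN feature maps. All names and
    # the boundary-padding choice are hypothetical.
    import torch


    def bidirectional_temporal_diff(features: torch.Tensor):
        """features: (N, T, C, H, W) per-frame feature maps from a 2D backbone.

        Returns forward (x_{t+1} - x_t) and backward (x_t - x_{t+1}) differences,
        zero-padded at the clip boundaries so the temporal length T is preserved.
        """
        forward = features[:, 1:] - features[:, :-1]
        backward = features[:, :-1] - features[:, 1:]
        pad = torch.zeros_like(features[:, :1])
        forward = torch.cat([forward, pad], dim=1)    # (N, T, C, H, W)
        backward = torch.cat([pad, backward], dim=1)  # (N, T, C, H, W)
        return forward, backward


    if __name__ == "__main__":
        clip = torch.randn(2, 8, 64, 14, 14)  # toy clip: batch of 2, 8 frames
        fwd, bwd = bidirectional_temporal_diff(clip)
        print(fwd.shape, bwd.shape)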
Pages: 19
Related papers
28 in total
  • [1] MIAM: Motion information aggregation module for action recognition
    Cheng, Qin
    Ren, Ziliang
    Liu, Zhen
    Cheng, Jun
    Zhang, Qieshi
    Liu, Jianming
    ELECTRONICS LETTERS, 2022, 58 (10) : 396 - 398
  • [2] A fast and accurate motion descriptor for human action recognition applications
    Ghorbel, Enjie
    Boutteau, Remi
    Bonnaert, Jacques
    Savatier, Xavier
    Lecoeuche, Stephane
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 919 - 924
  • [3] M2A: Motion Aware Attention for Accurate Video Action Recognition
    Gebotys, Brennan
    Wong, Alexander
    Clausi, David A.
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 83 - 89
  • [4] Feature Aggregation Tree: Capture Temporal Motion Information for Action Recognition in Videos
    Zhu, Bing
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 316 - 327
  • [5] Human action recognition method based on Motion Excitation and Temporal Aggregation module
    Ye, Qing
    Tan, Zexian
    Zhang, Yongmei
    HELIYON, 2022, 8 (11)
  • [6] Impaired biological motion perception and action recognition in children with autism spectrum disorder
    Wang, Liang Hui
    Chen, Tzu-Yun
    Chen, Hsin-Shui
    Chien, Sarina Hui-Lin
    I-PERCEPTION, 2014, 5 (04): : 359 - 359
  • [7] Stme-net: spatio-temporal motion excitation network for action recognition
    Zhao, Qian
    Su, Yanxiong
    Zhang, Hui
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2025, 22 (02)
  • [8] Video Action Recognition Using Motion and Multi-View Excitation with Temporal Aggregation
    Joefrie, Yuri Yudhaswana
    Aono, Masaki
    ENTROPY, 2022, 24 (11)
  • [9] Learning action recognition and implied motion-A neural model
    Layher, G.
    Neumann, H.
    PERCEPTION, 2011, 40 : 176 - 177
  • [10] LM-Net: a dynamic gesture recognition network with long-term aggregation and motion excitation
    Chang, Shaopeng
    Huang, Xueyu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (04) : 1633 - 1645