AMU-Net: an accurate motion perception and aggregation model for action recognition

Cited: 0
Authors
Zhang, Haitao [1]
Xia, Ying [1]
Feng, Jiangfan [1]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
video action recognition; two-dimensional convolutional networks; temporal difference; motion noise; action graph; spatial sparsity of motion objects;
DOI
10.1117/1.JEI.33.2.023053
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology & Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Motion information plays a vital role in video action recognition, serving as the foundation for the accurate interpretation of dynamic sequences. However, extracting accurate motion details remains a significant challenge for two-dimensional (2D) CNNs. To address this issue, we present an action recognition framework, the accurate motion understanding network (AMU-Net), designed to effectively perceive and aggregate valuable motion cues. Specifically, AMU-Net is a 2D CNN equipped with the proposed accurate motion perceptron (AMP) and action graph module (AGM). To capture finer local motion details, the AMP handles motion noise in temporal differences, extracting critical local motion patterns from bidirectional temporal differences and enhancing action-related features. To learn more precise global motion representations, the AGM addresses the spatial sparsity of motion objects by detecting them and selectively aggregating their features through graph reasoning. Extensive experiments on three public benchmarks (ActivityNet-200, UCF-101, and Kinetics-400) demonstrate that AMU-Net (based on ResNet-50) outperforms recent 2D CNN-based methods with comparable computational overhead. The results also show that both modules transfer effectively to three popular lightweight convolutional architectures, underscoring their versatility. (c) 2024 SPIE and IS&T
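
The record contains no implementation details, so the following minimal PyTorch sketch only illustrates the two ideas named in the abstract: bidirectional temporal differencing with a soft threshold standing in for the AMP's motion-noise handling, and similarity-weighted aggregation over the top-k most active spatial positions standing in for the AGM's sparse graph reasoning. All function names, the threshold value, and the top-k selection heuristic are illustrative assumptions, not the authors' actual design.

    import torch
    import torch.nn.functional as F

    def bidirectional_temporal_diff(x, thresh=0.1):
        # x: (N, T, C, H, W) per-frame features.
        # Forward and backward frame differences; a soft threshold
        # suppresses small-magnitude responses as a hedged stand-in for
        # "motion noise" handling (the paper's mechanism is not given here).
        fwd = x[:, 1:] - x[:, :-1]
        bwd = x[:, :-1] - x[:, 1:]
        fwd = torch.sign(fwd) * F.relu(fwd.abs() - thresh)
        bwd = torch.sign(bwd) * F.relu(bwd.abs() - thresh)
        return fwd, bwd

    def sparse_graph_aggregate(feat, k=16):
        # feat: (N, C, H, W). Select the k most active spatial positions
        # (a stand-in for "detecting motion objects") and aggregate their
        # features with a similarity-weighted adjacency (graph reasoning).
        n, c, h, w = feat.shape
        nodes = feat.flatten(2).transpose(1, 2)              # (N, H*W, C)
        idx = nodes.norm(dim=-1).topk(k, dim=1).indices      # (N, k)
        sel = torch.gather(nodes, 1, idx.unsqueeze(-1).expand(-1, -1, c))
        adj = torch.softmax(sel @ sel.transpose(1, 2) / c ** 0.5, dim=-1)
        return adj @ sel                                     # (N, k, C)

For instance, on an input of shape (2, 8, 64, 56, 56), bidirectional_temporal_diff returns two tensors of shape (2, 7, 64, 56, 56), one per temporal direction.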
Pages: 19