Joint spatial-temporal attention for action recognition

Cited by: 25
Authors
Yu, Tingzhao [1,2]
Guo, Chaoxu [1,2]
Wang, Lingfeng [1]
Gu, Huxiang [1]
Xiang, Shiming [1]
Pan, Chunhong [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Spatial-temporal attention; Two-stage; Representation
DOI
10.1016/j.patrec.2018.07.034
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel high-level action representation using a joint spatial-temporal attention model, with application to video-based human action recognition. Specifically, to extract robust motion representations of videos, a new spatial attention module based on 3D convolution is proposed, which attends to the salient parts of the spatial areas. To better handle long-duration videos, a new bidirectional LSTM based temporal attention module is introduced, which aims to focus on the key video cubes instead of the key video frames of a given video. The spatial-temporal attention network can be jointly trained via a two-stage strategy, which enables us to simultaneously explore the correlations in both the spatial and temporal domains. Experimental results on benchmark action recognition datasets demonstrate the effectiveness of our network. © 2018 Elsevier B.V. All rights reserved.
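The abstract describes a temporal attention module that weights video cubes rather than single frames. As a rough illustration only, and not the authors' implementation, the sketch below shows generic softmax attention pooling over per-cube feature vectors. The function names, the toy features, and the relevance scores are all hypothetical; in the paper the scores would come from a bidirectional LSTM, which is not modeled here.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cube_features, scores):
    """Weighted sum of per-cube feature vectors (hypothetical helper).

    cube_features: one feature vector (list of floats) per video cube.
    scores: one raw relevance score per cube (assumed given here).
    """
    if len(cube_features) != len(scores):
        raise ValueError("need exactly one score per cube")
    weights = softmax(scores)
    dim = len(cube_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, cube_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three cubes with 2-D features and equal scores,
# so every cube receives the same attention weight.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(features, [0.0, 0.0, 0.0])
```

With equal scores the pooling reduces to a plain average of the cube features; raising one cube's score shifts the pooled representation toward that cube, which is the sense in which attention "focuses" on key cubes.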
Pages: 226-233
Page count: 8
Related papers
50 in total
  • [41] Spatial-Temporal Action Localization With Hierarchical Self-Attention
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 625 - 639
  • [42] Integrating Temporal and Spatial Attention for Video Action Recognition
    Zhou, Yuanding
    Li, Baopu
    Wang, Zhihui
    Li, Haojie
    SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [43] Spatial-Temporal Self-Attention Enhanced Graph Convolutional Networks for Fitness Yoga Action Recognition
    Wei, Guixiang
    Zhou, Huijian
    Zhang, Liping
    Wang, Jianji
    SENSORS, 2023, 23 (10)
  • [44] Beyond coordinate attention: spatial-temporal recalibration and channel scaling for skeleton-based action recognition
    Tang, Jun
    Gong, Sihang
    Wang, Yanjiang
    Liu, Baodi
    Du, Chunyu
    Gu, Boyang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (01) : 199 - 206
  • [46] Improved SSD using deep multi-scale attention spatial-temporal features for action recognition
    Zhou, Shuren
    Qiu, Jia
    Solanki, Arun
    MULTIMEDIA SYSTEMS, 2022, 28 (06) : 2123 - 2131
  • [47] Hierarchy Spatial-Temporal Transformer for Action Recognition in Short Videos
    Cai, Guoyong
    Cai, Yumeng
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 760 - 774
  • [48] Action Recognition Based on Spatial-Temporal Pyramid Sparse Coding
    Zhang, Xiaojing
    Zhang, Hua
    Cao, Xiaochun
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012 : 1455 - 1458
  • [49] Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition
    Cao, Wenming
    Zhang, Aoyu
    He, Zhihai
    Zhang, Yicha
    Yin, Xinpeng
    IEEE Transactions on Artificial Intelligence, 2024, 5 (11) : 5801 - 5814
  • [50] Multi-Branch Spatial-Temporal Network for Action Recognition
    Wang, Yingying
    Li, Wei
    Tao, Ran
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (10) : 1556 - 1560