Spatial-temporal interaction learning based two-stream network for action recognition

Cited by: 38
Authors
Liu, Tianyu [1]
Ma, Yujun [2]
Yang, Wenhan [1]
Ji, Wanting [3]
Wang, Ruili [2]
Jiang, Ping [1]
Affiliations
[1] Hunan Agr Univ, Coll Mech & Elect Engn, Changsha, Peoples R China
[2] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
[3] Liaoning Univ, Sch Informat, Shenyang, Peoples R China
Keywords
Action recognition; Spatial-temporal; Two-stream CNNs
DOI
10.1016/j.ins.2022.05.092
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Two-stream convolutional neural networks have been widely applied to action recognition. However, the two streams are usually adopted to capture spatial information and temporal information separately, which ignores the strong complementarity and correlation between spatial and temporal information in videos. To solve this problem, we propose a Spatial-Temporal Interaction Learning Two-stream network (STILT) for action recognition. Our proposed two-stream network (i.e., a spatial stream and a temporal stream) has a spatial-temporal interaction learning module, which uses an alternating co-attention mechanism between the two streams to learn the correlation between spatial features and temporal features. The spatial-temporal interaction learning module allows the two streams to guide each other and then generates optimized spatial attention features and temporal attention features. Thus, the proposed network can establish an interactive connection between the two streams, which efficiently exploits the attended spatial and temporal features to improve recognition accuracy. Experiments on three widely used datasets (i.e., UCF101, HMDB51 and Kinetics) show that the proposed network outperforms state-of-the-art models in action recognition. (c) 2022 Elsevier Inc. All rights reserved.
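To make the alternating co-attention idea in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: the class name CoAttentionBlock, the use of nn.MultiheadAttention, and all dimensions are illustrative assumptions. The key point it demonstrates is the alternation: one stream's features act as queries while the other stream's features act as keys and values, and then the roles are swapped, so each stream guides the attention over the other.

import torch
import torch.nn as nn


class CoAttentionBlock(nn.Module):
    """One alternating co-attention step between two feature streams (hypothetical sketch)."""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        # Cross-attention layers: queries come from one stream,
        # keys/values from the other.
        self.spatial_from_temporal = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.temporal_from_spatial = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor):
        # spatial:  (batch, n_regions, dim) appearance features (RGB stream)
        # temporal: (batch, n_frames,  dim) motion features (optical-flow stream)
        # Step 1: the temporal stream guides attention over spatial features.
        s_att, _ = self.spatial_from_temporal(query=spatial, key=temporal, value=temporal)
        spatial = self.norm_s(spatial + s_att)  # residual connection + layer norm
        # Step 2: the refined spatial stream guides attention over temporal features.
        t_att, _ = self.temporal_from_spatial(query=temporal, key=spatial, value=spatial)
        temporal = self.norm_t(temporal + t_att)
        return spatial, temporal


if __name__ == "__main__":
    block = CoAttentionBlock(dim=256)
    s = torch.randn(2, 49, 256)   # e.g. a 7x7 spatial grid from a CNN backbone
    t = torch.randn(2, 16, 256)   # e.g. 16 sampled optical-flow feature frames
    s_out, t_out = block(s, t)
    print(s_out.shape, t_out.shape)  # torch.Size([2, 49, 256]) torch.Size([2, 16, 256])

The two attention-guided outputs correspond to what the abstract calls the optimized spatial and temporal attention features; in a full network they would be pooled and fused for classification.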
Pages: 864-876
Number of pages: 13
Related Papers
50 records in total
  • [41] Spiking two-stream methods with unsupervised STDP-based learning for action recognition
    El-Assal, Mireille
    Tirilly, Pierre
    Bilasco, Ioan Marius
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2025, 134
  • [42] Two-Stream Temporal Feature Aggregation Based on Clustering for Few-Shot Action Recognition
    Deng, Long
    Li, Ao
    Zhou, Bingxin
    Ge, Yongxin
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2435 - 2439
  • [43] Transferable two-stream convolutional neural network for human action recognition
    Xiong, Qianqian
    Zhang, Jianjing
    Wang, Peng
    Liu, Dongdong
    Gao, Robert X.
    JOURNAL OF MANUFACTURING SYSTEMS, 2020, 56 : 605 - 614
  • [45] Spatial-temporal saliency action mask attention network for action recognition
    Jiang, Min
    Pan, Na
    Kong, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71
  • [46] Two-stream temporal enhanced Fisher vector encoding for skeleton-based action recognition
    Tang, Jun
    Liu, Baodi
    Guo, Wenhui
    Wang, Yanjiang
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (03) : 3147 - 3159
  • [47] Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition
    Gao, Xuehao
    Yang, Yang
    Wu, Yang
    Du, Shaoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12130 - 12141
  • [48] Multi-stream adaptive spatial-temporal attention graph convolutional network for skeleton-based action recognition
    Yu, Lubin
    Tian, Lianfang
    Du, Qiliang
    Bhutto, Jameel Ahmed
    IET COMPUTER VISION, 2022, 16 (02) : 143 - 158
  • [49] Efficient Two-stream Action Recognition on FPGA
    Lin, Jia-Ming
    Lai, Kuan-Ting
    Wu, Bin-Ray
    Chen, Ming-Syan
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3070 - 3074
  • [50] 3s-STNet: three-stream spatial-temporal network with appearance and skeleton information learning for action recognition
    Fang, Ming
    Peng, Siyu
    Zhao, Yang
    Yuan, Haibo
    Hung, Chih-Cheng
    Liu, Shuhua
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (02): : 1835 - 1848