Spatial-temporal interaction learning based two-stream network for action recognition

Cited by: 38
Authors
Liu, Tianyu [1]
Ma, Yujun [2]
Yang, Wenhan [1]
Ji, Wanting [3]
Wang, Ruili [2]
Jiang, Ping [1]
Affiliations
[1] Hunan Agr Univ, Coll Mech & Elect Engn, Changsha, Peoples R China
[2] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand
[3] Liaoning Univ, Sch Informat, Shenyang, Peoples R China
Keywords
Action recognition; Spatial-temporal; Two-stream CNNs
DOI
10.1016/j.ins.2022.05.092
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Two-stream convolutional neural networks have been widely applied to action recognition. However, two-stream networks usually capture spatial information and temporal information separately, ignoring the strong complementarity and correlation between the two in videos. To solve this problem, we propose a Spatial-Temporal Interaction Learning Two-stream network (STILT) for action recognition. The proposed two-stream network (i.e., a spatial stream and a temporal stream) has a spatial-temporal interaction learning module, which uses an alternating co-attention mechanism between the two streams to learn the correlation between spatial features and temporal features. The module allows the two streams to guide each other and generates optimized spatial attention features and temporal attention features. The proposed network thus establishes an interactive connection between the two streams and exploits the attended spatial and temporal features to improve recognition accuracy. Experiments on three widely used datasets (i.e., UCF101, HMDB51 and Kinetics) show that the proposed network outperforms the state-of-the-art models in action recognition. © 2022 Elsevier Inc. All rights reserved.
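The abstract does not give the equations of the interaction module, so the following is only a minimal NumPy sketch of a generic alternating co-attention step between two feature sets. It is not the authors' implementation. The function names, the tanh scoring function, the parameter shapes, and the three-step order (summarize spatial, attend temporal, re-attend spatial) are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(X, g, W_x, W_g, w_h):
    """Attend over the rows of X (n, d) under a guidance vector g (d,).

    Returns the attention-weighted feature (d,) and the weights (n,).
    The scoring form tanh(X W_x + g W_g) is an assumed, generic choice.
    """
    H = np.tanh(X @ W_x + g @ W_g)   # (n, k) hidden scores, g is broadcast
    a = softmax(H @ w_h)             # (n,) attention weights, sum to 1
    return a @ X, a

def init_params(d, k, rng):
    # Hypothetical random parameters for the three attention steps.
    def one():
        return (rng.normal(scale=0.1, size=(d, k)),
                rng.normal(scale=0.1, size=(d, k)),
                rng.normal(scale=0.1, size=k))
    return {"s0": one(), "t": one(), "s": one()}

def alternating_co_attention(S, T, params):
    """S: spatial features (n_s, d); T: temporal features (n_t, d)."""
    zero = np.zeros(S.shape[1])
    s0, _ = attend(S, zero, *params["s0"])        # 1. unguided spatial summary
    t_hat, a_t = attend(T, s0, *params["t"])      # 2. spatial guides temporal
    s_hat, a_s = attend(S, t_hat, *params["s"])   # 3. temporal guides spatial
    return s_hat, t_hat, a_s, a_t
```

With, say, 49 spatial locations and 10 temporal steps of 8-dimensional features, `alternating_co_attention(S, T, init_params(8, 16, rng))` returns one attended vector per stream plus the two sets of attention weights. In a trained network the parameters would be learned rather than drawn at random.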
Pages: 864-876
Page count: 13
Related papers
50 in total
  • [21] A two-stream network with joint spatial-temporal distance for video-based person re-identification
    Han, Zhisong
    Liang, Yaling
    Chen, Zengqun
    Zhou, Zhiheng
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 3769 - 3781
  • [22] Two-Stream Spatial Graphormer Networks for Skeleton-Based Action Recognition
    Li, Xiaolei
    Zhang, Junyou
    Wang, Shufeng
    Zhou, Qian
    IEEE ACCESS, 2022, 10 : 100426 - 100437
  • [23] Two-Stream Adaptive Weight Convolutional Neural Network Based on Spatial Attention for Human Action Recognition
    Chen, Guanzhou
    Yao, Lu
    Xu, Jingting
    Liu, Qianxi
    Chen, Shengyong
    INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT IV, 2022, 13458 : 319 - 330
  • [24] Human Action Recognition based on Two-Stream Ind Recurrent Neural Network
    Ge Penghua
    Zhi Min
    TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069
  • [25] A two-stream heterogeneous network for action recognition based on skeleton and RGB modalities
    Liu, Kai
    Gao, Lei
    Khan, Naimul Mefraz
    Qi, Lin
    Guan, Ling
    23RD IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2021), 2021, : 87 - 91
  • [26] Convolutional Two-Stream Network Fusion for Video Action Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Zisserman, Andrew
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1933 - 1941
  • [27] Two-Stream Convolutional Neural Network for Video Action Recognition
    Qiao, Han
    Liu, Shuang
    Xu, Qingzhen
    Liu, Shouqiang
    Yang, Wanggan
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (10) : 3668 - 3684
  • [28] Two-Stream Deep Learning Architecture-Based Human Action Recognition
    Shehzad, Faheem
    Khan, Muhammad Attique
    Yar, Muhammad Asfand E.
    Sharif, Muhammad
    Alhaisoni, Majed
    Tariq, Usman
    Majumdar, Arnab
    Thinnukool, Orawit
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03) : 5931 - 5949
  • [29] An Action Recognition Algorithm Based on Two-Stream Deep Learning for Metaverse Applications
    Liu, Jiayue
    Mao, Tianqi
    Huang, Yicheng
    He, Dongxuan
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024, : 639 - 642
  • [30] A Two-Stream Recurrent Network for Skeleton-based Human Interaction Recognition
    Men, Qianhui
    Ho, Edmond S. L.
    Shum, Hubert P. H.
    Leung, Howard
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2771 - 2778