A Hybrid Network for Large-Scale Action Recognition from RGB and Depth Modalities

Cited by: 21
Authors
Wang, Huogen [1, 2]
Song, Zhanjie [3]
Li, Wanqing [2]
Wang, Pichao [4]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Univ Wollongong, Adv Multimedia Res Lab, Wollongong, NSW 2522, Australia
[3] Tianjin Univ, Sch Math, Tianjin 300350, Peoples R China
[4] Alibaba Grp US Inc, Bellevue, WA 98004 USA
Funding
National Natural Science Foundation of China;
Keywords
action recognition; weighted rank pooling; weighted dynamic image; 3D convolutional LSTM network; canonical correlation analysis;
DOI
10.3390/s20113305
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
The paper presents a novel hybrid network for large-scale action recognition from multiple modalities. The network is built upon the proposed weighted dynamic images. It effectively leverages the strengths of the emerging Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches to specifically address the challenges that occur in large-scale action recognition and are not fully dealt with by the state-of-the-art methods. Specifically, the proposed hybrid network consists of a CNN based component and an RNN based component. Features extracted by the two components are fused through canonical correlation analysis and then fed to a linear Support Vector Machine (SVM) for classification. The proposed network achieved state-of-the-art results on the ChaLearn LAP IsoGD, NTU RGB+D and Multi-modal & Multi-view & Interactive (M²I) datasets and outperformed existing methods by a large margin (over 10 percentage points in some cases).
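The fusion step named in the abstract (canonical correlation analysis applied to two feature sets before classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `fit_cca` and `fuse`, the synthetic stand-ins for the CNN and RNN features, the small ridge term `reg`, and the choice to concatenate the two projected views are all assumptions made for illustration; the abstract does not specify these details. The final linear SVM stage is omitted.

```python
import numpy as np


def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit linear CCA between two feature matrices (rows are samples).

    Returns the feature means, the two projection matrices, and the
    canonical correlations. `reg` is an assumed ridge term that keeps
    the covariance matrices invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor: S = L L^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)      # Lx^{-1} Sxy
    T = np.linalg.solve(Ly, T.T).T    # Lx^{-1} Sxy Ly^{-T}
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt[:n_components].T)
    return mx, my, Wx, Wy, s[:n_components]


def fuse(X, Y, mx, my, Wx, Wy):
    """Project both views into the shared space and concatenate them
    (concatenation is an assumption; summation is another common choice)."""
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])


# Synthetic stand-ins for the two feature streams: both views are noisy
# linear mixtures of the same 3-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
cnn_feats = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
rnn_feats = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(200, 6))

mx, my, Wx, Wy, corrs = fit_cca(cnn_feats, rnn_feats, n_components=3)
fused = fuse(cnn_feats, rnn_feats, mx, my, Wx, Wy)
print(fused.shape)  # (200, 6)
print(corrs)        # canonical correlations, largest first
```

In a full pipeline of the kind the abstract describes, `fused` would be the input to a linear classifier, with the CCA projections fitted on training data only and reused unchanged at test time.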
Pages: 1-25
Page count: 25