Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Cited by: 0
Authors
Yang, Yuheng [1 ]
Chen, Haipeng [1 ]
Liu, Zhenguang [2 ]
Lyu, Yingda [3 ]
Zhang, Beibei [5 ]
Wu, Shuang [4 ]
Wang, Zhibo [2 ]
Ren, Kui [2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Jilin, Peoples R China
[2] Zhejiang Univ, Sch Cyber Sci & Technol, Hangzhou, Peoples R China
[3] Jilin Univ, Publ Comp Educ & Res Ctr, Jilin, Peoples R China
[4] Black Sesame Technol, Solaris, Singapore
[5] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards more precise action recognition while attention capitalizes on the important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets new state-of-the-art performance on three benchmark datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA.
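The idea of an acceleration representation that captures higher-order variations in motion can be illustrated with a minimal sketch. It assumes a skeleton sequence stored as nested lists of shape [frames][joints][3] and uses plain finite differences on joint coordinates. The function names and the data layout are hypothetical, and this is not the paper's formulation; in particular, it does not compute the joint-wise angular acceleration mentioned in the abstract.

```python
# Hypothetical sketch: finite-difference motion streams from a skeleton sequence.
# The names (temporal_difference, motion_streams) and the [T][J][3] layout are
# assumptions for illustration, not taken from the paper.

def temporal_difference(seq):
    """Return frame-to-frame differences of a [T][J][3] sequence (length T-1)."""
    return [
        [
            [b - a for a, b in zip(joint_prev, joint_next)]
            for joint_prev, joint_next in zip(frame_prev, frame_next)
        ]
        for frame_prev, frame_next in zip(seq, seq[1:])
    ]


def motion_streams(positions):
    """Build position, velocity and acceleration streams.

    velocity[t]     = positions[t+1] - positions[t]
    acceleration[t] = velocity[t+1] - velocity[t]
                    = positions[t+2] - 2*positions[t+1] + positions[t]
    """
    velocity = temporal_difference(positions)
    acceleration = temporal_difference(velocity)
    return {
        "position": positions,
        "velocity": velocity,
        "acceleration": acceleration,
    }


# Toy input: one joint moving along x with x = t^2 over four frames.
positions = [[[float(t * t), 0.0, 0.0]] for t in range(4)]
streams = motion_streams(positions)
```

For the toy input, the velocity stream has three frames and the acceleration stream has two, each with a constant second difference of 2 along x. In a multi-stream setup, each of these sequences would be fed to its own branch before fusion; how the streams are combined here is left open because the abstract does not specify it.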
Pages: 1658-1666
Page count: 9