Human behaviour recognition with mid-level representations for crowd understanding and analysis

Cited by: 2
Authors
Sun, Bangyong [1 ,2 ]
Yuan, Nianzeng [1 ]
Li, Shuying [4 ]
Wu, Siyuan [2 ]
Wang, Nan [2 ,3 ]
Affiliations
[1] Xian Univ Technol, Coll Printing Packaging Engn & Digital Media, Xian 710048, Shaanxi, Peoples R China
[2] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China
[3] Univ Chinese Acad Sci, 19A Yuquanlu, Beijing 100049, Peoples R China
[4] Xian Univ Posts & Telecommun, Sch Automat, Xian 710121, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
VIDEOS;
DOI
10.1049/ipr2.12147
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Crowd understanding and analysis have received increasing attention over the past couple of decades, and progress in human behaviour recognition strongly supports their application. Human behaviour recognition usually seeks to automatically analyse ongoing movements and actions in previously unseen video clips or image sequences, captured from different camera views, using various machine learning methods. Compared with other data modalities such as documents and images, video data demands much greater computational and storage resources. This work explores the idea of using mid-level semantic concepts to represent human actions in videos, and argues that these semantic attributes enable more descriptive methods for human action recognition. The mid-level attributes, initialized by a clustering step, are built upon low-level features and fully exploit the discrepancies between action classes, capturing the importance of each attribute for each action class. The resulting representation is semantically rich and highly discriminative even when paired with simple linear classifiers. The method is verified on three challenging datasets (KTH, UCF50 and HMDB51), and the experimental results show that it achieves better results than the baseline methods on human action recognition.
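The abstract describes mid-level attributes that are initialized by clustering low-level features. The sketch below is a generic illustration of that one idea, not the authors' method: it clusters toy low-level descriptors with a plain k-means and then encodes a video as a normalized histogram over the resulting clusters. All names (`init_centres`, `kmeans`, `encode`), the farthest-point initialization, and the 2-D toy descriptors are assumptions made for illustration; the class-specific attribute weighting and the linear classifier mentioned in the abstract are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to `point`."""
    return min(range(len(centres)), key=lambda k: sq_dist(point, centres[k]))


def init_centres(points, k):
    """Deterministic farthest-point initialization (an assumed choice)."""
    centres = [points[0]]
    while len(centres) < k:
        # Pick the point whose nearest existing centre is farthest away.
        centres.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        )
    return centres


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k cluster centres."""
    centres = init_centres(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Recompute each centre as the mean of its group;
        # an empty group keeps its previous centre.
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


def encode(descriptors, centres):
    """Represent one video as a normalized histogram over the clusters."""
    hist = [0.0] * len(centres)
    for d in descriptors:
        hist[nearest(d, centres)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy low-level descriptors pooled from training videos (hypothetical data).
toy = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
       [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
centres = kmeans(toy, k=2)

# Descriptors extracted from one unseen video (hypothetical data).
video = [[0.0, 0.0], [5.0, 5.0], [5.1, 5.1], [4.8, 5.2]]
print(encode(video, centres))
```

In a fuller pipeline, a fixed-length vector of this kind could be passed to a simple linear classifier, which is the setting in which the abstract reports the representation to be discriminative.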
Pages: 3414-3424
Number of pages: 11