Human behaviour recognition with mid-level representations for crowd understanding and analysis

Cited by: 2

Authors:

Sun, Bangyong [1,2]

Yuan, Nianzeng [1]

Li, Shuying [4]

Wu, Siyuan [2]

Wang, Nan [2,3]

Affiliations:

[1] Xian Univ Technol, Coll Printing Packaging Engn & Digital Media, Xian 710048, Shaanxi, Peoples R China

[2] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Shaanxi, Peoples R China

[3] Univ Chinese Acad Sci, 19A Yuquanlu, Beijing 100049, Peoples R China

[4] Xian Univ Posts & Telecommun, Sch Automat, Xian 710121, Shaanxi, Peoples R China

Source:

IET IMAGE PROCESSING | 2021 / Volume 15 / Issue 14

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

VIDEOS;

DOI:

10.1049/ipr2.12147

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Crowd understanding and analysis have received increasing attention for several decades, and progress in human behaviour recognition strongly supports their application. Human behaviour recognition usually seeks to automatically analyse ongoing movements and actions across different camera views by applying machine learning methods to unknown video clips or image sequences. Compared with other data modalities such as documents and images, video data demands much higher computational and storage resources. The idea of using mid-level semantic concepts to represent human actions in videos is explored, and it is argued that these semantic attributes enable the construction of more descriptive methods for human action recognition. The mid-level attributes, initialized by a clustering process, are built upon low-level features and fully exploit the discrepancies between action classes, so that they capture the importance of each attribute for each action class. In this way, the representation is constructed to be semantically rich and highly discriminative even when paired with simple linear classifiers. The method is verified on three challenging datasets (KTH, UCF50 and HMDB51), and the experimental results demonstrate that our method achieves better results than the baseline methods on human action recognition.
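The pipeline outlined in the abstract (cluster low-level features into mid-level attributes, describe each video by those attributes, then apply a simple linear classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-means stands in for the unspecified clustering step, a multiclass perceptron stands in for the linear classifier, and the 2-D descriptors, the class names and every function name below are hypothetical toy choices made only for illustration.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to descriptor p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with a deterministic, evenly spaced initialisation."""
    step = max(1, len(points) // k)
    centroids = [list(p) for p in points[::step][:k]]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def encode(descriptors, centroids):
    """Describe one video as a normalised histogram over the attributes."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1.0
    return [h / len(descriptors) for h in hist]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(weights, x):
    """Pick the class whose weight vector scores the histogram highest."""
    return max(weights, key=lambda c: score(weights[c], x))

def train_perceptron(samples, labels, epochs=10):
    """Multiclass perceptron: one weight per (class, attribute) pair."""
    dim = len(samples[0][0])
    weights = {c: [0.0] * dim for c in labels}
    for _ in range(epochs):
        for x, y in samples:
            guess = predict(weights, x)
            if guess != y:  # reward the true class, penalise the wrong guess
                weights[y] = [w + xi for w, xi in zip(weights[y], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
    return weights

# Hypothetical toy data: each video is a list of 2-D low-level descriptors.
videos = [
    ("wave", [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0)]),
    ("wave", [(0.1, 0.1), (0.3, 0.0), (0.0, 0.2), (5.1, 4.9)]),
    ("run",  [(5.0, 5.1), (4.9, 5.0), (5.2, 4.8), (0.1, 0.0)]),
    ("run",  [(5.1, 5.2), (4.8, 4.9), (5.0, 5.0), (0.2, 0.2)]),
]
all_descriptors = [d for _, ds in videos for d in ds]
attributes = kmeans(all_descriptors, k=2)
samples = [(encode(ds, attributes), label) for label, ds in videos]
weights = train_perceptron(samples, labels=["wave", "run"])

query = [(0.1, 0.2), (0.0, 0.1), (4.9, 5.1)]
print(predict(weights, encode(query, attributes)))
```

In this sketch the learned weight for each (class, attribute) pair plays the role of the per-class attribute importance mentioned in the abstract; a real system would use far richer low-level features and many more attributes.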


Pages: 3414-3424

Page count: 11
