Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective

Cited by: 2
Authors
Lye, Mohd Haris [1 ]
AlDahoul, Nouar [1 ,2 ]
Abdul Karim, Hezerul [1 ]
Affiliations
[1] Multimedia Univ, Fac Engn, Cyberjaya 63100, Selangor, Malaysia
[2] NYU, Comp Sci, POB 1291888, Abu Dhabi, U Arab Emirates
Keywords
activities of daily living; convolutional neural network; egocentric vision; feature fusion; optical flow; DESCRIPTORS;
DOI
10.3390/s23156804
CLC Classification
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Videos captured from a first-person, or egocentric, perspective offer a promising tool for recognizing activities of daily living. In the egocentric setting, the video is recorded by a wearable camera, which captures the wearer's activities from a consistent viewpoint. Recognizing activity with a wearable sensor is nevertheless challenging for several reasons, such as motion blur and large intra-class variations. Existing methods represent video content with handcrafted features extracted from video frames; these features are domain-dependent, and features suited to one dataset may not be suitable for others. In this paper, we propose a novel solution for recognizing daily living activities from a pre-segmented video clip. The pre-trained convolutional neural network (CNN) model VGG16 is used to extract visual features from sampled video frames, and these features are then aggregated by the proposed pooling schemes. The proposed solution combines appearance and motion features extracted from video frames and optical flow images, respectively. Mean and max spatial pooling (MMSP) and max mean temporal pyramid (TPMM) pooling are proposed to compose the final video descriptor, which is fed to a linear support vector machine (SVM) to recognize the type of activity observed in the video clip. The proposed solution was evaluated on three public benchmark datasets, and we performed studies showing the advantage of aggregating appearance and motion features for daily activity recognition. The results show that the proposed solution is promising for recognizing activities of daily living. Compared to several methods on the three public datasets, the proposed MMSP-TPMM method produces higher classification performance in terms of accuracy (90.38% on the LENA dataset, 75.37% on the ADL dataset, 96.08% on the FPPA dataset) and average per-class precision (AP) (58.42% on the ADL dataset and 96.11% on the FPPA dataset).
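The abstract outlines a pipeline of per-frame CNN features, spatial pooling (MMSP), temporal pyramid pooling (TPMM), two-stream fusion, and a linear SVM. The following Python sketch illustrates one plausible reading of those steps; the exact pooling definitions, feature shapes, pyramid depth, and fusion-by-concatenation are assumptions based only on the abstract, not details taken from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def mmsp(conv_maps):
    """Mean-and-max spatial pooling (assumed MMSP) for one frame.

    conv_maps: array (H, W, C) of CNN feature maps (e.g. a VGG16
    conv layer). Returns a 2*C vector: per-channel spatial mean
    concatenated with per-channel spatial max.
    """
    return np.concatenate([conv_maps.mean(axis=(0, 1)),
                           conv_maps.max(axis=(0, 1))])

def tpmm(frame_feats, levels=2):
    """Temporal pyramid max-mean pooling (assumed TPMM).

    frame_feats: array (T, D) of per-frame vectors. At pyramid level l
    the sequence is split into 2**l contiguous segments; each segment
    contributes its temporal max and temporal mean. All segment
    descriptors are concatenated into one clip descriptor.
    """
    parts = []
    for l in range(levels):
        for seg in np.array_split(frame_feats, 2 ** l):
            parts.append(seg.max(axis=0))
            parts.append(seg.mean(axis=0))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
# Stand-ins for per-frame VGG16 conv features of 16 sampled RGB frames
# and 16 optical-flow images (real features would come from the CNN).
rgb_clip  = rng.normal(size=(16, 7, 7, 512))
flow_clip = rng.normal(size=(16, 7, 7, 512))

appearance = tpmm(np.stack([mmsp(f) for f in rgb_clip]))
motion     = tpmm(np.stack([mmsp(f) for f in flow_clip]))
video_desc = np.concatenate([appearance, motion])  # fused descriptor

# Linear SVM over a (toy) batch of fused clip descriptors.
X = np.stack([video_desc, -video_desc])
y = np.array([0, 1])  # activity class labels
clf = LinearSVC().fit(X, y)
print(clf.predict(X))
```

Concatenating the appearance and motion descriptors before the SVM is one common late-fusion choice for two-stream features; the paper itself should be consulted for the actual fusion scheme and pyramid configuration.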
Pages: 20
Related Papers
50 records in total
  • [31] Lower Limb Motion Intention Recognition Based on sEMG Fusion Features
    Zhang, Peng
    Zhang, Junxia
    Elsabbagh, Ahmed
    IEEE SENSORS JOURNAL, 2022, 22 (07) : 7005 - 7014
  • [32] First-Person Animal Activity Recognition from Egocentric Videos
    Iwashita, Yumi
    Takamine, Asamichi
    Kurazume, Ryo
    Ryoo, M. S.
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 4310 - 4315
  • [33] Combining appearance and motion for face and gender recognition from videos
    Hadid, Abdenour
    Pietikainen, Matti
    PATTERN RECOGNITION, 2009, 42 (11) : 2818 - 2827
  • [34] Human Activity Recognition Via Motion and Vision Data Fusion
    Zhu, Chun
    Cheng, Qi
    Sheng, Weihua
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 332 - 336
  • [35] Shape and Motion Features Approach for Activity Tracking and Recognition from Kinect Video Camera
    Jalal, Ahmad
    Kamal, Shaharyar
    Kim, Daijin
    2015 IEEE 29TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS WAINA 2015, 2015, : 445 - 450
  • [36] Daily Living Activity Recognition with ECHONET Lite Appliances and Motion Sensors
    Moriya, Kazuki
    Nakagawa, Eri
    Fujimoto, Manato
    Suwa, Hirohiko
    Arakawa, Yutaka
    Kimura, Aki
    Miki, Satoko
    Yasumoto, Keiichi
    2017 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS (PERCOM WORKSHOPS), 2017,
  • [38] Efficient fall activity recognition by combining shape and motion features
    Iazzi, Abderrazak
    Rziza, Mohammed
    Thami, Rachid Oulad Haj
    COMPUTATIONAL VISUAL MEDIA, 2020, 6 (03) : 247 - 263
  • [40] Batch-Based Activity Recognition from Egocentric Photo-Streams
    Cartas, Alejandro
    Dimiccoli, Mariella
    Radeva, Petia
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 2347 - 2354