Learning facial action units with spatiotemporal cues and multi-label sampling

Cited by: 12
Authors
Chu, Wen-Sheng [1]
De la Torre, Fernando [1]
Cohn, Jeffrey F. [1, 2]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Dept Psychol, Pittsburgh, PA 15260 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Multi-label learning; Deep learning; Spatio-temporal learning; Multi-label sampling; Facial action unit detection; Video analysis; EXPRESSION; RECOGNITION; EMOTION;
DOI
10.1016/j.imavis.2018.10.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Facial action units (AUs) can be represented spatially, temporally, and in terms of their correlation. Previous research focuses on one or another of these aspects or addresses them disjointly. We propose a hybrid network architecture that jointly models spatial and temporal representations and their correlation. In particular, we use a Convolutional Neural Network (CNN) to learn spatial representations, and a Long Short-Term Memory (LSTM) to model temporal dependencies among them. The outputs of CNNs and LSTMs are aggregated into a fusion network to produce per-frame prediction of multiple AUs. The hybrid network was compared to previous state-of-the-art approaches in two large FACS-coded video databases, GFT and BP4D, with over 400,000 AU-coded frames of spontaneous facial behavior in varied social contexts. Relative to standard multi-label CNN and feature-based state-of-the-art approaches, the hybrid system reduced person-specific biases and obtained increased accuracy for AU detection. To address class imbalance within and between batches during network training, we introduce multi-label sampling strategies that further increase accuracy when AUs are relatively sparse. Finally, we provide visualization of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see AUs. (C) 2018 Elsevier B.V. All rights reserved.
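The abstract mentions sampling strategies for class imbalance but does not spell them out. The following is only a minimal sketch of one generic way to balance sparse labels within a batch, not the authors' method: it cycles over AUs and draws, for each slot, a frame in which that AU is active. The function names (`build_positive_index`, `sample_balanced_batch`, `positive_counts`) and the toy label matrix are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def build_positive_index(labels):
    """Map each AU (column index) to the frame indices where it is active."""
    index = defaultdict(list)
    for frame_id, row in enumerate(labels):
        for au, active in enumerate(row):
            if active:
                index[au].append(frame_id)
    return index


def sample_balanced_batch(labels, batch_size, rng):
    """Draw a batch by cycling over AUs and picking a positive frame for each.

    Assumption: this round-robin scheme is an illustrative stand-in for a
    multi-label sampling strategy; it is not taken from the paper.
    """
    index = build_positive_index(labels)
    aus = sorted(index)  # only AUs that have at least one positive frame
    if not aus:
        raise ValueError("no positive labels to sample from")
    batch = []
    for slot in range(batch_size):
        au = aus[slot % len(aus)]
        batch.append(rng.choice(index[au]))
    return batch


def positive_counts(labels, batch):
    """Count, per AU, how many frames in the batch have that AU active."""
    num_aus = len(labels[0])
    return [sum(labels[f][au] for f in batch) for au in range(num_aus)]


if __name__ == "__main__":
    # Toy data: AU 0 is active in 9 frames, AU 1 in only 1 frame.
    labels = [[1, 0]] * 9 + [[0, 1]]
    batch = sample_balanced_batch(labels, batch_size=8, rng=random.Random(0))
    print(positive_counts(labels, batch))
```

Because frames can carry several active AUs at once, a scheme like this only approximately equalizes per-AU counts on real data; co-occurring labels are counted for every AU they contain.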
Pages: 1-14
Number of pages: 14