Learning Spatial and Temporal Cues for Multi-label Facial Action Unit Detection

Cited by: 85
Authors
Chu, Wen-Sheng [1 ]
De la Torre, Fernando [1 ]
Cohn, Jeffrey F. [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Dept Psychol, Pittsburgh, PA 15260 USA
Funding
U.S. National Institutes of Health
Keywords
EXPRESSIONS; EMOTION
DOI
10.1109/FG.2017.13
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Facial action units (AUs) are the fundamental units for decoding human facial expressions. At least three aspects affect the performance of automated AU detection: spatial representation, temporal modeling, and AU correlation. Unlike most studies that tackle these aspects separately, we propose a hybrid network architecture that models them jointly. Specifically, spatial representations are extracted by a Convolutional Neural Network (CNN), which, as analyzed in this paper, is able to reduce the person-specific biases caused by hand-crafted descriptors (e.g., HOG and Gabor). To model temporal dependencies, Long Short-Term Memory networks (LSTMs) are stacked on top of these representations, regardless of the length of the input videos. The outputs of the CNNs and LSTMs are further aggregated in a fusion network to produce per-frame predictions of 12 AUs. Our network naturally addresses the three issues together and yields superior performance compared to existing methods that consider them independently. Extensive experiments were conducted on two large spontaneous datasets, GFT and BP4D, comprising more than 400,000 frames coded with 12 AUs. On both datasets, we report improvements over a standard multi-label CNN and the feature-based state of the art. Finally, we provide visualizations of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see AUs.
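To make the pipeline concrete, here is a minimal PyTorch sketch of the hybrid architecture the abstract describes: a CNN extracts per-frame spatial features, stacked LSTMs model temporal dependencies across frames, and a fusion layer combines both streams into per-frame multi-label predictions for 12 AUs. This is not the authors' implementation; the backbone, layer sizes, input resolution, and the BCE training loss below are illustrative assumptions, and only the overall wiring follows the abstract.

import torch
import torch.nn as nn

NUM_AUS = 12  # the paper predicts 12 action units per frame

class HybridAUNet(nn.Module):
    def __init__(self, feat_dim=256, lstm_hidden=128, lstm_layers=2):
        super().__init__()
        # Spatial stream: a small stand-in for the paper's CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Temporal stream: stacked LSTMs over per-frame CNN features;
        # a recurrent model accepts input videos of any length.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden,
                            num_layers=lstm_layers, batch_first=True)
        # Fusion: aggregate CNN and LSTM outputs into per-frame AU logits.
        self.fusion = nn.Linear(feat_dim + lstm_hidden, NUM_AUS)

    def forward(self, video):                         # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.reshape(b * t, *video.shape[2:])
        spatial = self.cnn(frames).reshape(b, t, -1)  # (B, T, feat_dim)
        temporal, _ = self.lstm(spatial)              # (B, T, lstm_hidden)
        fused = torch.cat([spatial, temporal], dim=-1)
        return self.fusion(fused)                     # (B, T, NUM_AUS) logits

# Toy usage: 2 clips of 16 frames at 64x64; multi-label BCE loss per frame.
model = HybridAUNet()
logits = model(torch.randn(2, 16, 3, 64, 64))
targets = torch.randint(0, 2, (2, 16, NUM_AUS)).float()
loss = nn.BCEWithLogitsLoss()(logits, targets)

Concatenating the CNN features with the LSTM outputs, rather than predicting from the LSTM alone, mirrors the fusion network the abstract describes; sharing one output layer across all 12 AUs lets the multi-label head exploit AU correlations.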
Pages: 25-32
Page count: 8
Related Papers
(items [41]-[50] of 50 shown)
  • [41] Hou, Peng; Geng, Xin; Zhang, Min-Ling. Multi-Label Manifold Learning. Thirtieth AAAI Conference on Artificial Intelligence, 2016: 1680-1686.
  • [42] Li, S.-Y.; Jiang, Y. Multi-label Crowdsourcing Learning. Ruan Jian Xue Bao/Journal of Software, 2020, 31(05): 1497-1510.
  • [43] Gong, Xiuwen; Yuan, Dong; Bao, Wei. Fast Multi-label Learning. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021: 2432-2438.
  • [44] Xie, Ming-Kun; Huang, Sheng-Jun. Partial Multi-Label Learning. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 4302-4309.
  • [45] Ma, Xiaoke; Tan, Shiyin; Xie, Xianghua; Zhong, Xiaoxiong; Deng, Jingjing. Joint multi-label learning and feature extraction for temporal link prediction. Pattern Recognition, 2022, 121.
  • [46] Li, Zhihua; Deng, Xiang; Li, Xiaotian; Yin, Lijun. Integrating Semantic and Temporal Relationships in Facial Action Unit Detection. Proceedings of the 29th ACM International Conference on Multimedia, MM 2021: 5519-5527.
  • [47] Li, Yong; Shan, Shiguang. Meta Auxiliary Learning for Facial Action Unit Detection. IEEE Transactions on Affective Computing, 2023, 14(03): 2526-2538.
  • [48] Wang, Jingdong; Zhao, Yinghai; Wu, Xiuqing; Hua, Xian-Sheng. A transductive multi-label learning approach for video concept detection. Pattern Recognition, 2011, 44(10-11): 2274-2286.
  • [49] Zhang, Qiang; Yang, Qifan; Zhang, Xujuan; Wei, Wei; Bao, Qiang; Su, Jinqi; Liu, Xueyan. A multi-label waste detection model based on transfer learning. Resources Conservation and Recycling, 2022, 181.
  • [50] Yang, Y. H.; Tian, H. M.; Peng, B.; Li, T. R.; Xie, Z. X. Multi-label Learning for Detection of CME-Associated Phenomena. Solar Physics, 2017, 292.