Learning Spatial and Temporal Cues for Multi-label Facial Action Unit Detection

Cited by: 85
Authors
Chu, Wen-Sheng [1 ]
De la Torre, Fernando [1 ]
Cohn, Jeffrey F. [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Dept Psychol, Pittsburgh, PA 15260 USA
Funding
U.S. National Institutes of Health
Keywords
EXPRESSIONS; EMOTION;
DOI
10.1109/FG.2017.13
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Facial action units (AUs) are the fundamental units for decoding human facial expressions. At least three aspects affect the performance of automated AU detection: spatial representation, temporal modeling, and AU correlation. Unlike most studies that tackle these aspects separately, we propose a hybrid network architecture to model them jointly. Specifically, spatial representations are extracted by a Convolutional Neural Network (CNN), which, as analyzed in this paper, is able to reduce person-specific biases caused by hand-crafted descriptors (e.g., HOG and Gabor). To model temporal dependencies, Long Short-Term Memory networks (LSTMs) are stacked on top of these representations, regardless of the lengths of the input videos. The outputs of the CNNs and LSTMs are further aggregated into a fusion network to produce per-frame predictions of 12 AUs. Our network naturally addresses the three issues together and yields superior performance compared to existing methods that consider these issues independently. Extensive experiments were conducted on two large spontaneous datasets, GFT and BP4D, comprising more than 400,000 frames coded with 12 AUs. On both datasets, we report improvements over a standard multi-label CNN and feature-based state-of-the-art methods. Finally, we provide visualizations of the learned AU models, which, to our best knowledge, reveal for the first time how machines see AUs.
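The abstract describes a three-part design: per-frame spatial features from a CNN, stacked LSTMs over those features for temporal dependencies, and a fusion network emitting per-frame multi-label predictions for 12 AUs. The following is a minimal PyTorch sketch of that general pattern, not the authors' implementation; all layer sizes, the tiny CNN, and the class name `HybridAUNet` are illustrative assumptions.

```python
# Hypothetical sketch of a CNN + stacked-LSTM + fusion architecture for
# multi-label AU detection. Layer sizes are arbitrary stand-ins, not the
# paper's configuration.
import torch
import torch.nn as nn

class HybridAUNet(nn.Module):
    def __init__(self, num_aus=12, feat_dim=64, lstm_hidden=32):
        super().__init__()
        # Spatial stream: a small CNN stands in for the paper's deeper network.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(16 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # Temporal stream: stacked LSTMs over the per-frame CNN features.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, num_layers=2, batch_first=True)
        # Fusion: combine spatial and temporal streams into per-frame AU scores.
        self.fusion = nn.Linear(feat_dim + lstm_hidden, num_aus)

    def forward(self, frames):  # frames: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        # Run the CNN on every frame, then restore the time dimension.
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)   # (b, t, feat_dim)
        temporal, _ = self.lstm(feats)                          # (b, t, lstm_hidden)
        logits = self.fusion(torch.cat([feats, temporal], dim=-1))
        # Sigmoid (not softmax): each AU is an independent binary label.
        return torch.sigmoid(logits)                            # (b, t, num_aus)

video = torch.randn(2, 5, 1, 48, 48)   # 2 clips of 5 frames each
probs = HybridAUNet()(video)           # per-frame probabilities for 12 AUs
```

Because LSTMs process the feature sequence recurrently, the same module handles clips of any length, matching the abstract's claim of length-independent temporal modeling.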
Pages: 25-32 (8 pages)