Learning Spatial and Temporal Cues for Multi-label Facial Action Unit Detection

Cited by: 85
Authors
Chu, Wen-Sheng [1 ]
De la Torre, Fernando [1 ]
Cohn, Jeffrey F. [1 ,2 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Dept Psychol, Pittsburgh, PA 15260 USA
Funding
U.S. National Institutes of Health
Keywords
EXPRESSIONS; EMOTION
DOI
10.1109/FG.2017.13
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Facial action units (AUs) are the fundamental units for decoding human facial expressions. At least three aspects affect the performance of automated AU detection: spatial representation, temporal modeling, and AU correlation. Unlike most studies that tackle these aspects separately, we propose a hybrid network architecture that models them jointly. Specifically, spatial representations are extracted by a Convolutional Neural Network (CNN), which, as analyzed in this paper, is able to reduce the person-specific biases caused by hand-crafted descriptors (e.g., HOG and Gabor). To model temporal dependencies, Long Short-Term Memory networks (LSTMs) are stacked on top of these representations, regardless of the length of the input videos. The outputs of the CNNs and LSTMs are further aggregated in a fusion network to produce per-frame predictions of 12 AUs. Our network naturally addresses the three issues together and yields superior performance compared to existing methods that consider them independently. Extensive experiments were conducted on two large spontaneous datasets, GFT and BP4D, comprising more than 400,000 frames coded with 12 AUs. On both datasets, we report improvements over a standard multi-label CNN and the feature-based state of the art. Finally, we provide visualizations of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see AUs.
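To make the pipeline concrete, here is a minimal PyTorch sketch of the hybrid architecture the abstract describes: a CNN extracts per-frame spatial features, stacked LSTMs model temporal dependencies across frames, and a fusion layer combines both streams into per-frame multi-label predictions for 12 AUs. This is not the authors' implementation; the backbone, layer sizes, input resolution, and the BCE training loss below are illustrative assumptions, and only the overall wiring follows the abstract.

import torch
import torch.nn as nn

NUM_AUS = 12  # the paper predicts 12 action units per frame

class HybridAUNet(nn.Module):
    def __init__(self, feat_dim=256, lstm_hidden=128, lstm_layers=2):
        super().__init__()
        # Spatial stream: a small stand-in for the paper's CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Temporal stream: stacked LSTMs over per-frame CNN features;
        # a recurrent model accepts input videos of any length.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden,
                            num_layers=lstm_layers, batch_first=True)
        # Fusion: aggregate CNN and LSTM outputs into per-frame AU logits.
        self.fusion = nn.Linear(feat_dim + lstm_hidden, NUM_AUS)

    def forward(self, video):                         # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.reshape(b * t, *video.shape[2:])
        spatial = self.cnn(frames).reshape(b, t, -1)  # (B, T, feat_dim)
        temporal, _ = self.lstm(spatial)              # (B, T, lstm_hidden)
        fused = torch.cat([spatial, temporal], dim=-1)
        return self.fusion(fused)                     # (B, T, NUM_AUS) logits

# Toy usage: 2 clips of 16 frames at 64x64; multi-label BCE loss per frame.
model = HybridAUNet()
logits = model(torch.randn(2, 16, 3, 64, 64))
targets = torch.randint(0, 2, (2, 16, NUM_AUS)).float()
loss = nn.BCEWithLogitsLoss()(logits, targets)

Concatenating the CNN features with the LSTM outputs, rather than predicting from the LSTM alone, mirrors the fusion network the abstract describes; sharing one output layer across all 12 AUs lets the multi-label head exploit AU correlations.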
Pages: 25-32
Page count: 8
Related Papers
(items [41]-[50] of 50 shown)
  • [41] Hou, Peng; Geng, Xin; Zhang, Min-Ling. Multi-Label Manifold Learning. Thirtieth AAAI Conference on Artificial Intelligence, 2016: 1680-1686.
  • [42] Li, S.-Y.; Jiang, Y. Multi-label Crowdsourcing Learning. Ruan Jian Xue Bao/Journal of Software, 2020, 31(05): 1497-1510.
  • [43] Gong, Xiuwen; Yuan, Dong; Bao, Wei. Fast Multi-label Learning. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021: 2432-2438.
  • [44] Xie, Ming-Kun; Huang, Sheng-Jun. Partial Multi-Label Learning. Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 4302-4309.
  • [45] Ma, Xiaoke; Tan, Shiyin; Xie, Xianghua; Zhong, Xiaoxiong; Deng, Jingjing. Joint multi-label learning and feature extraction for temporal link prediction. Pattern Recognition, 2022, 121.
  • [46] Li, Zhihua; Deng, Xiang; Li, Xiaotian; Yin, Lijun. Integrating Semantic and Temporal Relationships in Facial Action Unit Detection. Proceedings of the 29th ACM International Conference on Multimedia, MM 2021: 5519-5527.
  • [47] Li, Yong; Shan, Shiguang. Meta Auxiliary Learning for Facial Action Unit Detection. IEEE Transactions on Affective Computing, 2023, 14(03): 2526-2538.
  • [48] Wang, Jingdong; Zhao, Yinghai; Wu, Xiuqing; Hua, Xian-Sheng. A transductive multi-label learning approach for video concept detection. Pattern Recognition, 2011, 44(10-11): 2274-2286.
  • [49] Zhang, Qiang; Yang, Qifan; Zhang, Xujuan; Wei, Wei; Bao, Qiang; Su, Jinqi; Liu, Xueyan. A multi-label waste detection model based on transfer learning. Resources Conservation and Recycling, 2022, 181.
  • [50] Yang, Y. H.; Tian, H. M.; Peng, B.; Li, T. R.; Xie, Z. X. Multi-label Learning for Detection of CME-Associated Phenomena. Solar Physics, 2017, 292.