Estimation of affective dimensions using CNN-based features of audiovisual data

Cited by: 2

Authors:

Basnet, Ramesh [1]

Islam, Mohammad Tariqul [2]

Howlader, Tamanna [3]

Rahman, S. M. Mahbubur [2]

Hatzinakos, Dimitrios [4]

Affiliations:

[1] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ H3G 1M8, Canada

[2] Bangladesh Univ Engn & Technol, Dept Elect & Elect Engn, Dhaka 1205, Bangladesh

[3] Univ Dhaka, Inst Stat Res & Training, Dhaka 1000, Bangladesh

[4] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 2E4, Canada

Source:

PATTERN RECOGNITION LETTERS | 2019 / Vol. 128

Keywords:

Convolutional neural network; Affective features; Emotional dimensions; RECOGNITION;

DOI:

10.1016/j.patrec.2019.09.015

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic estimation of emotional state has attracted great interest because emotion is an important component of user-oriented interactive technologies. This paper investigates the use of feed-forward convolutional neural networks (CNNs), and of features extracted from such networks, for predicting the dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture is proposed in which the video and audio data are learned simultaneously. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through a minimum redundancy maximum relevance statistical selection method. Support vector regression applied to the selected CNN-based features then estimates the instantaneous values of the emotional dimensions. The proposed method is trained and tested using the audiovisual conversations of the well-known RECOLA and SEMAINE databases. Experiments verify that regression on the CNN-based features outperforms both traditional audiovisual affective features and the end-to-end CNN mapping. Generalization experiments also show that the learned representations are robust enough to provide acceptable prediction performance when the settings of the training and testing datasets differ widely. © 2019 Elsevier B.V. All rights reserved.
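The minimum redundancy maximum relevance (mRMR) selection step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which relevance and redundancy measures were used, so the sketch assumes absolute Pearson correlation as a stand-in for both (mRMR is more commonly formulated with mutual information). The toy feature columns, the target, and the function names `pearson` and `mrmr_select` are hypothetical and only stand in for CNN-based features and an emotional dimension. The regression stage is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(columns, target, k):
    """Greedy mRMR over feature columns; returns the indices of k selected features.

    Assumption: relevance = |corr(feature, target)| and
    redundancy = mean |corr(feature, already selected feature)|.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            abs(pearson(columns[j], columns[s])) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: sinusoids at different frequencies are mutually uncorrelated.
n = 100
wave = lambda freq: [math.sin(2 * math.pi * freq * i / n) for i in range(n)]
f0 = wave(1)                                        # informative feature
f1 = [a + 0.05 * b for a, b in zip(f0, wave(5))]    # near-duplicate of f0
f2 = wave(2)                                        # second informative feature
f3 = wave(7)                                        # irrelevant feature
target = [2 * a + b for a, b in zip(f0, f2)]        # stand-in for an emotional dimension

columns = [f0, f1, f2, f3]
selected = mrmr_select(columns, target, 2)
print(selected)  # [0, 2]
```

In this toy case the near-duplicate feature `f1` is almost as relevant as `f0`, but it is passed over in the second step because of its redundancy with the already selected `f0`. That trade-off is what the selection method is designed to make.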

Pages: 290-297

Number of pages: 8

50 entries in total

[41] CNN-based Multiple Manipulation Detector Using Frequency Domain Features of Image Residuals
Singhal, Divya
Gupta, Abhinav
Tripathi, Anurag
Kothari, Ravi
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2020, 11 (04)
[42] ONLINE ENVIRONMENTAL ADAPTATION OF CNN-BASED ACOUSTIC MODELS USING SPATIAL DIFFUSENESS FEATURES
Huemmer, Christian
Delcroix, Marc
Ogawa, Atsunori
Kinoshita, Keisuke
Nakatani, Tomohiro
Kellermann, Walter
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4875 - 4879
[43] Comparative Study on CNN-based Bridge Seismic Damage Identification Using Various Features
Zhou, Xiaohang
Zhao, Yian
Khan, Inamullah
Cao, Lu
KSCE JOURNAL OF CIVIL ENGINEERING, 2024, 28 (12) : 5618 - 5627
[44] An Improved Multispectral Palmprint System Using Deep CNN-based Palm-Features
Trabelsi, Selma
Samai, Djamel
Meraoumia, Abdallah
Bensid, Khaled
Taleb-Ahmed, Abdelmalik
2019 INTERNATIONAL CONFERENCE ON ADVANCED ELECTRICAL ENGINEERING (ICAEE), 2019,
[45] Bi-directional CRC algorithm using CNN-based features for face classification
Wang, Yanan
Na, Tian
Song, Xiaoning
Hu, Guosheng
JOURNAL OF ENGINEERING-JOE, 2018, (16) : 1457 - 1462
[46] A CNN-based 3D human pose estimation based on projection of depth and ridge data
Kim, Yeonho
Kim, Daijin
PATTERN RECOGNITION, 2020, 106
[47] Pole Transformation of Magnetic Data Using CNN-Based Deep Learning Models
Jia, Zhuo
Huang, Meijia
Xu, Hong
Du, Wei
Li, Yabin
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
[48] Pose Estimation of Robot End-Effector using a CNN-Based Cascade Estimator
Ortega, Kevin D.
Sepulveda, Jorge I.
Hernandez, Byron
Holguin, German A.
Medeiros, Henry
2023 IEEE 6TH COLOMBIAN CONFERENCE ON AUTOMATIC CONTROL, CCAC, 2023, : 85 - 90
[49] A CNN-BASED FLOOD MAPPING APPROACH USING SENTINEL-1 DATA
Tavus, Beste
Can, Recep
Kocaman, Sultan
XXIV ISPRS CONGRESS: IMAGING TODAY, FORESEEING TOMORROW, COMMISSION III, 2022, 5-3 : 549 - 556
[50] CNN-Based Analysis of Crowd Structure using Automatically Annotated Training Data
Zitouni, M. Sami
Sluzek, Andrzej
Bhaskar, Harish
2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
