Harnessing multimodal approaches for depression detection using large language models and facial expressions

Cited by: 0
Authors
Misha Sadeghi [1 ]
Robert Richer [1 ]
Bernhard Egger [2 ]
Lena Schindler-Gmelch [3 ]
Lydia Helene Rupp [3 ]
Farnaz Rahimi [1 ]
Matthias Berking [3 ]
Bjoern M. Eskofier [1 ]
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE)
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Chair of Visual Computing (LGDV), Department of Computer Science
[3] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Chair of Clinical Psychology and Psychotherapy (KliPs)
[4] Institute of AI for Health, Translational Digital Health Group
[5] Helmholtz Zentrum München - German Research Center for Environmental Health
DOI
10.1038/s44184-024-00112-8
Abstract
Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, using the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames are integrated with the textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show that the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and a root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models to be robust and effective while paving the way for multimodal analysis.
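The following is a minimal sketch, not the authors' pipeline, of the multimodal setup the abstract describes: hypothetical LLM-derived text features and facial features are fused by concatenation and fed to a regressor predicting PHQ-8 scores, then scored with the same MAE and RMSE metrics reported above. The feature dimensions, the random-forest regressor, and all variable names are assumptions for illustration only.

```python
# Minimal sketch (assumed components, not the paper's actual code):
# late fusion of text and facial features for PHQ-8 regression.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder inputs: in the study, text features come from LLM analysis of
# interview transcripts and facial features from video frames; here they are
# random arrays with assumed dimensionality.
n_interviews = 200
text_feats = rng.normal(size=(n_interviews, 16))   # assumed LLM-derived indicators
face_feats = rng.normal(size=(n_interviews, 32))   # assumed facial descriptors
phq8 = rng.integers(0, 25, size=n_interviews)      # PHQ-8 scores range from 0 to 24

# Late fusion: concatenate both modalities before regression.
X = np.concatenate([text_feats, face_feats], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, phq8, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluation metrics used in the abstract (MAE and RMSE).
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")
```

With random placeholder features the printed errors are meaningless; the sketch only illustrates how modality fusion and the reported metrics fit together.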
Related Papers (50 in total)
  • [1] Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models
    Bian, Yifan
    Kuester, Dennis
    Liu, Hui
    Krumhuber, Eva G.
    SENSORS, 2024, 24 (01)
  • [2] Contextual Object Detection with Multimodal Large Language Models
    Zang, Yuhang
    Li, Wei
    Han, Jun
    Zhou, Kaiyang
    Loy, Chen Change
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (02) : 825 - 843
  • [3] Detection of facial expressions of emotions in depression
    Suslow, T
    Junghanns, K
    Arolt, V
    PERCEPTUAL AND MOTOR SKILLS, 2001, 92 (03) : 857 - 868
  • [4] Harnessing the Power of Large Language Models
    Hofmann, Meike
    Burch, Gerald F.
    Burch, Jana J.
    ISACA Journal, 2024, 1 : 32 - 39
  • [5] Harnessing multimodal large language models for traffic knowledge graph generation and decision-making
    Kuang, Senyun
    Liu, Yang
    Wang, Xin
    Wu, Xinhua
    Wei, Yintao
    COMMUNICATIONS IN TRANSPORTATION RESEARCH, 2024, 4
  • [6] Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction
    He, Wentao
    Ma, Hanjie
    Li, Shaohua
    Dong, Hui
    Zhang, Haixiang
    Feng, Jie
    APPLIED SCIENCES-BASEL, 2023, 13 (22)
  • [7] Towards Emotion Detection in Educational Scenarios from Facial Expressions and Body Movements through Multimodal Approaches
    Saneiro, Mar
    Santos, Olga C.
    Salmeron-Majadas, Sergio
    Boticario, Jesus G.
    SCIENTIFIC WORLD JOURNAL, 2014
  • [8] Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study
    Tamberg, Karl
    Bahsi, Hayretdin
    IEEE ACCESS, 2025, 13 : 29698 - 29717
  • [9] InteraRec: Interactive Recommendations Using Multimodal Large Language Models
    Karra, Saketh Reddy
    Tulabandhula, Theja
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA, 2024, 14658 : 32 - 43
  • [10] Harnessing Large Language Models for Chart Review
    Xu, Dongchu
    Cunningham, Jonathan W.
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2025, 14 (07)