Harnessing multimodal approaches for depression detection using large language models and facial expressions

Cited by: 0
Authors
Misha Sadeghi [1 ]
Robert Richer [1 ]
Bernhard Egger [2 ]
Lena Schindler-Gmelch [3 ]
Lydia Helene Rupp [3 ]
Farnaz Rahimi [1 ]
Matthias Berking [3 ]
Bjoern M. Eskofier [1 ]
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE)
[2] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Chair of Visual Computing (LGDV), Department of Computer Science
[3] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),Chair of Clinical Psychology and Psychotherapy (KliPs)
[4] Institute of AI for Health,Translational Digital Health Group
[5] Helmholtz Zentrum München - German Research Center for Environmental Health
DOI: 10.1038/s44184-024-00112-8
Abstract
Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, utilizing the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames is integrated with textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models as robust and effective while paving the way for multimodal analysis.
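The abstract reports model performance as a mean absolute error (MAE) of 2.85 and a root mean square error (RMSE) of 4.02 on predicted PHQ-8 scores. As a minimal sketch of how these two metrics are computed, the snippet below evaluates a set of hypothetical PHQ-8 predictions (the score values are illustrative, not from the study):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error: like MAE, but penalizes large errors more heavily
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical PHQ-8 severity scores (0-24 scale) for five interviews
true_scores = [4, 12, 7, 18, 2]
pred_scores = [5, 10, 8, 15, 3]

print(f"MAE:  {mae(true_scores, pred_scores):.2f}")   # 1.60
print(f"RMSE: {rmse(true_scores, pred_scores):.2f}")  # 1.79
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; the gap between the paper's reported 4.02 RMSE and 2.85 MAE indicates that a subset of interviews incurred comparatively large prediction errors.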