Depression detection using cascaded attention based deep learning framework using speech data

Cited by: 0

Authors:

Gupta, Sachi [1]

Agarwal, Gaurav [2]

Agarwal, Shivani [3]

Pandey, Dilkeshwar [4]

Affiliations:

[1] Galgotias Coll Engn & Technol, Dept Comp Sci & Engn, Greater Noida 201310, Uttar Pradesh, India

[2] Galgotias Univ, Sch Comp Sci & Engn, Gr Noida 203201, Uttar Pradesh, India

[3] Ajay Kumar Garg Engn Coll, Dept Informat Technol, Ghaziabad 201009, Uttar Pradesh, India

[4] KIET Grp Inst, Dept Comp Sci & Engn, Ghaziabad 201206, Uttar Pradesh, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Volume 83 / Issue 25

Keywords:

Speech signals; Multi-stage Discrete Wavelet Transform; Auction Optimization; Deep convolutional Attention; Depression and Non-depression

DOI:

10.1007/s11042-023-18076-w

Chinese Library Classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Efficient detection of depression is a challenging problem in speech signal processing. Because speech signals can support the diagnosis of depression, a reliable detection methodology is required. However, manual examination performed by radiologists can be time-consuming and may not be feasible in complex circumstances. Diverse detection methodologies have been proposed previously, but they are found to be less accurate and time-consuming, and they lead to high error rates. This article presents an effective, automatic deep learning-based method for depression detection using speech signal data. The steps involved in depression prediction are data acquisition, pre-processing, feature extraction, feature selection and classification. The initial step is data acquisition, which collects speech signals from the Distress Analysis Interview Corpus (DAIC-WOZ) and Sonde Health Free Speech (SH2-FS) datasets. The collected data are pre-processed with a Multi-stage Discrete Wavelet Transform (MS_DWT) to remove noise and improve signal quality. The features required for processing the speech signal are extracted through the Hilbert-Huang (H-H) transform, linear prediction cepstrum coefficients (LPCC), fundamental frequency, formants, speaking rate and Mel frequency cepstral coefficients (MFCC). From the extracted features, those most useful for improving detection accuracy are selected using the Price Auction optimization algorithm (PAOA). Finally, the depression and non-depression states are classified using a deep convolutional Attention Cascaded two directional long short-term memory network (DAttn_Conv 2D LSTM) with a softmax classifier. The overall accuracy obtained in classifying the depressed and non-depressed classes is 97.82% and 98.91%, respectively.
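The pre-processing step in the abstract can be illustrated with a minimal sketch of multi-stage wavelet denoising. The abstract does not name the wavelet family, the number of stages, or the thresholding rule, so the Haar wavelet, three stages, soft thresholding, and the function names below are all assumptions made for illustration and are not the authors' MS_DWT implementation.

```python
import math

def haar_step(signal):
    """One Haar analysis stage: return (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar stage, interleaving the reconstructed sample pairs."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (assumed denoising rule)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def multistage_dwt_denoise(signal, levels=3, threshold=0.1):
    """Decompose over several stages, threshold the details, reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest stage back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

if __name__ == "__main__":
    noisy = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
    print(multistage_dwt_denoise(noisy, levels=3, threshold=0.1))
```

With `threshold=0` the function reconstructs its input up to floating-point rounding, which is a quick sanity check on the transform. Larger thresholds suppress more fine-scale detail, so in practice the value would need tuning against the recordings.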


Pages: 66135-66173

Number of pages: 39
