Abstract
This study investigates the utility of speech signals for AI-based depression screening across interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all assessed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using Multi-Layer Perceptron models, with final assessments determined by aggregating clip-level outcomes. Our analysis of interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches the efficacy of clinical interviews and surpasses reading tasks. The duration and quantity of segments significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.
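The pipeline described above (clip-level MLP classification followed by aggregation into a participant-level decision) can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature dimensionality, network architecture, synthetic data, and majority-vote aggregation rule are all assumptions, since the abstract specifies only that clip outcomes were aggregated.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical setup: 20 participants, 10 speech clips each,
# 64-dimensional feature vectors per clip (all sizes are illustrative).
n_participants, clips_per_participant, n_features = 20, 10, 64
participant_labels = rng.integers(0, 2, n_participants)  # 1 = depressed

# Synthetic clip features: clips from depressed participants are shifted
# slightly, standing in for real acoustic / deep speech features.
X = np.vstack([
    rng.normal(loc=0.5 * label, scale=1.0,
               size=(clips_per_participant, n_features))
    for label in participant_labels
])
y_clip = np.repeat(participant_labels, clips_per_participant)

# Clip-level MLP classifier (architecture chosen for illustration only).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y_clip)

# Aggregate clip predictions per participant, here by majority vote.
clip_preds = clf.predict(X).reshape(n_participants, clips_per_participant)
participant_preds = (clip_preds.mean(axis=1) >= 0.5).astype(int)
accuracy = (participant_preds == participant_labels).mean()
```

In practice the clip-level features would come from a speech front end (traditional acoustic descriptors or embeddings from a pretrained speech model), and evaluation would use held-out participants rather than the training set shown here.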
Original language | English |
---|---|
Publication status | Accepted/In press - 2025 |
Event | IEEE International Conference on Acoustics, Speech, and Signal Processing, 6 Apr 2025 → 11 Apr 2025 |
Conference
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP |
Period | 6/04/25 → 11/04/25 |