Enhancing speech emotion recognition: a deep learning approach with self-attention and acoustic features


Research field: Artificial Intelligence



Venue: The Journal of Supercomputing


Abstract

Speech emotion recognition (SER), which involves detecting and classifying emotions from speech signals, plays a crucial role in human–computer interaction. However, challenges such as variability in emotional expression and limited labeled data have hindered progress in this area. To address these issues, we propose a novel deep learning framework that combines multiple acoustic features, including MFCCs, Mel-spectrograms, and temporal-frequency domain features. Our model leverages three parallel CNN-LSTM branches for sequential feature extraction, followed by a self-attention mechanism to integrate the extracted representations. A final LSTM layer, along with dense layers, refines the classification process. This innovative fusion of features and attention mechanisms significantly enhances emotion recognition performance. Experimental evaluations demonstrate the effectiveness of our approach in improving classification accuracy.
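
The fusion step described in the abstract can be illustrated with a minimal NumPy sketch of scaled dot-product self-attention applied to the outputs of three feature branches. This is not the authors' implementation: the abstract does not say how the branch outputs are combined, so stacking them along the time axis is an assumption, as are the function name `self_attention_fusion`, the dimensions, and the random projection matrices. The random arrays merely stand in for the CNN-LSTM outputs of the MFCC, Mel-spectrogram, and temporal-frequency branches.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention_fusion(branch_outputs, w_q, w_k, w_v):
    """Fuse per-branch sequences with scaled dot-product self-attention.

    branch_outputs: list of arrays, each of shape (T, d).
    Assumption: branches are stacked along the time axis, giving (3T, d).
    Returns the attended sequence and the attention weight matrix.
    """
    x = np.concatenate(branch_outputs, axis=0)        # (N, d), N = 3T
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # each (N, d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (N, N)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
T, d, d_k = 10, 16, 8  # hypothetical sequence length and feature sizes

# Stand-ins for the three CNN-LSTM branch outputs
# (MFCC, Mel-spectrogram, temporal-frequency features).
branches = [rng.standard_normal((T, d)) for _ in range(3)]

# Random projections in place of learned query/key/value weights.
w_q, w_k, w_v = (rng.standard_normal((d, d_k)) * 0.1 for _ in range(3))

fused, weights = self_attention_fusion(branches, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In the pipeline the abstract outlines, a sequence like `fused` would then pass through the final LSTM layer and the dense layers that produce the emotion class scores; those stages are omitted here.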


Authors

Khadijeh Aghajani
Department of Computer Engineering, University of Mazandaran, Babolsar, Iran

Mahbanou Zohrevandi
Department of Computer Engineering, Malayer University, Malayer, Iran

Paper information

Publication year: 2025
Citations: 0
Country of publication: Iran
Site: Springer

Related papers (38)