Research field: Strategies
Venue: Multimedia Systems
Convolutional neural networks (CNNs) are well suited to image feature extraction because of their robust mapping ability. However, because CNNs operate locally, most existing CNN-based methods for facial expression recognition (FER) handle in-the-wild face images with occlusions or pose variations poorly. Although our previous attention-modulated contextual spatial information network (ACSI-Net) is effective for FER, its loss function does not sufficiently constrain the network to perceive discriminative facial features. In this paper, we propose a novel constraining function, called joint loss and tailored for FER in the wild, to supervise ACSI-Net learning. In the Taylor expansion of the cross-entropy loss, the relative weights of the polynomial basis are adjusted to increase the learning intensity for categories with fewer samples, which yields our equilibrium loss. In addition, the regularization term, named sparse center loss, is defined as the attention-masked Euclidean distance between features and their class centers. By combining equilibrium loss and sparse center loss, the joint loss learns features with inter-class separability and intra-class compactness from imbalanced data, leading to improved recognition performance. Experiments demonstrate that ACSI-Net constrained by the joint loss outperforms the original ACSI-Net and is competitive with relevant advanced methods on two benchmark datasets, RAF-DB and AffectNet.
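
The two ingredients of the joint loss can be illustrated with a minimal pure-Python sketch. It is not the paper's formulation: the abstract does not give the exact polynomial weights or mask, so the sketch assumes that only the first Taylor term of the cross-entropy is reweighted, with a hypothetical per-class coefficient `class_epsilon`, and that the attention mask is a per-dimension weight applied to a squared Euclidean distance. All function and parameter names here are illustrative.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def truncated_taylor_ce(p_t, n_terms):
    # Partial sum of the expansion -log(p_t) = sum_{j>=1} (1 - p_t)^j / j.
    return sum((1.0 - p_t) ** j / j for j in range(1, n_terms + 1))

def reweighted_cross_entropy(logits, target, epsilon):
    # Cross-entropy with extra weight `epsilon` on the first polynomial
    # term (1 - p_t). ASSUMPTION: the paper's equilibrium loss may
    # reweight the polynomial basis differently.
    p_t = softmax(logits)[target]
    return -math.log(p_t) + epsilon * (1.0 - p_t)

def masked_center_loss(feature, center, mask):
    # Half the squared Euclidean distance to the class center, weighted
    # per dimension by `mask`. ASSUMPTION: the mask is given as input;
    # in the paper it comes from an attention mechanism.
    return 0.5 * sum(m * (f - c) ** 2 for f, c, m in zip(feature, center, mask))

def joint_loss(logits, target, feature, centers, mask, class_epsilon, lam):
    # Classification term plus `lam` times the center regularizer.
    # `class_epsilon[k]` would be larger for classes with fewer samples.
    ce_term = reweighted_cross_entropy(logits, target, class_epsilon[target])
    center_term = masked_center_loss(feature, centers[target], mask)
    return ce_term + lam * center_term
```

With `epsilon = 0` the first term reduces to ordinary cross-entropy, and with `lam = 0` the center regularizer is switched off, so each component can be checked in isolation.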
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |