Research field: Artificial Intelligence
Conference: International Conference on Security and Privacy in New Computing Environments
In multi-lingual emotional speech synthesis, it is difficult to incorporate suitable emotional expressions into the synthesis process because emotional expression differs across languages. To extract better emotional expressions for each language and thereby support multi-lingual emotional speech synthesis, this paper studies multi-lingual speech emotion recognition (SER). In current work on multi-lingual SER, the combining method (TCM) and the multi-task method (TMM) are the popular approaches. However, neither achieves good performance: TCM does not consider the emotional differences between languages, and with TMM it is difficult to train a good emotion recognition model and a good language recognition model at the same time. To address this issue, a two-stage multi-lingual SER method is proposed in this paper, in which language recognition identifies the language type in the first stage and emotion recognition is applied in the second stage. In addition, wav2vec 2.0 features are used as the input, and ResNet18 is selected as the model for language recognition and for emotion recognition. The experimental results show that the proposed method works for multi-lingual SER and performs better than TCM and TMM.
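The two-stage routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one emotion model per language (the abstract does not state this explicitly), and every name in it (`two_stage_ser`, the toy classifiers, the `"en"`/`"zh"` language codes, the thresholds) is hypothetical. Plain lists of floats stand in for wav2vec 2.0 features, and simple threshold rules stand in for the ResNet18 classifiers.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]               # stand-in for wav2vec 2.0 features
Classifier = Callable[[Features], str]   # stand-in for a trained ResNet18 model


def two_stage_ser(
    features: Features,
    language_model: Classifier,
    emotion_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Stage 1 predicts the language; stage 2 runs that language's emotion model."""
    language = language_model(features)
    if language not in emotion_models:
        raise KeyError(f"no emotion model registered for language {language!r}")
    emotion = emotion_models[language](features)
    return language, emotion


# Toy classifiers (hypothetical): thresholds chosen only to make the routing visible.
def toy_language_model(features: Features) -> str:
    return "zh" if features[0] > 0 else "en"


def toy_emotion_en(features: Features) -> str:
    return "angry" if features[1] > 0.5 else "neutral"


def toy_emotion_zh(features: Features) -> str:
    # Assumption for illustration: the same cue maps to emotions differently per language.
    return "angry" if features[1] > 0.8 else "neutral"


EMOTION_MODELS: Dict[str, Classifier] = {"en": toy_emotion_en, "zh": toy_emotion_zh}

if __name__ == "__main__":
    print(two_stage_ser([-1.0, 0.6], toy_language_model, EMOTION_MODELS))
    print(two_stage_ser([1.0, 0.6], toy_language_model, EMOTION_MODELS))
```

In this toy setup the same second feature value (0.6) yields a different emotion label depending on the language chosen in stage 1. This is the kind of language-specific behaviour that, according to the abstract, a single combined model (TCM) does not account for.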
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra, China |
| Site | Springer |