Research field: Artificial Intelligence
Conference: 2024 International Conference on Artificial Intelligence and Power Systems (AIPS)
Current speech recognition technology focuses only on the linguistic content of speech and ignores its emotional content, which can prevent users from accurately understanding the original speech. Speech recognition with an emotional description is therefore needed to improve the user experience of semantic recognition products. First, emojis, which are widely used in computer communication, are selected as emotion labels and appended to the recognized text as part of the task output. Based on the meanings and characteristics of emojis, discrete emotion classes and continuous emotion scores are used to convert the emotion labels of samples into emojis, and the Speech Recognition and emoji Prediction (SReP) dataset is proposed. Second, an end-to-end recognition model is constructed in which emoji recommendation is treated as one round of the autoregressive speech recognition process. A mixed recognition method for characters and emojis, a speech-text fusion module, and smoothing regularization for the new labels are designed, and the tasks are implemented with a HuBERT-based feature extractor and a Conformer module. Experimental results on the SReP dataset demonstrate the effectiveness of the proposed method.
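
The label conversion described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the emotion class names, the emoji choices, the `[0, 1]` score range, the `0.5` threshold, and the helper names `emotion_to_emoji` and `build_target`. The sketch only shows the general idea of combining a discrete emotion class with a continuous score to pick an emoji, then appending that emoji as the last token of a character-level target sequence.

```python
# Hypothetical table: discrete emotion class -> (mild emoji, strong emoji).
# The classes and emojis are illustrative, not the SReP label set.
EMOJI_TABLE = {
    "happy": ("🙂", "😄"),
    "sad": ("🙁", "😢"),
    "angry": ("😠", "😡"),
    "neutral": ("😐", "😐"),
}


def emotion_to_emoji(emotion: str, score: float, threshold: float = 0.5) -> str:
    """Pick an emoji from a discrete class and a continuous score in [0, 1].

    The threshold splitting "mild" from "strong" is an assumed value.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # Unknown classes fall back to the neutral entry.
    mild, strong = EMOJI_TABLE.get(emotion, EMOJI_TABLE["neutral"])
    return strong if score >= threshold else mild


def build_target(transcript: str, emotion: str, score: float) -> list[str]:
    """Return a character-level target with the emoji as the final token."""
    return list(transcript) + [emotion_to_emoji(emotion, score)]
```

Under these assumptions, `build_target("hi", "sad", 0.2)` yields the characters of the transcript followed by the mild "sad" emoji, which mirrors the idea of predicting the emoji as one extra step after the text tokens.
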
| Publication year | 2024 |
|---|---|
| Citations | 109 |
| Country of publication | China |
| Site | IEEE |