A reweighting method for speech recognition with imbalanced data of Mandarin and sub-dialects


Research field: Artificial Intelligence



Venue: Service Oriented Computing and Applications


Abstract

Automatic speech recognition (ASR) is an important technology in fields such as video-sharing services, online education, and live broadcasting. Most recent ASR methods are based on deep learning. A neural network-based ASR model can be trained to recognize both standard Mandarin and its sub-dialects from a dataset that contains training samples of each. Because the cost of collecting data differs across sub-dialects, such a dataset usually contains far more training samples of standard Mandarin than of the sub-dialects, so the model recognizes standard Mandarin much better than it recognizes the sub-dialects. In this paper, to improve recognition performance on sub-dialects, we propose reweighting the recognition loss for each sub-dialect based on its similarity to standard Mandarin. The proposed reweighting method makes the model pay more attention to sub-dialects with larger loss weights, alleviating the poor recognition performance on sub-dialects. Our model was trained and validated on KeSpeech, an open-source dataset that includes standard Mandarin and its eight sub-dialects. Experimental results show that the proposed model recognizes most sub-dialects better than the baseline and achieves a Character Error Rate about 0.5 lower than the baseline.
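The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of a similarity-based reweighted loss. It assumes a similarity score in [0, 1] per dialect, assumes that less similar dialects receive larger weights (the abstract does not state the direction), and assumes per-utterance losses are already available. The function names, the `floor` parameter, and the dialect labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dialect_weights(similarity: Dict[str, float], floor: float = 1.0) -> Dict[str, float]:
    """Turn per-dialect similarity to standard Mandarin into loss weights.

    Assumed rule (not from the paper): weight = floor + (1 - similarity),
    so a dialect that is less similar to standard Mandarin gets a larger weight.
    """
    return {dialect: floor + (1.0 - s) for dialect, s in similarity.items()}


def reweighted_loss(samples: List[Tuple[str, float]], weights: Dict[str, float]) -> float:
    """Weighted mean of per-utterance losses.

    Each sample is (dialect label, recognition loss for that utterance).
    Dividing by the sum of weights keeps the loss on the same scale as
    an unweighted mean.
    """
    total_weight = sum(weights[dialect] for dialect, _ in samples)
    weighted_sum = sum(weights[dialect] * loss for dialect, loss in samples)
    return weighted_sum / total_weight


# Hypothetical similarity scores, for illustration only.
weights = dialect_weights({"mandarin": 1.0, "dialect_a": 0.5})
batch = [("mandarin", 1.0), ("dialect_a", 3.0)]
loss = reweighted_loss(batch, weights)
```

With these placeholder numbers the sub-dialect utterance contributes more to the batch loss than it would under a plain average, which is the effect the abstract describes as making the model pay more attention to sub-dialects with larger loss weights.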


Author Profile

Jiaju Wu
School of Software Engineering, South China University of Technology, Guangzhou, China
Country: China

Zhengchang Wen
Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, Guangzhou, China
Country: Andorra

Haitian Huang
School of Software Engineering, South China University of Technology, Guangzhou, China
Country: China

📄 Paper information

Publication year: 2024
Citations: 0
Publication countries: Andorra, China
Site: Springer

Related papers (324)