Enhancing Amazigh ASR through convolutional neural networks and MFCC


Research field: Artificial Intelligence



Venue: Multimedia Tools and Applications


Abstract

In this study, we developed a speech recognition system for the Amazigh language that targets recognition of the first ten numbers. The system employs four Convolutional Neural Network (CNN) models: three custom-designed models and a pre-trained VGG19 model. Our experiments used a dataset of 4200 audio files recorded by 42 distinct speakers, with input features extracted as Mel Frequency Cepstral Coefficients (MFCCs). We tested three normalization settings: no normalization, Cepstral Mean and Variance Normalization (CMVN), and Min-Max normalization. While CMVN generally provided effective standardization, we achieved the highest accuracy of 97.56% using Min-Max normalization with a specific filter size in the third custom CNN model. The VGG19 model, however, showed suboptimal performance. These findings underscore the importance of selecting suitable normalization techniques and model architectures for improving speech recognition accuracy.
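
The two normalization methods named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say over which axis or scope the statistics are computed, so the sketch assumes per-coefficient CMVN over the frames of one utterance and a single global Min-Max range per utterance. The function names `cmvn` and `min_max`, the `eps` guard, and the random stand-in MFCC matrix (100 frames by 13 coefficients) are all hypothetical.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization (assumed per coefficient).

    features: array of shape (n_frames, n_coeffs).
    Each coefficient column is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)  # eps avoids division by zero


def min_max(features, eps=1e-8):
    """Min-Max normalization (assumed global over the whole matrix).

    Rescales all values into approximately the range [0, 1].
    """
    lo = features.min()
    hi = features.max()
    return (features - lo) / (hi - lo + eps)


# Stand-in for real MFCC features: random data, NOT taken from the paper's dataset.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=5.0, scale=3.0, size=(100, 13))

mfcc_cmvn = cmvn(mfcc)
mfcc_minmax = min_max(mfcc)
```

In practice the `mfcc` matrix would come from a feature extractor applied to each recording. Whether statistics are computed per utterance, per speaker, or over the whole training set is a design choice the abstract leaves open.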


Author Profile
Hossam Boulal

LSI Laboratory, FP Taza, USMBA University, Taza, Morocco

Morocco
Author Profile
Mohamed Hamidi

Team of Modeling and Scientific Computing, FPN, UMP, Nador, Morocco

Andorra
Author Profile
Jamal Barkani

LSI Laboratory, FP Taza, USMBA University, Taza, Morocco

Morocco

📄 Paper information

Publication year: 2024
Citations: 0
Country of publication: Morocco, Andorra
Site: Springer
