Research field: Artificial Intelligence
Venue: Multimedia Tools and Applications
In this study, we developed a speech recognition system for the Amazigh language that recognizes the first ten numbers. The system uses four Convolutional Neural Network (CNN) models: three custom-designed models and a pre-trained VGG19 model. Our experiments used a dataset of 4200 audio files recorded by 42 distinct speakers, with Mel Frequency Cepstral Coefficients (MFCCs) extracted as input features. We compared three normalization settings: no normalization, Cepstral Mean and Variance Normalization (CMVN), and Min-Max normalization. While CMVN generally provided effective standardization, we achieved the highest accuracy, 97.56%, using Min-Max normalization with a specific filter size in the third custom CNN model. The VGG19 model, by contrast, performed suboptimally. These findings underscore the importance of selecting suitable normalization techniques and model architectures for improving speech recognition accuracy.
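The two normalization methods compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cmvn` and `min_max`, the per-utterance, per-coefficient statistics for CMVN, the global [0, 1] target range for Min-Max, and the synthetic 13 x 100 feature matrix are all assumptions, since the abstract does not specify these details.

```python
import numpy as np


def cmvn(mfcc: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cepstral mean and variance normalization.

    Assumes `mfcc` has shape (n_coefficients, n_frames) and normalizes each
    coefficient over time to roughly zero mean and unit variance.
    """
    mean = mfcc.mean(axis=1, keepdims=True)
    std = mfcc.std(axis=1, keepdims=True)
    return (mfcc - mean) / (std + eps)


def min_max(mfcc: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-Max normalization of the whole matrix into roughly [0, 1].

    Assumption: a single global minimum and maximum per utterance.
    """
    lo = mfcc.min()
    hi = mfcc.max()
    return (mfcc - lo) / (hi - lo + eps)


# Synthetic stand-in for an MFCC matrix (13 coefficients, 100 frames);
# real features would come from an audio front end.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=-5.0, scale=20.0, size=(13, 100))

mfcc_cmvn = cmvn(mfcc)
mfcc_minmax = min_max(mfcc)
```

CMVN removes per-coefficient offset and scale, whereas Min-Max keeps the relative shape of the values and only rescales their range, which is one plausible reason the two can behave differently as CNN inputs.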
| Publication year | 2024 |
|---|---|
| Citations | 0 |
| Country of publication | Morocco, Andorra |
| Site | Springer |