ITRT(IT Research Trends)

Enhancing Amazigh ASR through convolutional neural networks and MFCC

Research field: Artificial Intelligence

Paper keywords: #neural #audio #speakers #4200 #42

Venue: Multimedia Tools and Applications

Abstract

In this study, we developed a speech recognition system for the Amazigh language, specifically targeting recognition of the first ten numbers. The system employs four Convolutional Neural Network (CNN) models: three custom-designed models and a pre-trained VGG19 model. Our experiments used a dataset of 4200 audio files recorded by 42 distinct speakers, with input features extracted as Mel Frequency Cepstral Coefficients (MFCCs). We tested three normalization settings: no normalization, Cepstral Mean and Variance Normalization (CMVN), and Min-Max normalization. While CMVN generally provided effective standardization, we achieved the highest accuracy, 97.56%, using Min-Max normalization with a specific filter size in the third custom CNN model. The VGG19 model, however, showed suboptimal performance. These findings underscore the importance of selecting suitable normalization techniques and model architectures for improving speech recognition accuracy.
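The two normalization schemes named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cmvn` and `min_max`, the (frames × coefficients) layout, the choice of 13 coefficients, and the use of one global minimum and maximum per utterance for Min-Max scaling are all assumptions, and the random matrix only stands in for real MFCC features.

```python
import numpy as np

def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cepstral mean and variance normalization.

    Assumes `features` has shape (frames, coefficients); each coefficient
    is shifted to zero mean and scaled to unit variance over the frames.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

def min_max(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-Max normalization to the range [0, 1].

    Assumption: a single global min and max over the whole feature matrix;
    the abstract does not say whether scaling is global or per coefficient.
    """
    lo = features.min()
    hi = features.max()
    return (features - lo) / (hi - lo + eps)

# Placeholder input: 100 frames of 13 coefficients (hypothetical sizes),
# drawn at random in place of MFCCs extracted from a recording.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=5.0, scale=3.0, size=(100, 13))

mfcc_cmvn = cmvn(mfcc)        # per-coefficient mean ~0, std ~1
mfcc_minmax = min_max(mfcc)   # all values within [0, 1]
```

Either output keeps the input shape, so it could be fed to a CNN in place of the raw features; the "no normalization" setting corresponds to passing `mfcc` through unchanged.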

📄 Paper information

Publication year	2024
Citations	0
Country of publication	Morocco, Andorra
Site	Springer
Likes	0

Hossam Boulal

Mohamed Hamidi

Jamal Barkani

Mustapha Abarkan

Related papers (175)
