Speaker age and gender recognition using 1D and 2D convolutional neural networks


Research field: Artificial Intelligence



Journal: Neural Computing and Applications


Abstract

The speech signal is one of the most effective data sources in human–computer interaction and is widely used in applications such as speech and speaker recognition, emotion recognition, language recognition, and age and gender recognition. In this study, two convolutional neural networks, one 1D and one 2D, are designed to recognize the speaker's age and gender class. Each model is built by stacking four feature learning blocks (FLBs) and one classification block, and the two models take different input feature vectors, both derived from mel-frequency cepstral coefficients (MFCCs). Each FLB consists of a convolution layer, a batch normalization layer, a ReLU layer, a max pooling layer, and a dropout layer, while the classification block consists of a flatten layer, two fully connected layers, and a softmax layer. In addition to parameter optimization by manual search, the models are optimized by trying different combinations of the basic components that make up the FLBs. In experiments on the Common Voice Turkish dataset, the highest validation accuracies are 66.26% for the 1D model and 94.40% for the 2D model. These results demonstrate the effectiveness of the proposed 2D model for age and gender recognition.
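The abstract names the layers in each block but not their hyperparameters, so the following is only a minimal pure-Python sketch that traces how output shapes change through four FLBs and the classification block, to make the 1D/2D difference concrete. It assumes stride-1 convolutions with 'same' padding (so kernel size does not affect shapes), non-overlapping 2-wide max pooling, filter counts of 32, 64, 128, and 256, a 256-unit first fully connected layer, 8 output classes, and the input shapes shown; every one of these values, and the names `FLB`, `flb_output_shape`, and `trace_model`, is a hypothetical placeholder, not something taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# (channels, *spatial_dims); the batch dimension is omitted.
Shape = Tuple[int, ...]


@dataclass(frozen=True)
class FLB:
    """One feature learning block: Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout.

    All defaults are hypothetical; the abstract does not specify them.
    """
    filters: int
    pool: int = 2          # max-pooling window, also used as the stride
    dropout: float = 0.25  # only active during training; never changes the shape


def flb_output_shape(shape: Shape, block: FLB) -> Shape:
    _, *spatial = shape
    # A stride-1 convolution with 'same' padding keeps every spatial size and
    # replaces the channel count with the number of filters.
    # BatchNorm, ReLU and Dropout act element-wise and keep the shape.
    # Non-overlapping max pooling without padding floors each spatial size.
    if any(d < block.pool for d in spatial):
        raise ValueError(f"input {shape} too small for pool size {block.pool}")
    return (block.filters, *(d // block.pool for d in spatial))


def trace_model(input_shape: Shape, blocks: List[FLB],
                hidden: int, num_classes: int) -> List[Tuple[str, Shape]]:
    """Return (layer name, output shape) pairs for the FLBs and classifier."""
    trace = [("input", input_shape)]
    shape = input_shape
    for i, block in enumerate(blocks, start=1):
        shape = flb_output_shape(shape, block)
        trace.append((f"FLB{i}", shape))
    # Classification block (assumed order): flatten -> FC(hidden) -> FC(classes) -> softmax.
    trace.append(("flatten", (math.prod(shape),)))
    trace.append(("fc1", (hidden,)))
    trace.append(("fc2", (num_classes,)))
    trace.append(("softmax", (num_classes,)))
    return trace


# Hypothetical configuration: four FLBs with doubling filter counts.
BLOCKS = [FLB(32), FLB(64), FLB(128), FLB(256)]
NUM_CLASSES = 8  # placeholder; set to the number of age-gender classes actually used

# 1D model: e.g. 40 MFCCs treated as a single-channel sequence (hypothetical input).
trace_1d = trace_model((1, 40), BLOCKS, hidden=256, num_classes=NUM_CLASSES)
# 2D model: e.g. a 40 x 128 MFCC-by-frame matrix treated as a one-channel image.
trace_2d = trace_model((1, 40, 128), BLOCKS, hidden=256, num_classes=NUM_CLASSES)

if __name__ == "__main__":
    for name, trace in (("1D", trace_1d), ("2D", trace_2d)):
        print(name)
        for layer, shape in trace:
            print(f"  {layer:8s} {shape}")
```

Under these assumptions the 2D model's flattened vector (4096 values) is eight times longer than the 1D model's (512 values), so its first fully connected layer carries about eight times as many weights; the real ratio depends on the input representations and pooling settings the paper actually uses.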


Author Profile
Ergün Yücesoy

Vocational School of Technical Sciences, Ordu University, Ordu 52200, Turkey

📄 Paper Information

Publication year: 2023
Citations: 9
Country of publication: Turkey
Site: Springer