Environmental sound classification using convolutional recurrent neural network and data augmentation


Research field: Artificial Intelligence



Venue: Multimedia Tools and Applications


Abstract

Environmental sound classification (ESC) is an active research area. ESC categorizes sounds in the surroundings, such as dog barking, gunshots, and children playing. The problem is made complex by overlapping sound signals, the presence of several audio sources during recording, and varying distances between the audio sources and the microphone. This study proposes a robust model for ESC, which can help in crime investigation systems, security warning systems, and the development of smart homes and hearing aids. Researchers have designed numerous frameworks for classifying surrounding events. Various techniques for ESC have been used in the past, but they are either computationally intensive or less accurate. A hybrid model combining a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), called CRNN, is proposed for ESC; it attains an accuracy of 99.89%, which is, to the best of our knowledge, the highest reported so far. CRNN has been used in a few past studies, but those studies used raw waveforms and attained quite low accuracy. The publicly available UrbanSound8K dataset is used. Augmentation techniques are used to overcome the scarcity of data. Cepstral features are extracted and fed to the CRNN. CRNN is chosen for its ability to capture both the spatial and temporal dependencies of environmental sound waves. Various hyperparameters, such as the number of LSTM layers, number of filters, batch size, momentum, and number of neurons in the LSTM layer, are varied to find the best values for ESC. It is found that a momentum of 0.5, 128 filters, 512 neurons in the LSTM layer, a batch size of 256, and one LSTM layer give the highest accuracy. Another dataset, ESC-10, is used to validate the model. The proposed model provides considerable accuracy on ESC-10, although it is lower than on UrbanSound8K.
In the future, the model can be applied to different applications and datasets.
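
The hyperparameter search described in the abstract can be pictured with a small sketch. It is only an illustration: the abstract reports the best value for each hyperparameter but not the ranges that were tried or the search strategy, so the candidate lists, the exhaustive grid, and every name below (`CRNNConfig`, `SEARCH_SPACE`, `grid`) are assumptions rather than the authors' code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CRNNConfig:
    """Hyperparameters the abstract says were tuned (field names are hypothetical)."""
    momentum: float
    num_filters: int
    lstm_units: int
    batch_size: int
    num_lstm_layers: int


# Best values as reported in the abstract.
REPORTED_BEST = CRNNConfig(
    momentum=0.5,
    num_filters=128,
    lstm_units=512,
    batch_size=256,
    num_lstm_layers=1,
)

# ASSUMPTION: these candidate values are illustrative only. The abstract
# gives the winning value per hyperparameter, not the ranges searched.
SEARCH_SPACE = {
    "momentum": [0.5, 0.9],
    "num_filters": [64, 128],
    "lstm_units": [256, 512],
    "batch_size": [128, 256],
    "num_lstm_layers": [1, 2],
}


def grid(space):
    """Yield one CRNNConfig per combination of candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield CRNNConfig(**dict(zip(names, values)))


configs = list(grid(SEARCH_SPACE))
print(len(configs), "candidate configurations")
print("reported best in grid:", REPORTED_BEST in configs)
```

In practice each configuration would be used to train and evaluate a CRNN, and the one with the highest validation accuracy would be kept; that training step is omitted here because the abstract does not describe it in enough detail to reproduce.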


Author Profile
Anam Bansal

Computer Science and Engineering GZS Campus College of Engineering and Technology Maharaja Ranjit Singh Punjab Technical University Bathinda 151001 Punjab India

Andorra
Author Profile
Naresh Kumar Garg

Computer Science and Engineering GZS Campus College of Engineering and Technology Maharaja Ranjit Singh Punjab Technical University Bathinda 151001 Punjab India

Andorra

Paper Information

Publication year: 2025
Citations: 0
Country of publication: Andorra
Site: Springer
