Research field: Artificial Intelligence
Conference: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Silent speech interfaces (SSIs) are systems that can enable speech communication in the partial or total absence of the acoustic speech signal. Research on SSIs includes command word recognition, speech synthesis, and continuous phoneme/word recognition. The focus of this work is continuous phoneme recognition with a radar-based SSI. Audio and radar data were recorded synchronously during continuous speech of one speaker. Six feature sets based on combinations of the magnitude spectrum, the phase spectrum, and the impulse response of the radar data were used as input to the recognition task and compared to each other and to a benchmark acoustic feature set based on the audio data. With a CNN-MLP followed by a Viterbi-based biphone phoneme decoder, the lowest mean phoneme error rate (PER) obtained by any radar-based feature set was 45.62%, which is 15% higher than the mean PER achieved by the benchmark acoustic feature set. The most common sources of errors were minimal pairs of consonants. The results of this work are promising and point towards possible future improvements for radar-based continuous phoneme recognition.
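The abstract reports results as a phoneme error rate (PER) but does not spell out how it is computed. A minimal sketch, assuming the common definition (Levenshtein edit distance between the reference and the recognized phoneme sequence, divided by the reference length), is given here. The function names and the example sequences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences
    (substitutions, deletions, and insertions all cost 1)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER in percent, under the assumed definition above."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a voiced/voiceless consonant confusion (/b/ vs /p/),
# the kind of minimal-pair error the abstract names as most common.
reference = ["b", "a", "t"]
hypothesis = ["p", "a", "t"]
per = phoneme_error_rate(reference, hypothesis)
```

Under this definition a single substituted consonant in a three-phoneme reference counts as one error out of three, so errors on short utterances weigh heavily on the rate.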
| Publication year | 2025 |
|---|---|
| Citations | 251 |
| Publication country | Germany, Andorra, United States |
| Site | IEEE |