Non-invasive Speaker-dependent Continuous Phoneme Recognition with a Radar-based Silent Speech Interface


Research field: Artificial Intelligence



Conference: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Abstract

Silent speech interfaces (SSIs) are systems that can enable speech communication in the partial or total absence of the acoustic speech signal. Research on SSIs includes command word recognition, speech synthesis, and continuous phoneme/word recognition. The focus of this work is to perform continuous phoneme recognition with a radar-based SSI. Audio and radar data were recorded synchronously during continuous speech of one speaker. Six feature sets based on combinations of the magnitude spectrum, the phase spectrum, and the impulse response of the radar data were used as input to the recognition task and compared to each other and to a benchmark acoustic feature set based on the audio data. With a CNN-MLP followed by a Viterbi-based biphone phoneme decoder, the lowest mean phoneme error rate (PER) obtained by any radar-based feature set was 45.62%, which is 15% higher than the mean PER achieved by the benchmark acoustic feature set. The most common sources of errors were minimal pairs of consonants. The results of this work are promising and point towards possible future improvements for radar-based continuous phoneme recognition.
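
The abstract reports results as a phoneme error rate (PER) but does not spell out how it is computed. The sketch below assumes the common definition: the Levenshtein (edit) distance between the reference and the recognized phoneme sequences, divided by the reference length. The function names `edit_distance` and `phoneme_error_rate` and the toy phoneme sequences are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER in percent, assuming PER = edit distance / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Toy example: a consonant minimal-pair confusion (/b/ recognized as /p/).
reference = ["b", "a", "t"]
hypothesis = ["p", "a", "t"]
per = phoneme_error_rate(reference, hypothesis)
```

In this toy case a single substitution in a three-phoneme reference gives a PER of one third, which shows how confusions between minimal pairs of consonants feed directly into the metric.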


Author Profile
João Menezes

Inst. of Acoustics and Speech Communication, Dresden University of Technology, Dresden, Germany

Andorra
Author Profile
Christoph Wagner

Inst. of Acoustics and Speech Communication, Dresden University of Technology, Dresden, Germany

Andorra
Author Profile
Peter Steiner

Plasma Control Group, Princeton University, Princeton, USA

United States

📄 Paper Information

Publication year: 2025
Citations: 251
Publication countries: Germany, Andorra, United States
Site: IEEE

Related papers (45)