Inappropriate Pause Detection in Dysarthric Speech Using Large-Scale Speech Recognition


Research field: Artificial Intelligence



Conference: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Abstract

Dysarthria, a common condition among stroke patients, severely impairs speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We extend a large-scale speech recognition model to detect inappropriate pauses in dysarthric speech. To this end, we propose a task design, a labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. Second, following this task design, we label pause locations at the text level together with their appropriateness, collaborating with speech-language pathologists to establish labeling criteria and ensure high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. We also propose a task-tailored metric that evaluates inappropriate pause detection independently of ASR performance. Our experiments show that the proposed method detects inappropriate pauses in dysarthric speech better than the baselines, achieving an Inappropriate Pause Error Rate of 14.47%.
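To make the task design concrete, the sketch below shows one way a transcript with pause tags could be split into words and text-level pause annotations. It is only an illustration: the tag names `<pause>` and `<ipause>`, the function `extract_pauses`, and the example sentence are assumptions made here, not the tag vocabulary, code, or data used in the paper, and the sketch does not implement the paper's model or its Inappropriate Pause Error Rate.

```python
from typing import List, Tuple

# Hypothetical tag names; the paper's actual tag vocabulary is not stated here.
PAUSE_TAG = "<pause>"          # assumed tag for an appropriate pause
INAPPROPRIATE_TAG = "<ipause>"  # assumed tag for an inappropriate pause


def extract_pauses(transcript: str) -> Tuple[List[str], List[Tuple[int, bool]]]:
    """Split a pause-tagged transcript into words and pause annotations.

    Each pause is returned as (number of words preceding it, is_inappropriate),
    so pause locations are expressed at the text level rather than in seconds.
    """
    words: List[str] = []
    pauses: List[Tuple[int, bool]] = []
    for token in transcript.split():
        if token == PAUSE_TAG:
            pauses.append((len(words), False))
        elif token == INAPPROPRIATE_TAG:
            pauses.append((len(words), True))
        else:
            words.append(token)
    return words, pauses


# Made-up example sentence, for illustration only.
words, pauses = extract_pauses("the weather <ipause> is nice <pause> today")
print(words)   # ['the', 'weather', 'is', 'nice', 'today']
print(pauses)  # [(2, True), (4, False)]
```

Because pauses are indexed by word position, a reference and a hypothesis annotated this way can be compared on pause placement separately from the recognized words.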


Authors

Jeehyun Lee, Department of Artificial Intelligence, Sogang University, Seoul, South Korea
Yerin Choi, Department of Artificial Intelligence, Sogang University, Seoul, South Korea
Tae-Jin Song, Department of Neurology, Seoul Hospital, Ewha Womans University College of Medicine, Seoul, Republic of Korea

Paper information

Publication year: 2024
Citations: 4
Country of publication: Korea
Site: IEEE