Continuous sign language recognition using intra-inter gloss attention


Research field: Software Development



Venue: Multimedia Tools and Applications


Abstract

Many continuous sign language recognition (CSLR) studies adopt transformer-based architectures for sequence modeling because of their strong capacity for capturing global context. However, vanilla self-attention, the core module of the transformer, performs context-aware weighted aggregation over all time steps, so the local temporal semantics of sign videos may not be fully exploited. This study introduces a novel module for sign language recognition, the intra-inter gloss attention module, designed to leverage relationships among frames within glosses and to capture the semantic and grammatical dependencies between glosses in the video. In the intra-gloss attention module, the video is divided into equally sized chunks, and self-attention is applied within each chunk. This localized self-attention significantly reduces complexity and eliminates the noise introduced by attending to unrelated frames; it achieves a 0.6 improvement in word error rate (WER) over the baseline model while increasing inference speed by 20%. In the inter-gloss attention module, the features within each gloss chunk are first aggregated by average pooling along the temporal dimension, which leads to a 0.7 improvement in WER. Multi-head self-attention is then applied across all chunk-level features. Because the interaction between the signer and the environment is not significant, we use a segmentation module to remove the background of the videos. This lets the proposed model focus on the signer and yields an additional 1.5 improvement in WER. Experimental results on the PHOENIX-2014 benchmark dataset demonstrate that our method can effectively extract sign language features in an end-to-end manner without any prior knowledge, improve the accuracy of CSLR, and achieve a WER of 20.4 on the test set, a result competitive with state-of-the-art methods that use additional supervision.
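The two attention stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head attention with identity query/key/value projections (the paper uses multi-head self-attention with learned weights), a frame count divisible by the chunk size, and average pooling applied to the intra-attended features. The names `self_attention`, `intra_inter_gloss_attention`, and `chunk_size` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, to keep the sketch short.
    x has shape (..., n, d); the result has the same shape.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def intra_inter_gloss_attention(frames, chunk_size):
    """frames: (T, d) frame features; T must be divisible by chunk_size."""
    T, d = frames.shape
    if T % chunk_size != 0:
        raise ValueError("T must be divisible by chunk_size in this sketch")

    # Intra-gloss stage: split into equal chunks, attend only within each chunk.
    chunks = frames.reshape(T // chunk_size, chunk_size, d)
    intra = self_attention(chunks)       # (num_chunks, chunk_size, d)

    # Inter-gloss stage: average-pool each chunk over time, then attend
    # across the resulting chunk-level features.
    pooled = intra.mean(axis=1)          # (num_chunks, d)
    inter = self_attention(pooled)       # (num_chunks, d)
    return intra, inter


rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 8))        # 12 frames, 8-dim features
intra, inter = intra_inter_gloss_attention(frames, chunk_size=4)
print(intra.shape, inter.shape)          # (3, 4, 8) (3, 8)
```

Restricting attention to chunks replaces one T x T score matrix with T / chunk_size matrices of size chunk_size x chunk_size, which is the source of the complexity reduction the abstract mentions. It also means frames in one chunk cannot influence the intra-stage output of another chunk.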


Author Profile
Hossein Ranjbar

Social and Cognitive Robotics Lab., Mechanical Engineering Department, Sharif University of Technology, Tehran, Iran

Andorra
Author Profile
Alireza Taheri

Social and Cognitive Robotics Lab., Mechanical Engineering Department, Sharif University of Technology, Tehran, Iran

Andorra

Paper information

Publication year: 2025
Citations: 3
Publication country: Andorra
Site: Springer
