Research field: Software Development
Conference: Computer Graphics International Conference
Continuous Sign Language Recognition (CSLR) is challenging because sign language data lacks accurate, temporally aligned gloss annotations. Sign language videos also contain a large number of redundant visual frames, which makes satisfactory recognition difficult in complex scenarios. To localize key frames in sign language videos and establish temporal correlations among visual features, we use image matting and temporal differencing to identify keyframes with discernible motion trends. We introduce the Temporal Fusion Network (TFN) to strengthen the temporal correlation among these keyframes and use a Temporal Convolutional Network (TCN) to model long-term dependencies. We further combine a visual assist loss and a decoded prediction loss for co-supervision, which improves the training of the feature extractor and mitigates overfitting. The proposed approach achieves competitive performance on two large-scale Chinese continuous sign language recognition datasets, CSL and CSL-Daily.
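The abstract does not specify the exact keyframe selection rule. As a minimal sketch under stated assumptions, each frame can be scored by its mean absolute difference from the previous frame (temporal differencing), optionally restricted to a foreground mask that an image matting model would supply (the matting step itself is not shown). The top-k frames are then kept in temporal order. The function name `select_keyframes`, the top-k rule, and the mask input are illustrative assumptions, not the authors' method.

```python
import numpy as np


def select_keyframes(frames, num_keyframes, masks=None):
    """Hypothetical keyframe selector based on temporal differencing.

    frames: array of shape (T, H, W) or (T, H, W, C).
    masks:  optional (T, H, W) foreground masks, assumed to come from a
            matting model; background pixels are ignored when set.
    Returns the indices of the selected frames in temporal order.
    """
    if num_keyframes <= 0:
        raise ValueError("num_keyframes must be positive")
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim == 4:
        frames = frames.mean(axis=-1)  # collapse color channels to grayscale
    if masks is not None:
        frames = frames * np.asarray(masks, dtype=np.float32)  # keep signer only

    # Motion score of frame t: mean |frame_t - frame_{t-1}|; frame 0 scores 0.
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    scores = np.concatenate([[0.0], diffs])

    k = min(num_keyframes, len(scores))
    top = np.argsort(-scores, kind="stable")[:k]  # highest motion first, ties by index
    return np.sort(top)  # restore temporal order


if __name__ == "__main__":
    clip = np.zeros((6, 4, 4))
    clip[2] = 1.0  # a brief motion burst around frame 2
    print(select_keyframes(clip, 2).tolist())
```

Applying the mask before differencing means that background motion, such as another person moving behind the signer, does not compete with hand and body motion for the keyframe budget.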

| Field | Value |
|---|---|
| Publication year | 2025 |
| Citations | 0 |
| Country of publication | China |
| Site | Springer |