Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose


Research field: Strategies



Venue: International Journal of Computer Vision


Abstract

Landmark detection under large poses and occlusion remains one of the challenging problems in facial analysis. Recently, many works have jointly predicted pose or occlusion within the multi-task learning (MTL) paradigm, trying to exploit the dependencies among these tasks and thus alleviate this issue. However, such implicit dependencies are weakly interpretable and inconsistent with the way humans exploit inter-task coupling relations, i.e., by accommodating the explicit effects that the coupling induces. This is one of the key factors limiting their performance. To this end, we propose a Cascaded Iterative Transformer (CIT) to jointly predict facial landmarks, occlusion probability, and head pose. Besides implicitly mining task dependencies in a shared encoder, the proposed CIT employs a cost-effective and portable strategy: it passes the decoders' predictions on as prior knowledge, so that the coupling-induced effects are exploited in a human-like way. Moreover, to the best of our knowledge, no existing dataset contains annotations for all of these tasks simultaneously, so we introduce a new dataset, termed MERL-RAV-FLOP, based on the MERL-RAV dataset. We conduct extensive experiments on several challenging datasets (300W-LP, AFLW2000-3D, BIWI, COFW, and MERL-RAV-FLOP) and achieve remarkable results. The code and dataset are available at https://github.com/Iron-LYK/CIT.
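
The abstract describes passing the decoders' predictions forward as prior knowledge across cascaded iterations. The following is only a minimal control-flow sketch of that idea, not the authors' implementation: the `Predictions` container, the `cascaded_refine` function, and the three toy decoders are all hypothetical names and rules invented for illustration, and plain Python lists stand in for the transformer encoder and decoders.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

Vector = List[float]


@dataclass(frozen=True)
class Predictions:
    """Hypothetical container for the three jointly predicted tasks."""
    landmarks: Vector   # one toy coordinate per landmark
    occlusion: Vector   # per-landmark occlusion probability in [0, 1]
    pose: Vector        # toy pose estimate


# A decoder maps (shared features, previous predictions of ALL tasks) to a
# new prediction for its own task.
Decoder = Callable[[Vector, Predictions], Vector]


def cascaded_refine(features: Vector, initial: Predictions,
                    decoders: Dict[str, Decoder],
                    num_iters: int) -> List[Predictions]:
    """Run num_iters rounds; each round feeds the last predictions back as priors."""
    history = [initial]
    current = initial
    for _ in range(num_iters):
        # Every decoder sees the same shared features plus the previous
        # round's predictions for every task (the explicit prior).
        updates = {task: decode(features, current)
                   for task, decode in decoders.items()}
        current = replace(current, **updates)
        history.append(current)
    return history


# Toy decoders (assumed rules, chosen only to make the coupling visible).
def toy_landmark_decoder(features: Vector, prior: Predictions) -> Vector:
    # Move each landmark halfway toward its feature, less so if occluded.
    return [p + 0.5 * (1.0 - o) * (f - p)
            for p, o, f in zip(prior.landmarks, prior.occlusion, features)]


def toy_occlusion_decoder(features: Vector, prior: Predictions) -> Vector:
    # Keep the occlusion estimate unchanged in this sketch.
    return list(prior.occlusion)


def toy_pose_decoder(features: Vector, prior: Predictions) -> Vector:
    # Derive the pose from the previous round's landmark estimate.
    return [sum(prior.landmarks) / len(prior.landmarks)]


features = [1.0, 1.0]  # stands in for the shared encoder's output
initial = Predictions(landmarks=[0.0, 0.0], occlusion=[0.0, 1.0], pose=[0.0])
decoders = {
    "landmarks": toy_landmark_decoder,
    "occlusion": toy_occlusion_decoder,
    "pose": toy_pose_decoder,
}
history = cascaded_refine(features, initial, decoders, num_iters=2)
```

In this toy setup the second landmark is marked fully occluded, so its estimate never moves, while the pose estimate lags one round behind the landmarks because it reads them from the prior rather than from the current round.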


Authors

Yaokun Li
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China

Guang Tan
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China

Chao Gou
School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China

Paper information

Publication year: 2023
Citations: 0
Country of publication: China
Site: Springer