TACR-Net: Editing on Deep Video and Voice Portraits


Research area: Strategies



Conference: MM '21: Proceedings of the 29th ACM International Conference on Multimedia


Abstract

Using an arbitrary speech clip to edit the mouth of the portrait in a target video is a novel yet challenging task. Although impressive results have been achieved, existing methods still have three limitations: 1) because acoustic features are not completely decoupled from person identity, there is no global mapping from speech to facial features (i.e., landmarks and expression blendshapes); 2) audio-driven talking-face sequences generated by a simple cascade structure usually lack temporal consistency and spatial correlation, which makes changes in fine details inconsistent; 3) forgery is always performed at the video level only, without considering forgery of the voice, and in particular the synchronization between the converted voice and the mouth. To address these distortion problems, we propose a novel deep learning framework, the Temporal-Refinement Autoregressive-Cascade Rendering Network (TACR-Net), for audio-driven dynamic talking-face editing. TACR-Net encodes facial expression blendshapes from the given acoustic features without separate training for each specific video. TACR-Net also includes a novel autoregressive cascade generator for video re-rendering. Finally, we transfer in-the-wild speech to the target portrait and obtain a photo-realistic and audio-realistic video.
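The abstract gives no implementation details, so the following is only a toy sketch of why an autoregressive generator can help with temporal consistency: each output frame is conditioned on the previously generated frame rather than produced independently. It assumes a linear audio-to-blendshape map and a "renderer" that is just a weighted blend with the previous frame. All names (`audio_to_blendshape`, `render_step`, `autoregressive_render`, `alpha`) are hypothetical, and this is not the TACR-Net architecture.

```python
from typing import List, Sequence

Vector = List[float]


def audio_to_blendshape(audio_feat: Sequence[float],
                        weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear map from one frame of acoustic features
    to expression blendshape coefficients (one row of weights per coefficient)."""
    return [sum(w * a for w, a in zip(row, audio_feat)) for row in weights]


def render_step(coeffs: Vector, prev_frame: Vector, alpha: float) -> Vector:
    """Toy stand-in for a renderer: the new frame mixes the previous output
    with the current audio-driven target, so it depends on what came before."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_frame, coeffs)]


def autoregressive_render(audio_frames: Sequence[Sequence[float]],
                          weights: Sequence[Sequence[float]],
                          init_frame: Sequence[float],
                          alpha: float = 0.5) -> List[Vector]:
    """Generate frames one at a time, feeding each output back as input."""
    frames: List[Vector] = []
    prev = list(init_frame)
    for feat in audio_frames:
        coeffs = audio_to_blendshape(feat, weights)
        prev = render_step(coeffs, prev, alpha)  # condition on previous output
        frames.append(prev)
    return frames


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out = autoregressive_render([[1.0, 0.0], [1.0, 0.0]], identity, [0.0, 0.0])
    print(out)  # [[0.5, 0.0], [0.75, 0.0]]
```

With `alpha = 0` each frame depends only on the current audio frame (a purely frame-wise mapping); larger `alpha` values make consecutive frames change more gradually, which is the intuition behind feeding generated frames back into the generator.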


Luchuan Song

University of Science and Technology of China, Hefei, China

Andorra
Bin Liu

University of Science and Technology of China, Hefei, China

Andorra
Guojun Yin

University of Science and Technology of China, Hefei, China

Andorra

📄 Paper information

Publication year: 2021
Citations: 15
Country of publication: Andorra, China
Source: ACM
