Multi-Task

Ground-truth motion heatmap for different numbers of channels

Spatiotemporal Face Alignment for Generalizable Deepfake Detection

In this paper, we propose a multi-task network that leverages spatiotemporal features extracted from video inputs to provide more robust predictions than image-only models.

Simultaneous prediction of head pose, facial landmark locations, and landmark visibility when processing a video from 300VW

Multi-task head pose estimation in-the-wild

We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task.