
Pose-guided token selection for the recognition of activities of daily living
In this paper we propose an improved token selection method that integrates semantic information from the ADL recognition task with that of human motion.
In this paper we report the existence of data set biases in the most widely used head pose estimation benchmarks, which lead to an optimistic estimation of model performance in real-world scenarios.
In this paper we propose a multi-task network which leverages spatiotemporal features extracted from video inputs to provide more robust predictions compared to image-only models.
In this paper we evaluate the efficiency of the most popular mobile vision transformer models in terms of latency and accuracy on ImageNet-1k.
In this paper we analyze the methodology for short- and wide-range head pose estimation (HPE) and discuss which representations and metrics are adequate for each case.
We show that the combination of head pose estimation and landmark-based face alignment significantly improves the performance of the former task.
In this paper we investigate the use of a cascade of neural network regressors to increase the accuracy of the estimated facial landmarks.
In this paper we present a robust and efficient face alignment algorithm based on a coarse-to-fine cascade of ensembles of regression trees.
In this paper we investigate the use of a cascade of CNN regressors to make the set of estimated landmarks lie closer to a valid face shape.
In this paper we present a real-time facial landmark regression method based on a coarse-to-fine Ensemble of Regression Trees (ERT).
In this paper we present a real-time algorithm that estimates the head pose from unrestricted 2D grayscale images.