Attention maps for the 'Drink.Frombottle' action on Toyota-Smarthome (CS)

Pose-guided token selection for the recognition of activities of daily living

In this paper, we propose an improved token selection method that integrates semantic information from the ADL recognition task with that of human motion.

July 2025 · Ricardo Pizarro, Roberto Valle, José Miguel Buenaposada, Luis Miguel Bergasa, Luis Baumela

Latency-accuracy comparison of mobile-based architectures tested on a Google Pixel 4 using 256×256 images as input

Efficiency Evaluation of Mobile Vision Transformers

In this paper, we evaluate the efficiency of the most popular mobile vision transformer models in terms of latency and accuracy on ImageNet-1k.

February 2024 · Juan Castrillo, Roberto Valle, Luis Baumela