Transformers

Attention maps for the 'Drink.Frombottle' action on Toyota-Smarthome (CS)

Pose-guided token selection for the recognition of activities of daily living

In this paper, we propose an improved token selection method that combines semantic information from the activities of daily living (ADL) recognition task with information about human motion.

Latency-accuracy comparison of mobile-based architectures, tested on a Google Pixel 4 with 256×256 input images

Efficiency Evaluation of Mobile Vision Transformers

In this paper, we evaluate the efficiency of the most popular mobile vision transformer models in terms of latency and ImageNet-1k accuracy.