
Rethinking the Self-Attention in Vision Transformers

Kyungmin Kim, Bichen Wu, 4 Authors, Seon Joo Kim

2021 · DOI: 10.1109/CVPRW53098.2021.00342
38 Citations

TLDR

This analysis shows that self-attention in vision transformer inference is extremely sparse, which motivates rethinking the role of self-attention in vision transformer models.

Abstract

Self-attention is a cornerstone of transformer models. However, our analysis shows that self-attention in vision transformer inference is extremely sparse. When applying a sparsity constraint, our experiments on image (ImageNet-1K) and video (Kinetics-400) understanding show that we can achieve 95% sparsity on the self-attention maps while keeping the performance drop below 2 points. This motivates us to rethink the role of self-attention in vision transformer models.
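To make "95% sparsity on the self-attention maps" concrete, here is a minimal sketch in Python. The abstract does not say how the sparsity constraint is applied, so this example assumes one simple option: keep the largest 5% of weights in each attention row, zero out the rest, and renormalize. The helper names `softmax` and `sparsify_row`, the token count, and the random query/key vectors are all hypothetical and stand in for a trained model.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify_row(row, sparsity=0.95):
    """Keep the largest (1 - sparsity) fraction of weights in a row.

    Assumption: top-k masking per row, followed by renormalization.
    This is an illustrative choice, not necessarily the paper's method.
    """
    n = len(row)
    keep = max(1, round(n * (1.0 - sparsity)))
    top = sorted(range(n), key=lambda j: row[j], reverse=True)[:keep]
    kept = set(top)
    total = sum(row[j] for j in kept)
    return [row[j] / total if j in kept else 0.0 for j in range(n)]


# Random queries and keys stand in for real token embeddings (assumption).
random.seed(0)
tokens, dim = 100, 16
q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
k = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]

# Dense attention map: softmax(q k^T / sqrt(dim)), one row per query token.
scale = 1.0 / math.sqrt(dim)
attn = [
    softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
    for qi in q
]

# Sparse attention map and the fraction of entries that are exactly zero.
sparse_attn = [sparsify_row(r, sparsity=0.95) for r in attn]
zeros = sum(v == 0.0 for r in sparse_attn for v in r)
measured_sparsity = zeros / (tokens * tokens)
print(f"measured sparsity: {measured_sparsity:.2f}")
```

With 100 tokens, each query keeps only 5 attention weights. Measuring how much accuracy survives such masking on a trained model is the kind of experiment the abstract reports; this sketch only shows the masking step.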
