Focal Attention for Long-Range Interactions in Vision Transformers

TLDR

This work introduces Focal Transformers, a new variant of Vision Transformer models that outperforms state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.