HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer
Xiaosong Zhang, Yunjie Tian, 4 Authors, Qi Tian
2023 · DBLP: conf/iclr/0004TXHDY023
International Conference on Learning Representations · 62 Citations
TLDR
A new architecture named HiViT (short for hierarchical ViT) is presented, which is simpler and more efficient than Swin yet further improves its performance on fully-supervised and self-supervised visual representation learning, notably after pre-training with a masked autoencoder on ImageNet-1K.
