Self-supervised Models are Good Teaching Assistants for Vision Transformers
Haiyan Wu, Yuting Gao, +4 authors, Ke Li
2022 · DBLP: conf/icml/WuGZL0SL22
International Conference on Machine Learning · 22 citations
TLDR
A head-level knowledge distillation method that selects the most important head of the supervised teacher and of the self-supervised teaching assistant, and lets the student mimic the attention distributions of these two heads, so that the student focuses on the token relationships that the teacher and the teaching assistant deem important.
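The idea in the TLDR can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how head importance is measured, which student heads are supervised, or which divergence is used. The sketch assumes that per-head importance scores are already given, that student heads 0 and 1 mimic the teacher and teaching-assistant heads respectively, and that the mismatch is measured with KL divergence. All function and parameter names are hypothetical.

```python
import math

def kl_rows(p_rows, q_rows, eps=1e-8):
    """Mean over query tokens of KL(p || q); each row is an attention
    distribution over key tokens."""
    total = 0.0
    for p, q in zip(p_rows, q_rows):
        total += sum(pi * (math.log(pi + eps) - math.log(qi + eps))
                     for pi, qi in zip(p, q))
    return total / len(p_rows)

def top_head(attn, importance):
    """Return the attention map of the head with the highest importance score.
    Assumption: importance scores are supplied by the caller."""
    best = max(range(len(importance)), key=importance.__getitem__)
    return attn[best]

def head_level_loss(student_attn, teacher_attn, ta_attn, teacher_imp, ta_imp):
    """Hypothetical head-level distillation loss.
    Each *_attn is a list of heads; each head is a tokens x tokens row-stochastic matrix.
    Assumption: student head 0 mimics the teacher, student head 1 mimics the assistant."""
    t_head = top_head(teacher_attn, teacher_imp)
    a_head = top_head(ta_attn, ta_imp)
    return kl_rows(t_head, student_attn[0]) + kl_rows(a_head, student_attn[1])

# Toy example with two heads and two tokens per model.
teacher_attn = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
ta_attn = [[[0.6, 0.4], [0.3, 0.7]], [[0.1, 0.9], [0.9, 0.1]]]
teacher_imp = [0.3, 0.7]  # teacher head 1 is selected
ta_imp = [0.8, 0.2]       # assistant head 0 is selected
```

In this toy setup the loss is zero when the two student heads reproduce the selected teacher and teaching-assistant heads exactly, and grows as their attention distributions diverge.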
