Self-supervised Models are Good Teaching Assistants for Vision Transformers

Haiyan Wu, Yuting Gao, 4 more authors, Ke Li

2022 · DBLP: conf/icml/WuGZL0SL22
International Conference on Machine Learning · 22 citations

TLDR

A head-level knowledge distillation method that selects the most important head of the supervised teacher and of the self-supervised teaching assistant, and lets the student mimic the attention distributions of these two heads, so that the student focuses on the token relationships deemed important by the teacher and the teaching assistant.
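
The idea in the TLDR can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The summary does not say how head importance is measured, which student heads are matched, or which loss is used, so the following are all assumptions: the per-head `importance` scores are supplied by the caller, one student head is paired with each selected head, and the mismatch is measured with a row-wise KL divergence. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax, used here to build attention maps."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_head(attn, importance):
    """Return the attention map of the highest-scoring head.

    attn:       (heads, tokens, tokens), each row sums to 1
    importance: (heads,) scores; how they are computed is an assumption
    """
    return attn[int(np.argmax(importance))]

def kl_rows(p, q, eps=1e-12):
    """KL(p || q) per query token (row), averaged over rows."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def head_level_loss(student_head_t, student_head_a,
                    teacher_attn, teacher_imp, ta_attn, ta_imp):
    """Sum of two attention-mimicking terms (assumed equal weighting).

    student_head_t mimics the selected teacher head;
    student_head_a mimics the selected teaching-assistant head.
    """
    target_t = select_head(teacher_attn, teacher_imp)
    target_a = select_head(ta_attn, ta_imp)
    return kl_rows(target_t, student_head_t) + kl_rows(target_a, student_head_a)
```

The loss is zero when the two student heads reproduce the selected attention maps exactly and grows as they diverge. In practice it would be added to the usual task loss with some weighting, which this sketch leaves out.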