MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang, Xiaohua Xie
2024 · DOI: 10.1109/CVPR52733.2024.02249
Computer Vision and Pattern Recognition · 53 Citations
TLDR
This paper proposes a Multi-Modal Adapter (MMA) for vision-language models (VLMs), designed to improve the alignment between representations from the text and vision branches. It evaluates the approach on three tasks: generalization to novel classes, generalization to novel target datasets, and domain generalization.
