MMA: Multi-Modal Adapter for Vision-Language Models

Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang, Xiaohua Xie

2024 · DOI: 10.1109/CVPR52733.2024.02249
Computer Vision and Pattern Recognition · 53 Citations

TLDR

This paper proposes a Multi-Modal Adapter (MMA) for vision-language models (VLMs) that improves the alignment between representations from the text and vision branches. It evaluates the approach on three tasks: generalization to novel classes, generalization to novel target datasets, and domain generalization.