Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Hao Shao, Shengju Qian, 5 Authors, Hongsheng Li
2024 · DOI: 10.48550/arXiv.2403.16999
arXiv.org · 102 Citations
TLDR
This paper presents a novel pipeline that incorporates visual Chain-of-Thought (CoT) reasoning to leverage the reasoning capabilities of multi-modal large language models (MLLMs), and that can evaluate MLLMs in scenarios requiring the identification of specific local regions.
