Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Hao Shao, Shengju Qian, 5 Authors, Hongsheng Li

2024 · DOI: 10.48550/arXiv.2403.16999
arXiv.org · 102 Citations

TLDR

This paper presents a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning. The pipeline can also evaluate MLLMs in scenarios that require identifying specific local regions.
