VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh, Robert Lo, 7 authors, Daniel Fried
2024 · DOI: 10.48550/arXiv.2401.13649
Annual Meeting of the Association for Computational Linguistics · 256 citations
TLDR
An extensive evaluation of state-of-the-art LLM-based autonomous agents, including several multimodal models, is conducted, identifying several limitations of text-only LLM agents and revealing gaps in the capabilities of state-of-the-art multimodal language agents.
