VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Jing Yu Koh, Robert Lo, 7 authors, Daniel Fried

2024 · DOI: 10.48550/arXiv.2401.13649
Annual Meeting of the Association for Computational Linguistics · 256 citations

TLDR

An extensive evaluation of state-of-the-art LLM-based autonomous agents, including several multimodal models, is conducted, identifying several limitations of text-only LLM agents and revealing gaps in the capabilities of state-of-the-art multimodal language agents.