Enhancing LLM Agent Effectiveness via Reflective Multi-Agent System

Aissa Hadj Mohamed, Frances A. Santos, J. C. D. Reis

2025 · DOI: 10.5753/wesaac.2025.37536

TLDR

This research highlights the effectiveness of a reflective multi-agent system in enhancing the overall results of LLM agents when performing tasks, and demonstrates that the approach outperforms individual agents on the ARC Challenge dataset.

Abstract

In the last couple of years, we have observed the rapid development of agent systems that incorporate Large Language Models (LLMs) as their core components to perform tasks such as content generation, task planning, and conversational actions. Reflection memory, a key component of agent systems, enables LLM agents to improve their results. This study presents a novel reflective multi-agent system designed to enhance the effectiveness of LLM agents. The solution uses N independent agents to generate diverse responses to user prompts (questions), which are then aggregated and analyzed by a decision-making agent to produce a final answer. The reflection mechanism is triggered by user feedback, enabling agents to self-critique and accumulate diverse error patterns in their respective memories. Our experimental evaluation demonstrates that our approach outperforms individual agents on the ARC Challenge dataset: our solution achieves 56.85% accuracy, compared to an average of 54.83% for single agents with reflection memory, using the small distilled DeepSeek model with 1.5 billion parameters. This research highlights the effectiveness of a reflective multi-agent system in enhancing the overall results of LLM agents when performing tasks.
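The pipeline the abstract describes (N independent agents, a decision-making aggregator, and feedback-triggered reflection memory) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: all names (`ReflectiveAgent`, `decide`, `mock_llm`) are hypothetical, the LLM call is mocked with canned answers so the example runs standalone, and the aggregator is simplified to a majority vote, whereas the paper uses an LLM-based decision-making agent.

```python
from dataclasses import dataclass, field

@dataclass
class ReflectiveAgent:
    name: str
    reflection_memory: list = field(default_factory=list)

    def answer(self, question: str) -> str:
        # A real agent would prepend its accumulated error patterns
        # (reflection memory) to the prompt before querying the LLM.
        prompt = "\n".join(self.reflection_memory) + "\n" + question
        return mock_llm(self.name, prompt)

    def reflect(self, question: str, wrong_answer: str) -> None:
        # Triggered by user feedback: store a self-critique so the
        # error pattern accumulates in this agent's memory.
        self.reflection_memory.append(
            f"Previously answered '{wrong_answer}' to '{question}'; avoid this."
        )

def mock_llm(agent_name: str, prompt: str) -> str:
    # Stand-in for an LLM call (hypothetical); returns canned answers.
    canned = {"agent-0": "B", "agent-1": "A", "agent-2": "B"}
    return canned[agent_name]

def decide(candidates: list) -> str:
    # Decision-making step, simplified here to a majority vote over
    # the N candidate answers.
    return max(set(candidates), key=candidates.count)

agents = [ReflectiveAgent(f"agent-{i}") for i in range(3)]
question = "Which gas do plants absorb? (A) O2 (B) CO2"
candidates = [a.answer(question) for a in agents]
final = decide(candidates)  # majority answer: "B"

# Negative user feedback triggers reflection in the dissenting agent.
agents[1].reflect(question, "A")
```

On later questions, each agent's accumulated memory is injected into its prompt, which is how the diverse per-agent error patterns described in the abstract would steer subsequent answers.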