Large language models (LLMs) have demonstrated remarkable capabilities across a range of text-generation tasks. However, LLMs still struggle with problems requiring multi-step decision-making and environmental feedback. Unlike pure text data, large-scale decision-making data is difficult to collect. Moreover, many powerful LLMs are only accessible through APIs, which hinders their fine-tuning for agent tasks due to cost and complexity. To address these limitations, we propose a framework that automatically learns a reward model from the environment without human annotations. This model can evaluate the action trajectories of LLM agents and provide heuristics for task planning. The reward model can be integrated with LLM-based agents and various planning algorithms to enhance task-solving performance, potentially revolutionizing the application of LLMs in complex and interactive environments. The effectiveness and generalizability of our framework are demonstrated through evaluations on diverse agent benchmarks, including online shopping, scientific reasoning, mathematical problem solving, household tasks, and clinical scenarios.
The pipeline of our ARMAP framework. (1) We first generate initial task instructions using LLMs with in-context learning, and sample trajectories aligned with these initial instructions in the environment. (2) Next, we use the LLM to summarize the sampled trajectories and generate refined task instructions that better match them. (3) We then modify specific actions within the trajectories and execute the new actions in the environment, collecting negative trajectories in the process. (4) Using the refined task instructions together with the positive and negative trajectories, we train a lightweight reward model to distinguish matching from non-matching trajectories. (5) The learned reward model can then be paired with various LLM agents to improve task planning.
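At the core of step (4) is a pairwise preference objective: the reward model should score a trajectory that matches the refined instruction higher than a perturbed, non-matching one. The sketch below is a minimal illustration of that idea, not the paper's implementation; the toy `TrajectoryEncoder`, the hash-based tokenizer, and all hyperparameters are stand-ins chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Toy reward model: embeds (instruction, trajectory) tokens, mean-pools,
    and maps to a scalar reward. A real system would use a pretrained
    language-model encoder instead of a raw embedding table."""
    def __init__(self, vocab_size=50000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)  # scalar reward head

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        h = self.embed(token_ids).mean(dim=1)     # (batch, dim)
        return self.score(h).squeeze(-1)          # (batch,) scalar rewards

def tokenize(text, vocab_size=50000, max_len=128):
    """Hash-based placeholder tokenizer so the sketch is self-contained."""
    ids = [hash(w) % vocab_size for w in text.split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return torch.tensor(ids)

def pairwise_loss(model, instruction, pos_traj, neg_traj):
    """Bradley-Terry-style objective: the matching (positive) trajectory
    should receive a higher reward than the perturbed (negative) one."""
    pos = model(tokenize(instruction + " " + pos_traj).unsqueeze(0))
    neg = model(tokenize(instruction + " " + neg_traj).unsqueeze(0))
    return -F.logsigmoid(pos - neg).mean()

# One illustrative update on a single synthetic triplet.
model = TrajectoryEncoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = pairwise_loss(
    model,
    instruction="buy a red ceramic mug under $20",
    pos_traj="search[red ceramic mug] click[item_3] click[buy now]",
    neg_traj="search[blue plastic cup] click[item_9] click[buy now]",
)
opt.zero_grad()
loss.backward()
opt.step()
```

Because only this lightweight scorer is trained, the LLM agents themselves can remain frozen, API-only models; the reward model then serves as the heuristic that planning algorithms query in step (5).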
Controllable Generation. A typical example of a customized reward target for generating shorter trajectories. On the left, default greedy decoding produces a long trajectory without finding the target product. In the middle, our default reward guides the LLM agent to a correct but long trajectory. On the right, our framework with a customized reward target (the default reward with a trajectory-length penalty) finds a correct and short trajectory for the target product.
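Conceptually, the length-penalized target in the right panel is simple reward shaping: the customized score is the learned reward minus a penalty proportional to trajectory length, and the planner keeps the candidate with the highest shaped score. The snippet below sketches that selection step under assumed interfaces; `reward_model`, the candidate trajectories, and the penalty weight `alpha` are illustrative placeholders, not the paper's actual values.

```python
from typing import Callable, List, Sequence

def shaped_reward(base_reward: float, num_steps: int, alpha: float = 0.05) -> float:
    """Customized reward target: default learned reward minus a length penalty."""
    return base_reward - alpha * num_steps

def select_trajectory(
    candidates: Sequence[List[str]],             # each candidate is a list of actions
    reward_model: Callable[[List[str]], float],  # learned reward: trajectory -> score
    alpha: float = 0.05,
) -> List[str]:
    """Best-of-n selection with the shaped reward, favoring short correct trajectories."""
    return max(
        candidates,
        key=lambda traj: shaped_reward(reward_model(traj), len(traj), alpha),
    )

# Illustrative usage with a dummy reward model that favors trajectories ending in a purchase.
dummy_reward = lambda traj: 1.0 if traj and traj[-1].startswith("click[buy") else 0.0
long_traj = ["search[mug]", "click[next page]", "click[next page]", "click[item_7]", "click[buy now]"]
short_traj = ["search[red ceramic mug]", "click[item_3]", "click[buy now]"]
best = select_trajectory([long_traj, short_traj], dummy_reward)
print(best)  # the shorter correct trajectory wins under the length penalty
```

The same shaped score could, in principle, be plugged into other planners (e.g., as a node-value heuristic in tree search) since it only changes how candidate trajectories are ranked.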
More showcases are available in our paper.
@misc{chen2025scalingautonomousagentsautomatic,
      title={Scaling Autonomous Agents via Automatic Reward Modeling And Planning},
      author={Zhenfang Chen and Delin Chen and Rui Sun and Wenjun Liu and Chuang Gan},
      year={2025},
      eprint={2502.12130},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.12130},
}