Qi Zhao*, Haotian Fu*, Chen Sun, George Konidaris
EMNLP 2024
Long-horizon decision-making tasks present significant challenges for LLM-based agents due to the need for extensive planning over multiple steps. In this paper, we propose a hierarchical framework that decomposes complex tasks into manageable subgoals, utilizing separate LLMs for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that leverages multimodal environment feedback to automatically generate reward signals. We introduce Environment Preference Optimization (EPO), a novel method that generates preference signals from the environment's feedback and uses them to train LLM-based agents. Extensive experiments on ALFRED demonstrate the state-of-the-art performance of our framework, achieving first place on the ALFRED public leaderboard and showcasing its potential to improve long-horizon decision-making in diverse environments.
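The snippet below is a minimal, illustrative sketch of the preference-signal idea described above; it is not the repository's implementation. All function and variable names are hypothetical, and the toy scoring function merely stands in for the reward model learned from multimodal environment feedback.

```python
# Illustrative sketch only -- hypothetical names, not the repository's API.
# Idea: use environment-derived scores to rank candidate LLM outputs and
# turn the ranking into preference pairs for DPO-style training.

def make_preference_pairs(prompt, candidates, score):
    """Rank candidates by an environment-derived score and pair the best
    candidate against each worse one."""
    ranked = sorted(candidates, key=score, reverse=True)
    best = ranked[0]
    return [{"prompt": prompt, "chosen": best, "rejected": worse}
            for worse in ranked[1:]]

# Toy usage: the lambda stands in for a learned reward model built from
# environment feedback.
pairs = make_preference_pairs(
    "Task: heat the potato and put it on the counter.\nNext subgoal:",
    ["GotoLocation microwave", "GotoLocation fridge", "PickupObject knife"],
    score=lambda subgoal: 1.0 if "microwave" in subgoal else 0.0,
)
print(pairs[0]["chosen"])  # -> "GotoLocation microwave"
```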
First, set up ALFRED following E.T.
Then set up this repo using the commands below:
```
git clone https://github.com/kevinz8866/EPO
cd EPO
pip install -r requirements.txt
```
Please check out the example configurations in /configs.
The launch command is:
```
python -m run --cfg configs/example_policy.yaml
```
Please note that implementations for modules such as agent exploration and ALFRED interaction are not currently included.
A demonstration is available in /epo_demo.
The EPO trainer in this demo is modified from the DPO trainer implemented by Hugging Face.
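As a rough, hypothetical illustration of the data a DPO-style trainer consumes (the prompt/chosen/rejected field names follow the convention of Hugging Face's preference trainers; the EPO-specific details live in /epo_demo), one might build a dataset like this:

```python
from datasets import Dataset

# Hypothetical preference pairs produced from environment feedback.
train_dataset = Dataset.from_list([
    {
        "prompt": "Goal: put a clean mug on the coffee table.\nNext subgoal:",
        "chosen": "CleanObject mug sinkbasin",
        "rejected": "PickupObject plate",
    },
])

# A DPO-style trainer would then consume this dataset roughly as follows.
# Left commented out because exact argument names vary across trl versions,
# and the demo defines its own modified trainer:
# from trl import DPOTrainer, DPOConfig
# trainer = DPOTrainer(model=model, ref_model=ref_model,
#                      args=DPOConfig(output_dir="epo_out", beta=0.1),
#                      train_dataset=train_dataset)
# trainer.train()
```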
Our paper is available on arXiv. If you find our work useful, please consider citing us.
@article{zhao2024epo,
title = {EPO: Hierarchical LLM Agents with Environment Preference Optimization},
author = {Qi Zhao and Haotian Fu and Chen Sun and George Konidaris},
journal = {The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2024}
}

This project is released under the MIT license.
