Repository for the Darwin Gödel Machine (DGM), a novel self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks.
# API keys, add to ~/.bashrc
export OPENAI_API_KEY='...'
export ANTHROPIC_API_KEY='...'
# Verify that Docker is properly configured in your environment.
docker run hello-world
# If a permission error occurs, add the user to the Docker group
sudo usermod -aG docker $USER
newgrp docker
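The prerequisites above can be checked before a run with a short preflight script. This is a minimal sketch and not part of the repository: the helper names `missing_api_keys` and `docker_cli_available` are hypothetical, and the check only confirms that a `docker` executable is on the PATH, not that the daemon is reachable (use `docker run hello-world` for that).

```python
import os
import shutil

# Key names taken from the export lines above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def docker_cli_available():
    """True if a `docker` executable is on PATH (does not test daemon access)."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    if not docker_cli_available():
        print("docker not found on PATH")
```

Passing a plain dict to `missing_api_keys` instead of relying on `os.environ` keeps the check easy to test.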
# Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Optional: for running analysis
sudo apt-get install graphviz graphviz-dev
pip install -r requirements_dev.txt
# Clone SWE-bench
cd swe_bench
git clone https://github.com/princeton-nlp/SWE-bench.git
cd SWE-bench
git checkout dc4c087c2b9e4cefebf2e3d201d27e36
pip install -e .
cd ../../
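Because the evaluation depends on a specific SWE-bench commit, it can be worth confirming that the clone is actually at the pinned revision. The following is a rough sketch, assuming the clone lives at `swe_bench/SWE-bench` relative to the repository root; the function names are hypothetical and the repository may not perform this check itself.

```python
import subprocess

# Commit hash copied from the checkout command above (may be abbreviated).
PINNED_COMMIT = "dc4c087c2b9e4cefebf2e3d201d27e36"


def matches_pinned(head_sha, pinned=PINNED_COMMIT):
    """True if the full HEAD hash starts with the (possibly abbreviated) pinned hash."""
    return head_sha.strip().lower().startswith(pinned.lower())


def current_head(repo_dir):
    """Return the full commit hash of HEAD in repo_dir via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

From the repository root, `matches_pinned(current_head("swe_bench/SWE-bench"))` should return `True` after the checkout step.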
# Prepare Polyglot
# Make sure git is properly configured in your environment with username and email
python polyglot/prepare_polyglot_dataset.py
# Run the DGM
python DGM_outer.py
By default, outputs are saved in the output_dgm/ directory.
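To find the results of the latest run, one option is to list the subdirectories of the output folder by modification time. This sketch assumes that each run writes into its own subdirectory of `output_dgm/`; the layout is not documented here, so treat that as an assumption, and note that `list_run_dirs` is a hypothetical helper rather than something the repository provides.

```python
from pathlib import Path


def list_run_dirs(output_root="output_dgm"):
    """Return subdirectories of output_root, most recently modified first.

    Returns an empty list if output_root does not exist yet.
    """
    root = Path(output_root)
    if not root.is_dir():
        return []
    dirs = [p for p in root.iterdir() if p.is_dir()]
    return sorted(dirs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for run_dir in list_run_dirs():
        print(run_dir)
```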
analysis/: scripts used for plotting and analysis
initial/: SWE-bench logs and performance of the initial agent
initial_polyglot/: Polyglot logs and performance of the initial agent
swe_bench/: code needed for SWE-bench evaluation
polyglot/: code needed for Polyglot evaluation
prompts/: prompts used for foundation models
tests/: tests for the DGM system
tools/: tools available to the foundation models
coding_agent.py: main implementation of the initial coding agent
DGM_outer.py: entry point for running the DGM algorithm
This Google Drive folder contains all the foundation model output logs from the experiments shown in the paper.
Warning
This repository involves executing untrusted, model-generated code. We strongly advise users to be aware of the associated safety risks. While it is highly unlikely that such code will perform overtly malicious actions under our current settings and with the models we use, it may still behave destructively due to limitations in model capability or alignment. By using this repository, you acknowledge and accept these risks.
The evaluation framework implementations are based on the SWE-bench and polyglot-benchmark repositories.
If you find this project useful, please consider citing:
@article{zhang2025darwin,
title={Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents},
author={Zhang, Jenny and Hu, Shengran and Lu, Cong and Lange, Robert and Clune, Jeff},
journal={arXiv preprint arXiv:2505.22954},
year={2025}
}