How many hours did you take to train agents in each substrate? #15
Hello,

Estimating training time is very difficult, since it entirely depends on the training stack, available compute, etc. There is typically a fundamental tradeoff between wall-clock time and compute. On our side, we have tried two very different training stacks: one trained populations in a bit under a week, and the other took just one day. The number of workers was also quite different in the two stacks. We recognise that compute is likely a limiting factor in training these populations, which is why we are actively working on improving the performance of the substrates, including reducing the time spent in Python and instead delegating to the underlying C++ implementation of the substrate engine (Lab2D) as soon as possible. Hope this helps.
Dear @duenez, thanks for the detailed and helpful reply. I appreciate your team's efforts to make MeltingPot a great testbed for MARL research.
@YetAnotherPolicy I am curious to know how long it takes you to train these populations! In my case, I can train 1e6 steps in almost exactly an hour using 4 RLlib workers on a machine with 64GB of RAM and an Nvidia RTX 3060 GPU.
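For reference, my setup is roughly equivalent to the sketch below (the environment id is just a placeholder, and the exact config keys may differ depending on your RLlib version, so this is not my exact code):

```python
# Rough sketch of a 4-worker PPO run with RLlib/Tune.
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    config={
        "env": "my_meltingpot_env",  # placeholder: whatever id the substrate is registered under
        "num_workers": 4,            # parallel rollout workers
        "num_gpus": 1,               # single RTX 3060
        "framework": "torch",
    },
    stop={"timesteps_total": 1_000_000},  # ~1e6 environment steps
)
```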
Hi, in my case I use 32 workers and it takes about 8 minutes to run 1M steps. Note that this depends on the simulation speed.
I use very common Intel CPUs, 40 in total. Since the observations are RGB images, I use an A100, which can be faster than a 3090. RAM is 256GB.
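Putting the two setups side by side, the implied throughput is roughly as follows (this is nothing more precise than the numbers quoted above):

```python
# Rough steps/sec implied by the figures mentioned in this thread.
steps = 1_000_000
setup_4_workers = steps / 3600        # ~1e6 steps per hour  -> ~278 steps/s
setup_32_workers = steps / (8 * 60)   # 1M steps in ~8 min   -> ~2083 steps/s
print(f"{setup_4_workers:.0f} vs {setup_32_workers:.0f} steps/s "
      f"(~{setup_32_workers / setup_4_workers:.1f}x)")
```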
@YetAnotherPolicy Sorry, I am back with more questions! Which algorithm are you using to train?
Hi, I use PPO. Note that there is an inner training loop in each PPO update; see this link: https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py#L265. Please also check whether RLlib uses this trick. Training with PPO takes about 1.5 days for 200M steps.
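If the inner loop is unclear: each collected batch is reused for several SGD epochs before new data is gathered (the knob is train_pi_iters / train_v_iters in Spinning Up, num_sgd_iter in RLlib's PPO). A toy sketch of that structure, where the policy, data, and loss are stand-ins rather than my actual training code:

```python
# Toy illustration of PPO's inner loop: the same rollout batch is reused for
# several gradient epochs before new data is collected. The MSE loss below is
# a stand-in for the real clipped-surrogate + value loss.
import torch

policy = torch.nn.Linear(4, 2)                        # stand-in policy network
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for update in range(3):                               # outer loop: collect, then update
    obs = torch.randn(512, 4)                         # stand-in rollout batch
    target = torch.randn(512, 2)                      # stand-in PPO targets
    for epoch in range(10):                           # inner loop: reuse the same batch
        loss = ((policy(obs) - target) ** 2).mean()   # placeholder for the PPO loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```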
Hello @YetAnotherPolicy, I got confused by your last message. I would like to know: did you use the RLlib library to train with those workers?
Hi, @yesfon, I did not use RLlib. |
May I ask what you used?
Hi, I use multiprocessing as well as Ray's remote actors to collect data. RLlib is also good, but it takes a lot of time to learn its APIs.
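As a rough illustration of what I mean by using Ray's remote actors to collect data (the worker below just produces dummy transitions; it is not my actual code, and a real worker would build the substrate and step a policy):

```python
# Minimal sketch of parallel rollout collection with Ray remote actors.
import random
import ray

ray.init()

@ray.remote
class RolloutWorker:
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def collect(self, num_steps):
        # Pretend to step an environment; return (obs, reward)-like tuples.
        return [(self.rng.random(), self.rng.random()) for _ in range(num_steps)]

workers = [RolloutWorker.remote(i) for i in range(8)]
futures = [w.collect.remote(1000) for w in workers]
batches = ray.get(futures)  # block until all workers return their rollouts
print(sum(len(b) for b in batches), "transitions collected")
```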
Dear authors,
Thanks for building such ambitious environments for MARL research. In your paper, I found that the simulation runs for 10^9 steps per agent. To train the agents, how many rollout workers did you use, and how many hours did it take to get the final results in Table 1 (focal per-capita returns)?
Thank you.