Implementing extrinsic rewards using MeltingPot #28

Closed
ManuelRios18 opened this issue Apr 13, 2022 · 2 comments

@ManuelRios18

Hello everybody,

I am trying to replicate the results of the paper Inequity aversion improves cooperation in
intertemporal social dilemmas
using MeltingPot. According to the tutorial I have been following, the reward the agent receives is specified by the "Edible" component. Additionally, in self_play.py I noticed that RLlib handles training entirely inside the trainer.train() method. Therefore, I am not sure of the best way to implement advantageous and disadvantageous inequity aversion, since they need to punish or reward each agent on every time step based on the rewards the other agents have received. Should I implement it in Lua? In a new substrate?

Additionally, when I run python3 meltingpot/python/human_players/play_clean_up.py --observation WORLD.RGB I notice that there are no apples in the field. Is this normal? Will they appear as the agents clean the river?

Thanks in advance for your help and for providing this library!

Best regards.

@jzleibo
Collaborator

jzleibo commented Apr 13, 2022

I would implement the inequity aversion reward function as part of the agent, not the environment. If you use self_play.py, that would mean using RLlib to implement it.
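For what it's worth, the agent-side shaping can be sketched as a pure function over the per-step rewards of all players, following the subjective-reward formula in Hughes et al., "Inequity aversion improves cooperation in intertemporal social dilemmas". This is only a minimal sketch, not MeltingPot or RLlib API: the function names are made up, and the smoothing constants and alpha/beta values are illustrative placeholders.

```python
def smooth_rewards(prev_smoothed, rewards, gamma=0.99, lam=0.95):
    """Temporally smoothed rewards: e_j(t) = gamma * lam * e_j(t-1) + r_j(t)."""
    return [gamma * lam * e + r for e, r in zip(prev_smoothed, rewards)]


def inequity_aversion_rewards(rewards, smoothed, alpha=5.0, beta=0.05):
    """Subjective reward for each agent i:

    u_i = r_i - alpha/(N-1) * sum_{j != i} max(e_j - e_i, 0)   # disadvantageous
              - beta/(N-1)  * sum_{j != i} max(e_i - e_j, 0)   # advantageous
    """
    n = len(rewards)
    subjective = []
    for i, (r_i, e_i) in enumerate(zip(rewards, smoothed)):
        disadvantage = sum(max(e_j - e_i, 0.0)
                           for j, e_j in enumerate(smoothed) if j != i)
        advantage = sum(max(e_i - e_j, 0.0)
                        for j, e_j in enumerate(smoothed) if j != i)
        subjective.append(r_i
                          - alpha * disadvantage / (n - 1)
                          - beta * advantage / (n - 1))
    return subjective
```

Applied on the agent side, you would call something like this on each step before the rewards are fed to the learner, e.g. from an RLlib callback or a wrapper around the policy's postprocessing, so the environment itself stays unchanged.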

As for clean_up, yes it's normal that there are no apples present at the start. You have to clean the river to get them to appear. The growth settings were chosen so that it's easiest to get lots of apples to appear when two players clean at the same time. But if you move quickly with the human player you can still get it to work alone.

Good luck! Let us know how it goes. We're always happy to discuss.

@ManuelRios18
Author

Thank you @jzleibo,

I will let you guys know when I have the code ready!

Once again thank you.
