Hello everybody,

I am trying to replicate the results of the paper *Inequity aversion improves cooperation in intertemporal social dilemmas* using MeltingPot. According to the tutorial I have been following, the reward that the agent receives is specified by the `Edible` component. Additionally, in `self_play.py` I notice that RLlib handles the training completely in the `trainer.train()` method. Therefore, I am not sure of the best way to implement advantageous and disadvantageous inequity aversion, since they need to punish or reward agents at each time step based on the rewards the other agents have received. Should I implement it in Lua, in a new substrate?

Additionally, when I run `python3 meltingpot/python/human_players/play_clean_up.py --observation WORLD.RGB`, I notice that there are no apples in the field. Is this normal? Will they appear as the agents clean the river?
Thanks in advance for your help and for providing this library!
Best regards.
I would implement the inequity aversion reward function as part of the agent, not the environment. If you use `self_play.py`, that would then mean using RLlib to implement it.
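Roughly, the subjective reward in that paper is computed from temporally smoothed rewards of all players. Here is a minimal sketch of how that could look on the agent side; the function name, the `alpha`/`beta` values, and the smoothing constants are illustrative placeholders, not anything shipped with MeltingPot or RLlib:

```python
import numpy as np

def inequity_aversion_rewards(rewards, smoothed, alpha=5.0, beta=0.05,
                              gamma=0.99, lam=0.95):
    """Fehr-Schmidt-style inequity aversion (sketch, hypothetical helper).

    rewards:  extrinsic rewards all N agents received this step.
    smoothed: temporally smoothed rewards carried over from the
              previous step (initialize to zeros).
    Returns the subjective rewards and the updated smoothed vector.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Temporal smoothing: e_j(t) = gamma * lam * e_j(t-1) + r_j(t).
    smoothed = gamma * lam * np.asarray(smoothed, dtype=np.float64) + rewards
    n = len(rewards)
    subjective = np.empty(n)
    for i in range(n):
        others = np.delete(smoothed, i)
        envy = np.maximum(others - smoothed[i], 0.0).sum()   # disadvantageous
        guilt = np.maximum(smoothed[i] - others, 0.0).sum()  # advantageous
        subjective[i] = (rewards[i]
                         - alpha * envy / (n - 1)
                         - beta * guilt / (n - 1))
    return subjective, smoothed
```

With RLlib you could call something like this from a callback that sees all agents' rewards each step, or wrap the multi-agent environment and rewrite the reward dict before it reaches the trainer, so no Lua or substrate changes should be needed.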
As for clean_up, yes, it's normal that there are no apples present at the start. You have to clean the river to get them to appear. The growth settings were chosen so that it's easiest to get lots of apples when two players clean at the same time. But if you move quickly with the human player, you can still get it to work alone.
Good luck! Let us know how it goes. We're always happy to discuss.