Is the implementation of the IPO algorithm in the project based on a single constraint? #258
-
Hello! Thank you very much for your contributions to this safe reinforcement learning project! As a newcomer to safe reinforcement learning, I would like to ask a question: is the IPO algorithm in this project implemented for a single constraint only? If so, how should it be extended to the multi-constraint setting?
-
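Thanks for your support of OmniSafe. Currently, OmniSafe's IPO algorithm only supports a single constraint. If you need multiple constraints, you can try the following steps:
1. Change the output dimension of the cost_critic hidden layer to the number of constraint types you need (or use multiple single-output cost_critic networks).
2. Modify omnisafe/adapter/on_policy_adapter and omnisafe/env/wrapper so that they receive multiple costs.
3. Convert the IPO.py file into a multiple-cost version. You can define multiple penalty terms in _compute_adv_surrogate to match the multiple cost advantage functions.
We only provide the above suggestions based on our development experience. If you encounter other difficulties during implementation, please feel free to contact us in this discussion.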
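One way to picture a multi-cost version of _compute_adv_surrogate is the minimal NumPy sketch below. It assumes the single-constraint IPO penalty takes the log-barrier form kappa / (cost_limit - Jc), falling back to a maximum value when the constraint is violated, and simply applies that once per constraint. The helper names ipo_penalties and multi_cost_adv_surrogate, the parameters kappa and penalty_max, and the 1 / (1 + sum of penalties) normalization are illustrative assumptions, not OmniSafe's API. adv_c is assumed to hold one cost-advantage column per constraint, e.g. as produced by a multi-output cost_critic.

```python
import numpy as np


def ipo_penalties(ep_costs, cost_limits, kappa=0.01, penalty_max=1.0):
    """Hypothetical helper: one IPO-style barrier coefficient per constraint.

    For a barrier log(d_i - J_i) / t, the policy gradient picks up
    -(1/t) * grad J_i / (d_i - J_i), so each constraint gets a coefficient
    kappa / (d_i - J_i), with kappa playing the role of 1/t (assumption).
    """
    ep_costs = np.asarray(ep_costs, dtype=float)
    cost_limits = np.asarray(cost_limits, dtype=float)
    slack = cost_limits - ep_costs
    # Small epsilon avoids dividing by exactly zero when J_i == d_i.
    penalties = kappa / (slack + 1e-8)
    # Violated constraints (negative slack) or an exploding barrier
    # fall back to the maximum penalty.
    invalid = (penalties < 0) | (penalties > penalty_max)
    penalties[invalid] = penalty_max
    return penalties


def multi_cost_adv_surrogate(adv_r, adv_c, penalties):
    """Hypothetical helper: combine a reward advantage with K cost advantages.

    adv_r: shape (N,), adv_c: shape (N, K), penalties: shape (K,).
    """
    adv_r = np.asarray(adv_r, dtype=float)
    adv_c = np.asarray(adv_c, dtype=float)
    penalties = np.asarray(penalties, dtype=float)
    # Subtract every constraint's penalized cost advantage.
    combined = adv_r - adv_c @ penalties
    # Assumed normalization keeps the advantage scale bounded as penalties grow.
    return combined / (1.0 + penalties.sum())


if __name__ == "__main__":
    # Two constraints: the first is within its limit, the second is violated.
    pen = ipo_penalties([10.0, 30.0], [25.0, 25.0], kappa=0.01, penalty_max=1.0)
    adv = multi_cost_adv_surrogate(
        [1.0, -0.5], [[0.2, 0.4], [0.0, -0.1]], pen
    )
    print(pen, adv)
```

Because each barrier coefficient grows as an episode cost approaches its limit, constraints close to violation dominate the combined advantage, while constraints with ample slack contribute almost nothing. With a single constraint (K = 1), multi_cost_adv_surrogate reduces to (adv_r - penalty * adv_c) / (1 + penalty).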
-
I have carefully reviewed your implementation and have not found any logical issues in the code. As for your comment that the results do not meet your expectations, it is not clear to me which aspects fall short. Could you clarify whether the problem is a code error or a performance issue? If it is a code error, please provide the exact error message. If it is a performance issue, please specify the environment in which the code runs and share the reward and cost curves if you have them. More information will help us address your concerns.
-
I notice you mentioned that the cost did not converge under PPO. However, PPO is an unconstrained (unsafe) algorithm that receives no cost signal. That is, PPO cannot reach…