Is the implementation of the IPO algorithm in the project based on a single constraint? #258

moodybluesf · 2023-07-28T09:13:49Z


Hello! Thank you very much for your work on this safe reinforcement learning project!

As a newcomer to safe reinforcement learning, I would like to ask a question. Is the implementation of the IPO algorithm in the project based on a single constraint? If so, how should it be extended to the multi-constraint case?

Answered by Gaiejj


Gaiejj · 2023-07-28T16:12:36Z

Maintainer

Thanks for supporting OmniSafe. Currently, OmniSafe's IPO algorithm supports only a single constraint. If you need multiple constraints, you can try the following steps:

1. Set the output size of the cost_critic (its final layer) to the number of constraints you need, or use multiple single-output cost_critics.
2. Modify omnisafe/adapter/on_policy_adapter and omnisafe/env/wrapper so that they receive multiple costs.
3. Add the corresponding storage keys to the buffer.
4. Compute a cost advantage for each constraint.
5. Adapt the IPO.py file to a multi-cost version: in _compute_adv_surrogate, define one penalty per constraint to match the per-constraint cost advantages.

These suggestions are based only on our own development experience. If you run into other difficulties during implementation, please feel free to follow up in this discussion.
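Computing one cost advantage per constraint is the core of the multi-cost change. As a rough sketch in plain NumPy (not OmniSafe's buffer code; the function name `gae` and the array layout are assumptions), standard Generalized Advantage Estimation can be vectorized over a trailing constraint axis so that K cost channels are processed in one pass. For simplicity it assumes a single trajectory segment with no episode boundaries inside it.

```python
import numpy as np


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation along the time axis (axis 0).

    rewards: shape (T,) for one signal or (T, K) for K cost signals.
    values: critic estimates, same shape as rewards.
    last_value: bootstrap value after the final step, shape () or (K,).
    Assumes no episode terminations inside the segment (hypothetical helper).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    bootstrap = np.asarray(last_value, dtype=float)[None]
    next_values = np.concatenate([values[1:], bootstrap], axis=0)
    # One-step TD errors, computed for every constraint column at once.
    deltas = rewards + gamma * next_values - values
    adv = np.zeros_like(deltas)
    running = np.zeros_like(deltas[0])
    # Accumulate discounted TD errors backwards in time.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

Called with costs of shape (T, K), cost values of shape (T, K), and a bootstrap of shape (K,), the result has shape (T, K); column `i` is the advantage for constraint `i`, so `adv.T` yields one row per constraint.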


moodybluesf Jul 29, 2023
Author

Thank you for your reply. I tried the approach you described, but the results were not what I expected. Below is the advantage computation I defined:

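```python
import numpy as np


def compute_adv_surrogates(adv_r, adv_cs, ep_costs, hp):
    # adv_r: reward advantages; adv_cs[i]: cost advantages for constraint i;
    # ep_costs[i]: mean episode cost for constraint i;
    # hp: hyperparameters (kappa, cost_limit shared by all constraints, penalty_max).
    penalties = np.zeros(len(ep_costs))
    adv = np.array(adv_r, dtype=float)  # copy, so the caller's adv_r is not modified in place
    for i in range(len(ep_costs)):
        penalty = hp.kappa / (hp.cost_limit - ep_costs[i] + 1e-8)
        # A negative penalty means the cost limit is exceeded; clip to penalty_max.
        if penalty < 0 or penalty > hp.penalty_max:
            penalty = hp.penalty_max
        adv -= penalty * adv_cs[i]
        penalties[i] = penalty
    return adv / (1 + np.sum(penalties))
```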

Each entry of ep_costs corresponds to one constraint.
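For reference, the function above computes the following combined advantage, where $d$ is `hp.cost_limit`, $\kappa$ is `hp.kappa`, $p_{\max}$ is `hp.penalty_max`, $J_{C_i}$ is `ep_costs[i]`, $A_R$ is `adv_r`, and $A_{C_i}$ is `adv_cs[i]` (ignoring the $10^{-8}$ stabilizer):

```latex
p_i =
\begin{cases}
  \dfrac{\kappa}{d - J_{C_i}}, & 0 \le \dfrac{\kappa}{d - J_{C_i}} \le p_{\max} \\[1ex]
  p_{\max}, & \text{otherwise}
\end{cases}
\qquad
\tilde{A} = \frac{A_R - \sum_{i=1}^{K} p_i \, A_{C_i}}{1 + \sum_{i=1}^{K} p_i}
```

With $K = 1$ this reduces to the single-constraint form $(A_R - p\,A_C)/(1 + p)$.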

7tosmoke Nov 26, 2023

I'm also learning SRL. Do other SRL algorithms, such as CPPO or CUP, have multi-constraint versions? Looking forward to your reply.

Gaiejj Nov 27, 2023
Maintainer

None of the current OmniSafe algorithms officially support multiple constraints. A previous discussion, #284, attempts a multi-constraint version of the SAC-PID algorithm; it can serve as a reference for implementing multi-constraint versions of the algorithms you mentioned.
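A PID-based algorithm extends to several constraints mainly by keeping one PID-controlled Lagrange multiplier per constraint. The following is a minimal NumPy sketch of the PID Lagrangian update (Stooke et al., 2020) vectorized over K constraints. The class name, gains, and zero initialization are assumptions; it is not the code from #284 or OmniSafe's own PID Lagrangian class.

```python
import numpy as np


class MultiPIDLagrangian:
    """One PID-controlled Lagrange multiplier per constraint (hypothetical helper)."""

    def __init__(self, cost_limits, kp=0.1, ki=0.01, kd=0.01):
        self.cost_limits = np.asarray(cost_limits, dtype=float)  # shape (K,)
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = np.zeros_like(self.cost_limits)
        self.prev_costs = np.zeros_like(self.cost_limits)  # assumed zero at start
        self.multipliers = np.zeros_like(self.cost_limits)

    def update(self, ep_costs):
        """Update all K multipliers from the latest mean episode costs."""
        ep_costs = np.asarray(ep_costs, dtype=float)
        error = ep_costs - self.cost_limits  # > 0 means the constraint is violated
        self.integral = np.maximum(0.0, self.integral + error)
        # The derivative term reacts only to rising costs.
        derivative = np.maximum(0.0, ep_costs - self.prev_costs)
        self.prev_costs = ep_costs
        self.multipliers = np.maximum(
            0.0, self.kp * error + self.ki * self.integral + self.kd * derivative
        )
        return self.multipliers
```

Each multiplier then weights its own cost term in the actor loss, so a constraint that is already satisfied stops contributing once its multiplier returns to zero.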

Gaiejj · 2023-07-29T07:25:38Z

Maintainer

I have carefully reviewed your implementation and have not found any logical issues in the code. You mention that the results do not meet your expectations, but I am not sure in what way they differ from what you expected. Could you clarify whether the problem is a code error or the algorithm's performance? If it is a code error, please provide the exact error message. If it is a performance issue, could you tell us which environment the code runs in and share reward and cost curves, if available? More information will help us address your concerns.


moodybluesf Jul 29, 2023
Author

Sorry for not describing the problem clearly enough.
My environment is a multi-UAV environment, and the goal is for the UAVs to serve as many ground users as possible. The constraints are that UAVs must not fly beyond the boundaries, must not collide with each other, and must not collide with obstacles. I used the PPO algorithm for training, and the cost curves fail to converge.
However, the cost curve does converge when there is only one constraint, so I don't understand why.
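To train on several constraints, the environment has to report one cost per constraint instead of a single scalar, as the maintainer's suggestion to modify the adapter and wrapper implies. Below is a hypothetical sketch for the three constraints described here. It assumes a 2-D square arena, point-like UAVs, circular obstacles, and a cost equal to the number of violations per step; none of this comes from the poster's actual environment.

```python
import numpy as np


def uav_costs(positions, obstacles, bounds, min_sep, obstacle_radius):
    """Return [out_of_bounds, uav_collisions, obstacle_collisions] for one step.

    positions: (N, 2) UAV coordinates; obstacles: (M, 2) obstacle centers;
    bounds: (low, high) limits of a square arena. All names are hypothetical.
    """
    positions = np.asarray(positions, dtype=float)
    obstacles = np.asarray(obstacles, dtype=float).reshape(-1, 2)
    low, high = bounds
    # Constraint 1: number of UAVs outside the arena.
    outside = (positions < low) | (positions > high)
    out_of_bounds = np.sum(np.any(outside, axis=1))
    # Constraint 2: number of UAV pairs closer than min_sep.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    i, j = np.triu_indices(len(positions), k=1)
    uav_collisions = np.sum(dist[i, j] < min_sep)
    # Constraint 3: number of UAVs inside any obstacle's radius.
    obs_dist = np.linalg.norm(positions[:, None, :] - obstacles[None, :, :], axis=-1)
    obstacle_collisions = np.sum(np.any(obs_dist < obstacle_radius, axis=1))
    return np.array([out_of_bounds, uav_collisions, obstacle_collisions], dtype=float)
```

Keeping the three costs separate, rather than summing them, is what lets each constraint receive its own cost advantage and penalty.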

Gaiejj · 2023-07-30T05:49:04Z

Maintainer

I notice you mention that PPO failed to make the cost converge. However, PPO is an unconstrained algorithm that receives no cost signal during training, so it cannot be expected to keep the cost within a limit; the behavior you observed is therefore reasonable.
I am happy to help you solve this problem. Could you share more information about the training process, such as curves, the terminal log, progress.csv, and so on? You can also try PPOLag, a reliable baseline algorithm, to check whether something is wrong with the multi-UAV environment.
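Unlike PPO, a Lagrangian method feeds the cost advantage into the policy update. A minimal NumPy sketch of how that idea extends to K constraints follows, assuming one multiplier per constraint updated by projected gradient ascent. The function names are hypothetical, and this is not OmniSafe's PPOLag implementation.

```python
import numpy as np


def update_multipliers(multipliers, ep_costs, cost_limits, lr=0.05):
    """Projected gradient ascent on one Lagrange multiplier per constraint."""
    multipliers = np.asarray(multipliers, dtype=float)
    violation = np.asarray(ep_costs, dtype=float) - np.asarray(cost_limits, dtype=float)
    # Multipliers grow while a constraint is violated and are clipped at zero.
    return np.maximum(0.0, multipliers + lr * violation)


def lagrangian_advantage(adv_r, adv_cs, multipliers):
    """Combine advantages as (A_R - sum_i lambda_i * A_Ci) / (1 + sum_i lambda_i).

    adv_r: shape (T,); adv_cs: shape (K, T); multipliers: shape (K,).
    """
    adv_r = np.asarray(adv_r, dtype=float)
    adv_cs = np.asarray(adv_cs, dtype=float)
    multipliers = np.asarray(multipliers, dtype=float)
    return (adv_r - multipliers @ adv_cs) / (1.0 + multipliers.sum())
```

With all multipliers at zero, the combined advantage is just the reward advantage, which is exactly what plain PPO optimizes. The division by $1 + \sum_i \lambda_i$ mirrors the single-constraint form and keeps the advantage scale bounded; it is a design choice rather than a requirement.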


moodybluesf Aug 2, 2023
Author

Thank you sincerely for your suggestions, but for personal reasons I cannot share detailed training information. In fact, the algorithm I used is IPO, which builds on PPO, so my previous description was inaccurate. I double-checked the logic of the code and believe it is correct. I suspect the cost curves fail to converge because the project's IPO implementation does not support multi-constraint scenarios, so I look forward to your release of a multi-constraint version of the algorithm.
