-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Proof tx submission enhancement/bug #1574
Add Proof tx submission enhancement/bug #1574
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments - no blockers.
Appreciate the explanation in the issue. I understand it looks safe but trying to be "ultra careful" with any last minute changes.
Adding @msmania for a review as well.
@@ -50,13 +50,16 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p | |||
ctx.Logger().Error(fmt.Sprintf("could not delete evidence is not sealed, could cause a relay leak: %s", err.Error())) | |||
} | |||
} | |||
if evidence.NumOfProofs != claim.TotalProofs { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PoktBlade Have you looked into the difficulty of adding a unit test for this? I realize it's not easy, but just wanted to make sure that we at least put effort into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into it some more for sanity purposes.
In regards to some other testing, @jorgecuesta did run it live in his fleet, and saw that there weren't any invalid proofs being submitted on chain. This was a dramatic decrease (0 invalid proofs) vs what he saw before the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I wrote some integration tests last night based off TestClaimProtoTx
, but it turns out the test for this was already broken.. The reason why we don't see them when running tests, these tests are skipped whenever -s flag is passed.
Anyhow.. will see if I can fix the initial broken tests (seems like a crashing error when submitting proofs, due to AIOB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PoktBlade Bump on existing tests and/or new tests.
x/pocketcore/keeper/proof.go
Outdated
@@ -73,8 +76,24 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p | |||
if !found { | |||
ctx.Logger().Error(fmt.Sprintf("an error occurred creating the proof transaction with app %s not found with evidence %v", evidence.ApplicationPubKey, evidence)) | |||
} | |||
|
|||
// There is a potential race condition where the evidence is not sealed while submitting a claim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any idea why this wasn't a major concern in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally whenever the claim is sealed, relays stop coming in and so there is some non deterministic timing in which the race condition could happen. If relays keep coming in (i.e session rollover), the problem becomes more prominent.
We did find some on chain tx's to show that this did periodically happen even without session rollover though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can go here and filter the transactions by "Proof" and "Failed", and put the start date before ~ the July 10th and it will show some of the failed submissions even with session rollover disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was running into some issues but reached out to poktscan here: https://discord.com/channels/854406364931686400/950799595171086377/1131367539667107872
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good. Can you update the comments for better readability?
@@ -50,13 +50,16 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p | |||
ctx.Logger().Error(fmt.Sprintf("could not delete evidence is not sealed, could cause a relay leak: %s", err.Error())) | |||
} | |||
} | |||
if evidence.NumOfProofs != claim.TotalProofs { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PoktBlade Bump on existing tests and/or new tests.
x/pocketcore/keeper/proof.go
Outdated
@@ -73,8 +76,24 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p | |||
if !found { | |||
ctx.Logger().Error(fmt.Sprintf("an error occurred creating the proof transaction with app %s not found with evidence %v", evidence.ApplicationPubKey, evidence)) | |||
} | |||
|
|||
// There is a potential race condition where the evidence is not sealed while submitting a claim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was running into some issues but reached out to poktscan here: https://discord.com/channels/854406364931686400/950799595171086377/1131367539667107872
Added all the comments as suggested! Thanks for the review. In regards to the tests, I haven't made any progress to them - don't think I will have the bandwidth moving forward to diagnose / fix the integration tests. The changes here are minimal and only guard rails for proof tx, not consensus. If needed, we can push an additional hot fix without the need for consensus upgrades. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to block this PR until tests are fixed & added per this comment: #1574 (comment)
@PoktBlade Could you do a handoff (whether it's to myself, a core team member or someone from the community) with regard to the status of the tests you investigated? For example:
- Which tests exist for it (e.g. names & links)
- Which ones are broken (e.g. names & links)
- Insight/ideas into why certain tests are broken (e.g. a few bullet points)
- Which ones would you add if you had capacity (e.g. a list)
I'm handing it off as is. The PR is already merged into Poktscan's fork. |
Since I don't have time to dedicate more to this issue, please cherry pick or fork my changes and move forward. Closing since it won't be merged in / blocked. |
@Olshansk all the request you ask about test are not fair to be handle by the community people trying to help if pocket has people that get a salary for this. |
Thanks for the feedback. I will work on adding the tests myself this time and reach out about getting a budget for it in the future. |
Description
Whenever the evidence does not match the total proofs, the node tries to delete the evidence and submits a claim. It should never submit a proof whenever there is LESS than the number of relay proofs as that can cause an array index of out-bounds exception if the randomly selected index is more than what we have in the store. This can cause a node crash.
Example: Servicer starts off with 10 relays in the evidence store
Solution:
If the number of relay proofs in the evidence store is less than the submitted claims total relay proofs then delete the evidence and don't submit a proof.
There is a potential race condition where the evidence is not sealed while submitting a claim, allowing for more relays to enter the evidence claim. Whenever a proof is generated via
GenerateMerkleRoot
, it will result in an incorrect Merkle proof from being generated due to a mismatch of how many relays are in the store and how many was submitted on chain. This can inadvertently happen more often as we accept session rollover relaysExample: Servicer starts off with 10 relays in the evidence store
Solution
maxRelays
based off:The
GenerateMerkleRoot
already accepts amaxRelays
parameter which was introduced to fix the Chocalate Rain/Overservicing vulnerability. It discards relays above maxRelays with the following implementationNote: The more ideal solution is to only submit the claim whenever we are done servicing for sure, but this involves locks and potential side effects that Pocket core was never really built with in mind
Note: Something tells me that we were missing a
continue
statement in the old code whenever the evidence does not match the total proofs in the claim. It shouldn't submit if there is a mismatch / if we deleted the evidence. My solution adds acontinue
for less than, and tries a best effort to submit proof if we do have enough relay proofs in our evidence store.Summary generated by Reviewpad on 17 Jul 23 19:44 UTC
This pull request adds better proof validation to the
proof.go
file in thex/pocketcore/keeper
package. It includes changes to handle a potential race condition where the evidence is not sealed while submitting a claim, allowing for more relays to enter the evidence claim. It also generates a merkle proof for the claim's total proof count if it doesn't exceed the maximum number of relays per session, otherwise it falls back to the max relays per session. Additionally, it includes validation of the level count on the claim by the total relays.