Add Proof tx submission enhancement/bug #1574

nodiesBlade · 2023-07-17T19:44:36Z

Description

Whenever the evidence does not match the claim's total proofs, the node tries to delete the evidence but still goes on to submit a proof. It should never submit a proof when the evidence store holds FEWER relay proofs than the claim's total, because the pseudo-randomly selected index can then exceed the number of proofs in the store and cause an array index out-of-bounds (AIOB) error. This can crash the node.

Example: Servicer starts off with 10 relays in the evidence store

1. Session ends
2. Servicer submits a claim for 10 relays on chain
3. Servicer stops and restarts the node, and evidence.db gets corrupted
4. Servicer attempts to submit a proof, and the pseudo-randomly selected index used for Merkle proof generation can result in an AIOB error
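A minimal sketch of the failure mode, assuming the leaf index is derived from the claim's on-chain TotalProofs rather than from the number of proofs actually stored. selectProofIndex and proofAt are hypothetical helpers, not Pocket Core functions; proofAt shows the bounds check that a bare proofs[idx] lookup lacks.

```go
package main

import (
	"errors"
	"fmt"
)

// errIndexOutOfRange is returned instead of panicking when the selected
// leaf index does not exist in the local evidence store.
var errIndexOutOfRange = errors.New("selected proof index exceeds stored relay proofs")

// selectProofIndex is a hypothetical stand-in for pseudo-random leaf
// selection. The index is bounded by the claim's TotalProofs (assumed > 0),
// not by how many proofs survived locally.
func selectProofIndex(seed uint64, claimTotalProofs int64) int64 {
	return int64(seed % uint64(claimTotalProofs))
}

// proofAt returns an error where a naive proofs[idx] would panic with
// "index out of range".
func proofAt(proofs []string, idx int64) (string, error) {
	if idx < 0 || idx >= int64(len(proofs)) {
		return "", errIndexOutOfRange
	}
	return proofs[idx], nil
}

func main() {
	stored := []string{"r0", "r1", "r2", "r3", "r4", "r5"} // only 6 proofs survived corruption
	idx := selectProofIndex(8, 10)                          // the claim was for 10 relays
	if _, err := proofAt(stored, idx); err != nil {
		fmt.Println("index", idx, ":", err)
	}
}
```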

Solution:
If the number of relay proofs in the evidence store is less than the claim's total relay proofs, delete the evidence and don't submit a proof.
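The rule above reduces to a single comparison. This is a hypothetical helper (decideProofAction and proofAction are illustrative names, not the PR's code) that only encodes the decision; it does not touch a real evidence store.

```go
package main

import "fmt"

// proofAction describes what the servicer should do with a claim's evidence.
type proofAction int

const (
	submitProof proofAction = iota
	deleteAndSkip
)

// decideProofAction: if the store holds fewer relay proofs than the claim's
// TotalProofs, the evidence is deleted and no proof is submitted.
func decideProofAction(numOfProofs, claimTotalProofs int64) proofAction {
	if numOfProofs < claimTotalProofs {
		return deleteAndSkip
	}
	return submitProof
}

func main() {
	fmt.Println(decideProofAction(6, 10) == deleteAndSkip) // true
	fmt.Println(decideProofAction(11, 10) == submitProof)  // true
}
```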

There is a potential race condition where the evidence is not sealed while the claim is being submitted, which allows more relays to be added to the evidence after the claim. When a proof is then generated via GenerateMerkleRoot, the resulting Merkle proof is incorrect because the number of relays in the store no longer matches the number submitted on chain. This is more likely to happen now that session rollover relays are accepted.

Example: Servicer starts off with 10 relays in the evidence store

1. Session ends
2. Servicer handles a relay, while also submitting the claim (RACE Condition, no proper locking mechanism here)
3. Servicer submits claim for 10 relays on chain
4. Servicer finishes handling the relay, and the evidence store now contains 11 relays
5. Servicer (after waiting a couple of blocks) submits a proof tx with a Merkle proof for 11 relays, not 10
6. Validators will mark the TX as invalid due to an invalid Merkle proof.

Solution:

To make a best-effort attempt at submitting a proof for the claim the servicer actually submitted, we calculate maxRelays as follows:
1. Use the claim's TotalProofs if it does not exceed the maximum relays per session.
2. Otherwise, fall back to the maximum relays per session.
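The two rules amount to taking the minimum of the claim's TotalProofs and the per-session cap. A sketch with a hypothetical helper name, assuming the per-session maximum is already known (how Pocket Core computes it is outside this PR):

```go
package main

import "fmt"

// maxRelaysForProof uses the claim's TotalProofs unless it exceeds the
// per-session maximum, in which case it falls back to that maximum.
func maxRelaysForProof(claimTotalProofs, maxRelaysPerSession int64) int64 {
	if claimTotalProofs <= maxRelaysPerSession {
		return claimTotalProofs
	}
	return maxRelaysPerSession
}

func main() {
	fmt.Println(maxRelaysForProof(10, 1000))   // 10
	fmt.Println(maxRelaysForProof(5000, 1000)) // 1000
}
```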

GenerateMerkleRoot already accepts a maxRelays parameter, which was introduced to fix the Chocolate Rain/Overservicing vulnerability. It discards relays above maxRelays with the following implementation:

	if int64(len(ev.Proofs)) > maxRelays {
		ev.Proofs = ev.Proofs[:maxRelays]
		ev.NumOfProofs = maxRelays
	}
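A runnable version of the truncation above, using a hypothetical, trimmed-down evidence struct that models only the two fields the snippet touches (the real type has more fields):

```go
package main

import "fmt"

// evidence is a hypothetical stand-in for the real evidence type.
type evidence struct {
	Proofs      []string
	NumOfProofs int64
}

// capEvidence applies the same truncation as the snippet: relays beyond
// maxRelays are discarded before the Merkle root is generated.
func capEvidence(ev *evidence, maxRelays int64) {
	if int64(len(ev.Proofs)) > maxRelays {
		ev.Proofs = ev.Proofs[:maxRelays]
		ev.NumOfProofs = maxRelays
	}
}

func main() {
	ev := &evidence{}
	for i := 0; i < 11; i++ {
		ev.Proofs = append(ev.Proofs, fmt.Sprintf("relay-%d", i))
	}
	ev.NumOfProofs = int64(len(ev.Proofs))
	capEvidence(ev, 10) // the claim was submitted for 10 relays
	fmt.Println(len(ev.Proofs), ev.NumOfProofs) // 10 10
}
```

Note that the cap only removes excess relays; it cannot help when the store holds fewer proofs than claimed, which is why that case is skipped separately.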

Note: The ideal solution is to submit the claim only once we are certain servicing is finished, but this involves locks and potential side effects that Pocket Core was never really designed for.
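For contrast, the locking approach the note sets aside could look roughly like this. It is a hypothetical sketch (sealableEvidence, AddRelay and SealForClaim are invented names), not how Pocket Core's evidence store works and not what this PR implements. Because counting and sealing happen under the same mutex as appends, no relay can land between reading the count and sealing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errSealed = errors.New("evidence is sealed; relay rejected")

// sealableEvidence guards relay appends and sealing with one mutex.
type sealableEvidence struct {
	mu     sync.Mutex
	proofs []string
	sealed bool
}

// AddRelay appends a relay proof unless the evidence is already sealed.
func (e *sealableEvidence) AddRelay(p string) error {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.sealed {
		return errSealed
	}
	e.proofs = append(e.proofs, p)
	return nil
}

// SealForClaim seals the evidence and returns the relay count to claim.
func (e *sealableEvidence) SealForClaim() int64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.sealed = true
	return int64(len(e.proofs))
}

func main() {
	ev := &sealableEvidence{}
	for i := 0; i < 10; i++ {
		_ = ev.AddRelay(fmt.Sprintf("relay-%d", i))
	}
	claimed := ev.SealForClaim()
	err := ev.AddRelay("late-relay")
	fmt.Println(claimed, err) // 10 evidence is sealed; relay rejected
}
```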

Note: I suspect the old code was missing a continue statement for the case where the evidence does not match the total proofs in the claim. It shouldn't submit a proof if there is a mismatch or if the evidence was deleted. My solution adds a continue for the less-than case, and makes a best-effort attempt to submit a proof when the evidence store does hold enough relay proofs.
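Putting both rules together, the per-claim control flow described in the note can be sketched as a loop. claimInfo, storedEvidence and planProofs are hypothetical; the field names echo the PR text but are not the real Pocket Core structs, and skipping claims with no evidence at all is an added assumption.

```go
package main

import "fmt"

type claimInfo struct {
	ID          string
	TotalProofs int64
}

type storedEvidence struct {
	NumOfProofs int64
}

// planProofs returns, per claim ID, how many relays a proof would cover.
func planProofs(claims []claimInfo, store map[string]storedEvidence, maxRelaysPerSession int64) map[string]int64 {
	planned := make(map[string]int64)
	for _, c := range claims {
		ev, ok := store[c.ID]
		if !ok {
			continue // assumption: no evidence means nothing to prove
		}
		if ev.NumOfProofs < c.TotalProofs {
			delete(store, c.ID) // evidence cannot back the claim; drop it
			continue           // the continue the old code was missing
		}
		maxRelays := c.TotalProofs
		if maxRelays > maxRelaysPerSession {
			maxRelays = maxRelaysPerSession
		}
		planned[c.ID] = maxRelays // best effort: prove only the first maxRelays relays
	}
	return planned
}

func main() {
	store := map[string]storedEvidence{"a": {6}, "b": {11}, "c": {6000}}
	claims := []claimInfo{{"a", 10}, {"b", 10}, {"c", 5000}}
	fmt.Println(planProofs(claims, store, 1000)) // map[b:10 c:1000]
	fmt.Println(len(store))                      // 2
}
```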

Summary generated by Reviewpad on 17 Jul 23 19:44 UTC

This pull request adds better proof validation to the proof.go file in the x/pocketcore/keeper package. It includes changes to handle a potential race condition where the evidence is not sealed while submitting a claim, allowing for more relays to enter the evidence claim. It also generates a merkle proof for the claim's total proof count if it doesn't exceed the maximum number of relays per session, otherwise it falls back to the max relays per session. Additionally, it includes validation of the level count on the claim by the total relays.

Olshansk

Left a couple of comments, no blockers.

Appreciate the explanation in the issue. I understand it looks safe, but I'm trying to be "ultra careful" with any last-minute changes.

Adding @msmania for a review as well.


Olshansk · 2023-07-17T23:00:06Z

x/pocketcore/keeper/proof.go

@@ -50,13 +50,16 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p
 				ctx.Logger().Error(fmt.Sprintf("could not delete evidence is not sealed, could cause a relay leak: %s", err.Error()))
 			}
 		}
-		if evidence.NumOfProofs != claim.TotalProofs {
+


@PoktBlade Have you looked into the difficulty of adding a unit test for this? I realize it's not easy, but just wanted to make sure that we at least put effort into it.

I can look into it some more for sanity purposes.

Regarding other testing, @jorgecuesta ran it live in his fleet and saw no invalid proofs being submitted on chain. This was a dramatic decrease (0 invalid proofs) compared with what he saw before the changes.

Update: I wrote some integration tests last night based on TestClaimProtoTx, but it turns out the existing test for this was already broken. We don't see the failure when running tests because these tests are skipped whenever the -s flag is passed.

Anyhow, I'll see if I can fix the originally broken tests (they seem to crash when submitting proofs, due to an AIOB error).

@PoktBlade Bump on existing tests and/or new tests.

Olshansk · 2023-07-17T23:00:22Z

x/pocketcore/keeper/proof.go

@@ -73,8 +76,24 @@ func (k Keeper) SendProofTx(ctx sdk.Ctx, n client.Client, node *pc.PocketNode, p
 		if !found {
 			ctx.Logger().Error(fmt.Sprintf("an error occurred creating the proof transaction with app %s not found with evidence %v", evidence.ApplicationPubKey, evidence))
 		}
+
+		// There is a potential race condition where the evidence is not sealed while submitting a claim


Do you have any idea why this wasn't a major concern in the past?

Generally, once the claim is sealed, relays stop coming in, so the race condition can only occur within a small, non-deterministic timing window. If relays keep coming in (i.e., session rollover), the problem becomes more prominent.

We did find some on-chain transactions showing that this periodically happened even without session rollover, though.

You can go here, filter the transactions by "Proof" and "Failed", and set the start date to before ~July 10th; it will show some of the failed submissions even with session rollover disabled.

Was running into some issues but reached out to poktscan here: https://discord.com/channels/854406364931686400/950799595171086377/1131367539667107872

msmania

The change looks good. Can you update the comments for better readability?


Olshansk · 2023-07-19T23:25:42Z

x/pocketcore/keeper/proof.go


@PoktBlade Bump on existing tests and/or new tests.

Olshansk · 2023-07-19T23:36:46Z

x/pocketcore/keeper/proof.go


Was running into some issues but reached out to poktscan here: https://discord.com/channels/854406364931686400/950799595171086377/1131367539667107872


nodiesBlade · 2023-07-20T16:05:14Z

Added all the comments as suggested! Thanks for the review.

Regarding the tests, I haven't made any progress on them, and I don't think I will have the bandwidth going forward to diagnose and fix the integration tests. The changes here are minimal and only add guard rails for the proof tx; they do not touch consensus. If needed, we can push an additional hotfix without requiring a consensus upgrade.

Olshansk

Going to block this PR until tests are fixed & added per this comment: #1574 (comment)

@PoktBlade Could you do a handoff (whether it's to myself, a core team member or someone from the community) with regard to the status of the tests you investigated? For example:

- Which tests exist for it (e.g. names & links)
- Which ones are broken (e.g. names & links)
- Insight/ideas into why certain tests are broken (e.g. a few bullet points)
- Which ones would you add if you had capacity (e.g. a list)

nodiesBlade · 2023-07-20T21:18:34Z

I'm handing it off as is. The PR is already merged into Poktscan's fork.

nodiesBlade · 2023-07-20T21:22:20Z

Since I don't have more time to dedicate to this issue, please cherry-pick or fork my changes and move forward.

Closing since it is blocked and won't be merged.

jorgecuesta · 2023-07-20T21:50:51Z

@Olshansk, it isn't fair to ask community members who are trying to help to handle all of these test requests when Pocket has salaried people for this.
This is why people hesitate to help on the project. I think you need to figure out a way to work on v0, because it has a lot of things that are not working and should be, like tests, and that is Pocket's responsibility, not the community's.

Olshansk · 2023-07-20T22:07:51Z


Thanks for the feedback.

I will work on adding the tests myself this time and reach out about getting a budget for it in the future.

add better proof validation

bc1a500

reviewpad bot added the small and waiting-for-review labels Jul 17, 2023

nodiesBlade changed the title from "Add Proof tx submission enhancement" to "Add Proof tx submission enhancement/bug" Jul 17, 2023

nodiesBlade requested a review from Olshansk July 17, 2023 20:39

Olshansk added this to the Network Cost milestone Jul 17, 2023

Olshansk assigned nodiesBlade Jul 17, 2023

Olshansk requested a review from msmania July 17, 2023 22:18

Olshansk requested changes Jul 17, 2023


poktblade added 2 commits July 17, 2023 18:36

add comments describing changes

6665996

update comment clarity for profo submission

ec2135a

msmania requested changes Jul 19, 2023

x/pocketcore/keeper/proof.go (outdated)

x/pocketcore/keeper/proof.go (outdated)

update comments

64750f8

nodiesBlade requested review from msmania and Olshansk July 19, 2023 04:05

Olshansk requested changes Jul 19, 2023


update comments

85b1623

nodiesBlade requested a review from Olshansk July 20, 2023 16:05

Olshansk requested changes Jul 20, 2023


nodiesBlade closed this Jul 20, 2023

RossiNYC removed the waiting-for-review label Sep 12, 2023

reviewpad bot added the waiting-for-review label Sep 12, 2023
