Mesh rc 0.4.3 #1575

Closed · wants to merge 92 commits

Conversation

@nodiesBlade (Contributor) commented Jul 26, 2023

Description

We should invalidate sessions based on certain new errors, both before and after a relay is handled, in order to serve as few free relays as possible.

Here are all the invalid-session errors we should account for:

```go
func (s Session) Validate(node sdk.Address, app appexported.ApplicationI, sessionNodeCount int) sdk.Error {
	// validate chain
	if len(s.SessionHeader.Chain) == 0 {
		return NewEmptyNonNativeChainError(ModuleName)
	}
	// validate sessionBlockHeight
	if s.SessionHeader.SessionBlockHeight < 1 {
		return NewInvalidBlockHeightError(ModuleName)
	}
	// validate the app public key
	if err := PubKeyVerification(s.SessionHeader.ApplicationPubKey); err != nil {
		return err
	}
	// validate app corresponds to appPubKey
	if app.GetPublicKey().RawString() != s.SessionHeader.ApplicationPubKey {
		return NewInvalidAppPubKeyError(ModuleName)
	}
	// validate app chains
	chains := app.GetChains()
	found := false
	for _, c := range chains {
		if c == s.SessionHeader.Chain {
			found = true
			break
		}
	}
	if !found {
		return NewUnsupportedBlockchainAppError(ModuleName)
	}
	// validate sessionNodes
	err := s.SessionNodes.Validate(sessionNodeCount)
	if err != nil {
		return err
	}
	// validate node is of the session
	if !s.SessionNodes.Contains(node) {
		return NewInvalidSessionError(ModuleName)
	}
	return nil
}
```

This was inspired by seeing the Portal send us relays with an invalid app: we were serving them for free (though not retrying to send them to the full node).
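A minimal sketch of the idea, not the PR's actual implementation: cache the sessions the mesh node has dispatched and evict one as soon as validation fails with an error that can never heal, so no further free relays are served for it. All names here (`sessionKey`, `sessionStore`, the placeholder code constants) are assumptions; the real error codes live in pocket-core.

```go
package mesh

import "sync"

// Placeholder codes standing in for the sdk.Error codes returned by
// Session.Validate above; the actual values are defined in pocket-core.
const (
	codeInvalidAppPubKey uint32 = iota + 1
	codeUnsupportedBlockchainApp
	codeInvalidSession
)

// sessionKey identifies a dispatched session.
type sessionKey struct {
	appPubKey string
	chain     string
	height    int64
}

// sessionStore is a concurrency-safe cache of known-good sessions.
type sessionStore struct {
	mu       sync.RWMutex
	sessions map[sessionKey]struct{}
}

// shouldInvalidate reports whether a validation failure is permanent for this
// session (wrong app key, unsupported chain, node not in the session) rather
// than transient, and therefore worth evicting the cache entry for.
func shouldInvalidate(code uint32) bool {
	switch code {
	case codeInvalidAppPubKey, codeUnsupportedBlockchainApp, codeInvalidSession:
		return true
	default:
		return false
	}
}

// Invalidate evicts a session so subsequent relays for it are rejected
// immediately instead of being served for free.
func (s *sessionStore) Invalidate(k sessionKey) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.sessions, k)
}
```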

Fixes a high memory consumption issue that is also part of pokt-network#1457.
Under a high request load (1,000 rps or more), RAM usage would balloon to roughly 40 GB.
After the pokt-network#1457 fix with the worker pool, the node stays under 14 GB of RAM in my local tests.
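A sketch of the bounding mechanism, assuming the github.com/alitto/pond pool that this PR later bumps to 1.8.3; the pool sizes and the drop-when-full policy here are illustrative, not the PR's actual values.

```go
package main

import (
	"fmt"
	"time"

	"github.com/alitto/pond"
)

func main() {
	// At most 100 concurrent workers and 1000 queued tasks: instead of one
	// goroutine per incoming relay (unbounded memory growth), excess work is
	// queued or rejected, which is what keeps RAM flat under load.
	pool := pond.New(100, 1000, pond.MinWorkers(10))
	defer pool.StopAndWait()

	for i := 0; i < 5000; i++ {
		relay := i
		// TrySubmit returns false when the queue is full rather than blocking.
		if !pool.TrySubmit(func() {
			time.Sleep(10 * time.Millisecond) // stand-in for relay handling
			_ = relay
		}) {
			fmt.Println("queue full, rejecting relay", relay)
		}
	}
}
```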
* Fixed the RPC timeout being interpreted as seconds instead of milliseconds
* Updated mesh.md to document the new cache configurations
* Updated mesh.md to list /v1/private/mesh/session as a required whitelisted endpoint/path
* Fixed /v1/private/mesh/updatechains to properly update chains both in memory and on disk
* Added hot reload for servicer private key files (add & remove); see the sketch after this list
  * on add: turn on the checks and start allowing relays for it
  * on remove: stop receiving new relays and consume all the pending relays in the queue
* Version bump
* Enhanced the log message about missing sessions
* Version Bump
…rivate key is removed after it has been supported by the mesh node.
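A rough sketch of the reload loop mentioned above, under assumed names: poll the key file on an interval, diff against the current set, and hand added/removed addresses to a callback that starts checks for new servicers and drains the queues of removed ones. This is illustrative, not the PR's code; `loadKeys` and `watchServicerKeys` are hypothetical helpers.

```go
package mesh

import (
	"os"
	"strings"
	"time"
)

// loadKeys is an assumed helper: one servicer key/address per line.
func loadKeys(path string) (map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	keys := make(map[string]bool)
	for _, line := range strings.Split(string(data), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			keys[line] = true
		}
	}
	return keys, nil
}

// watchServicerKeys re-reads the key file on every tick and reports the diff.
func watchServicerKeys(path string, interval time.Duration, apply func(added, removed []string)) {
	current := map[string]bool{}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		next, err := loadKeys(path)
		if err != nil {
			continue // keep the last good set on read errors
		}
		var added, removed []string
		for k := range next {
			if !current[k] {
				added = append(added, k) // turn on checks, start allowing it
			}
		}
		for k := range current {
			if !next[k] {
				removed = append(removed, k) // stop receiving, drain pending relays
			}
		}
		if len(added) > 0 || len(removed) > 0 {
			apply(added, removed)
		}
		current = next
	}
}
```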

* Version Bump
…ral solution)

* Fixed an error that panicked the process when loading a servicer_url without an http/https scheme; now the error is properly reported (see the sketch after this list).
* Added a manual cron job to compact the relays database every hour.
* Removed a log2.Fatal that was crashing the process.
* relay_cache_background_sync_interval was not used
* relay_cache_background_compaction_interval was not used
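A minimal sketch of the servicer_url scheme check, using only the standard library; the helper name `validateServicerURL` is an assumption, not the PR's actual function. Parsing the URL up front lets the node return a descriptive error instead of panicking on a later dereference.

```go
package mesh

import (
	"fmt"
	"net/url"
)

// validateServicerURL rejects a servicer_url without an http/https scheme
// before it is ever used, so startup fails with a clear message.
func validateServicerURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("servicer_url %q is not a valid URL: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("servicer_url %q must use an http or https scheme", raw)
	}
	if u.Host == "" {
		return fmt.Errorf("servicer_url %q is missing a host", raw)
	}
	return nil
}
```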

Added:
* hot_reload_interval: set to 0 to turn off hot reload of chains/servicers; otherwise it is the interval in milliseconds at which the files are re-checked

Updated:
* Servicer health checks now run every 60s (was 30s) - future: will be configurable through config.json
* Old sessions are now evaluated for removal every 30m (was 30s) - future: will be configurable through config.json
* Updated the config.json example in the docs.
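A sketch of how these knobs could surface in config.json. Only hot_reload_interval is actually named in this changelog; the other two fields are assumptions standing in for the "future: will be configurable" items above.

```go
package mesh

// MeshIntervals is a hypothetical config struct; field names other than
// hot_reload_interval are illustrative, not part of the current config.
type MeshIntervals struct {
	// Milliseconds between re-checks of the chains/servicers files;
	// 0 turns hot reload off entirely.
	HotReloadInterval int64 `json:"hot_reload_interval"`
	// Assumed future fields (the intervals are currently fixed at 60s and 30m).
	ServicerHealthCheckIntervalSeconds int64 `json:"servicer_health_check_interval_seconds,omitempty"`
	SessionCleanupIntervalMinutes      int64 `json:"session_cleanup_interval_minutes,omitempty"`
}
```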

Removed:
* The manual relays-db compaction job; we received reports that it corrupted the relays database when run at the same time as the background compaction configured by relay_cache_background_compaction_interval
… from storage in any case after they succeed or fail.

Fixed a log that was printing the node public key instead of the app public key.
Added a different key format.
Refactored connectivity checks.
Refactored the internal node/servicer structure of mesh to reduce the number of worker/cron instances.
Refactored chains/keys reload.
Added dynamic resizing of FullNode workers on servicers change.
Updated servicers reload to only modify the maps when something is new or removed.
…e and better readability of the code without so many casts.

Refactored fullNode.Servicer to be a map instead of a slice.
Further enhanced the logs and bootstrap-time information.
Added metrics config support.
Refactored the code to split it into files.
Bumped pond version to 1.8.3 (patch).
Cleaned up the code.
Updated config so the RPC timeout for chains, client, and pocket node calls can each be set to a different value; see the sketch after this list.
Ensured that the HTTP response body is read even on errored requests so connections can be reused.
Enhanced chains reload logs.
Enhanced startup logs.
… so many edge cases and possible infinite goroutine spams.
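A sketch of both timeout-and-reuse items above, with assumed names and timeout values: a separate http.Client per call type, and a helper that always drains and closes the response body. In Go, an unread body prevents the transport from returning the keep-alive connection to the pool.

```go
package mesh

import (
	"io"
	"net/http"
	"time"
)

// Per-call-type clients; the timeout values are illustrative assumptions.
var (
	chainsClient = &http.Client{Timeout: 10 * time.Second} // backend chain calls
	appClient    = &http.Client{Timeout: 5 * time.Second}  // client-facing calls
	nodeClient   = &http.Client{Timeout: 30 * time.Second} // pocket node calls
)

// do executes a request and guarantees the body is fully read and closed even
// for non-2xx responses, so the underlying connection can be reused.
func do(c *http.Client, req *http.Request) ([]byte, int, error) {
	resp, err := c.Do(req)
	if err != nil {
		return nil, 0, err // transport error: there is no body to drain
	}
	defer func() {
		io.Copy(io.Discard, resp.Body) // drain leftovers for connection reuse
		resp.Body.Close()
	}()
	body, err := io.ReadAll(resp.Body)
	return body, resp.StatusCode, err
}
```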

Added an optional name property to nodes; if not set, the hostname of the node URL is used.
Added minWorker, maxWorker, and maxCapacity to the Prometheus metrics collectors (see the sketch after this list).
Refactored the minWorker, maxWorker, and maxCapacity options in config.
Bumped the defaults to more real-world values.
Updated docs.
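A sketch of exposing the pool bounds to Prometheus, assuming pond v1.8.x accessors (MinWorkers/MaxWorkers/MaxCapacity/RunningWorkers); the metric names are illustrative, not the PR's actual ones.

```go
package mesh

import (
	"github.com/alitto/pond"
	"github.com/prometheus/client_golang/prometheus"
)

// registerPoolMetrics publishes the worker-pool configuration and live worker
// count as gauges that are sampled on every scrape.
func registerPoolMetrics(pool *pond.WorkerPool) {
	prometheus.MustRegister(
		prometheus.NewGaugeFunc(prometheus.GaugeOpts{Name: "mesh_pool_min_workers"},
			func() float64 { return float64(pool.MinWorkers()) }),
		prometheus.NewGaugeFunc(prometheus.GaugeOpts{Name: "mesh_pool_max_workers"},
			func() float64 { return float64(pool.MaxWorkers()) }),
		prometheus.NewGaugeFunc(prometheus.GaugeOpts{Name: "mesh_pool_max_capacity"},
			func() float64 { return float64(pool.MaxCapacity()) }),
		prometheus.NewGaugeFunc(prometheus.GaugeOpts{Name: "mesh_pool_running_workers"},
			func() float64 { return float64(pool.RunningWorkers()) }),
	)
}
```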
jorgecuesta and others added 27 commits July 11, 2023 19:11
…k session height 100589.

Removed newlines (\n) from the errors produced by the pocketcore code. These made it difficult to use tooling like Loki, which collects the text before a newline as a single entry.
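A one-liner sketch of that fix: collapse multi-line error text into a single line before logging. The helper name `flattenError` is an assumption.

```go
package mesh

import "strings"

// flattenError rewrites a multi-line error as one line so line-oriented
// collectors like Loki ingest the whole error as a single entry.
func flattenError(err error) string {
	return strings.ReplaceAll(err.Error(), "\n", " ")
}
```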
… will be done by GetSession a few lines below.
…orage. Fixed a typo. Bump version to RC-0.4.2
@reviewpad bot added the labels "large (Pull request is large)" and "waiting-for-review" on Jul 26, 2023