
Feature: Allow stream custom maxsize per batch #2063

Merged: 5 commits into dgraph-io:main, Oct 25, 2024

Conversation

simon28082 (Contributor)

Problem

I'm trying to read data with Badger and use gRPC for node sync, but I can't change the maximum stream batch size because it is hardcoded:

// maxStreamSize is the maximum allowed size of a stream batch. This is a soft limit
// as a single list that is still over the limit will have to be sent as is since it
// cannot be split further. This limit prevents the framework from creating batches
// so big that sending them causes issues (e.g running into the max size gRPC limit).
var maxStreamSize = uint64(100 << 20) // 100MB
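For context, the "soft limit" behavior described in that comment can be sketched in plain Go. This is a simplified illustration, not Badger's actual implementation; `splitBatches` is a hypothetical helper that only models the cut logic:

```go
package main

import "fmt"

// splitBatches groups item sizes into batches under a soft limit:
// a batch is cut once adding the next item would push it over maxSize,
// but a single item larger than maxSize still becomes its own batch,
// since it cannot be split further.
func splitBatches(sizes []uint64, maxSize uint64) [][]uint64 {
	var batches [][]uint64
	var cur []uint64
	var curSize uint64
	for _, s := range sizes {
		if len(cur) > 0 && curSize+s > maxSize {
			batches = append(batches, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, s)
		curSize += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// 150 exceeds the limit of 100 but is still sent as a single batch.
	fmt.Println(splitBatches([]uint64{40, 40, 40, 150, 10}, 100))
	// → [[40 40] [40] [150] [10]]
}
```

The point of the soft limit is that an oversized value list is shipped as-is rather than dropped, which is exactly why a caller talking to a size-capped gRPC endpoint may need to tune the limit downward.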

Solution

Add a field to Stream that allows a custom batch size:

type Stream struct {
	// Prefix to only iterate over certain range of keys. If set to nil (default), Stream would
	// iterate over the entire DB.
	Prefix []byte

	// Number of goroutines to use for iterating over key ranges. Defaults to 8.
	NumGo int

	// Badger would produce log entries in Infof to indicate the progress of Stream. LogPrefix can
	// be used to help differentiate them from other activities. Default is "Badger.Stream".
	LogPrefix string

	// ChooseKey is invoked each time a new key is encountered. Note that this is not called
	// on every version of the value, only the first encountered version (i.e. the highest version
	// of the value a key has). ChooseKey can be left nil to select all keys.
	//
	// Note: Calls to ChooseKey are concurrent.
	ChooseKey func(item *Item) bool

	// MaxSize is the maximum allowed size of a stream batch. This is a soft limit.
	MaxSize uint64
}
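Presumably a zero `MaxSize` should preserve the old 100MB behavior for existing callers. Here is a minimal, self-contained sketch of that defaulting logic; the pared-down `Stream` and the `maxSize` helper are hypothetical stand-ins, not the actual Badger code:

```go
package main

import "fmt"

// defaultMaxStreamSize mirrors the old hardcoded limit.
const defaultMaxStreamSize = uint64(100 << 20) // 100MB

// Stream is a pared-down stand-in for Badger's Stream type,
// showing only the new MaxSize field.
type Stream struct {
	MaxSize uint64
}

// maxSize returns the configured batch limit, falling back to the
// old default when MaxSize is left unset (zero).
func (st *Stream) maxSize() uint64 {
	if st.MaxSize == 0 {
		return defaultMaxStreamSize
	}
	return st.MaxSize
}

func main() {
	fmt.Println((&Stream{}).maxSize())                  // default: 104857600
	fmt.Println((&Stream{MaxSize: 32 << 20}).maxSize()) // custom: 33554432
}
```

With this shape, existing code that never sets `MaxSize` keeps the previous behavior, while callers constrained by a gRPC message-size cap can set a smaller value.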

@simon28082 simon28082 requested a review from a team as a code owner May 28, 2024 09:54
@CLAassistant

CLAassistant commented May 28, 2024

CLA assistant check
All committers have signed the CLA.


netlify bot commented May 28, 2024

Deploy Preview for badger-docs canceled.

Latest commit: 28cbe82
Latest deploy log: https://app.netlify.com/sites/badger-docs/deploys/6719e479d4c87600085b04ba

@mangalaman93 (Contributor)

@simon28082 the linter is not happy. Could you check that, please? And also pull in the latest changes from the main branch?

@mangalaman93 mangalaman93 self-assigned this Oct 14, 2024
@simon28082 (Contributor, Author)

@mangalaman93 will do.

@simon28082 (Contributor, Author)

@mangalaman93 I've done it, please review.

mangalaman93
mangalaman93 previously approved these changes Oct 21, 2024
@@ -315,7 +323,7 @@ func (st *Stream) streamKVs(ctx context.Context) error {
// Send the batch immediately if it already exceeds the maximum allowed size.
// If the size of the batch exceeds maxStreamSize, break from the loop to
// avoid creating a batch that is so big that certain limits are reached.
if batch.LenNoPadding() > int(maxStreamSize) {

Could you try upgrading the action version in .github/workflows/ci-golang-lint.yml:29 to

uses: golangci/golangci-lint-action@v6.1.1

@mangalaman93 mangalaman93 merged commit 10a09e6 into dgraph-io:main Oct 25, 2024
9 of 11 checks passed
3 participants