Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration settings for PubSub source #374

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

adatzer
Copy link
Contributor

@adatzer adatzer commented Oct 8, 2024

Jira ref: PDP-1480

max_outstanding_messages = 2000

# Maximum size of unprocessed messages (default 1e9)
max_outstanding_bytes = 2e9

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the go sdk for pubsub, but I am very familiar with the java sdk for pubsub.

In the java sdk, these settings apply per streaming pull. E.g. if the app opens 50 streaming pulls, then your default setting will allow 50 * 2e9 outstanding bytes in total across all streaming pulls, i.e. 100 GB.

In Snowbridge the number of streaming pulls is set on this line (NumGoRoutines). Snowbridge sets it equal to the value of concurrent writes.

Copy link
Contributor Author

@adatzer adatzer Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, @istreeter, it made me look a bit deeper.

To start with what this PR does, before exposing these settings, we were implicitly using these defaults from the client library (1000 and 1e9) already. In this PR these are not changed, only made explicit, so that they can be configured.

This particular file shows 2000 and 2e9 as example values, the defaults have been and still are half of that. Still, as example values are valid (based on these recommendations).

Indeed NumGoRoutines is the number of streaming pull connections maintained by the client.
However as far as i can tell so far, the MaxOutstanding* settings are meant to throttle the throughput no matter how many goroutines for pulling messages there are.

So my understanding is that these flow control settings do limit the outstanding messages/bytes per subscriber client. For example, if max_outstanding_bytes is set to 2e9 then, as long as the setting is respected(caveats seem to exist), it will allow up to 2GB of outstanding size across all streaming pulls.

Here is how the client flow control acquires a message. This happens for each message, that was received for each goroutine/streaming pull.

Having said that, i'd be surprised if the java and go clients are different by such a margin in memory settings, which makes me doubt my understanding. Please do share any more pointers!

Copy link

@istreeter istreeter Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the java sdk, it also does exactly what you describe if "legacy flow control" is enabled, i.e. it uses a local FlowController which is shared across all streaming pulls.

But if it's using non-legacy flow control then it's slightly different. In the java sdk if you pick non-legacy flow control, then instead of using the local FlowController, it instead sets the options .setMaxOutstandingBytes on the streaming pull request. (Edit: Forgot to add link)

I notice the Go sdk also has an option UseLegacyFlowControl here on the ReceiveSettings.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your comment about the purpose of this PR, and how this just makes explicit a setting that was previously non-configurable. That all makes sense 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants