Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce msg_limit for application subnets in payload builder #1798

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

stiegerc
Copy link
Contributor

@stiegerc stiegerc commented Oct 2, 2024

The begin of a stream slice we induct is used for garbage collecting signals. Therefore if we want at most 50k signals in the reverse stream, we have to make sure the stream index of the last message in the slice is at most slice_begin + 50_000.

Since reverse_stream.signals_end >= slice_begin we may have signals remaining after gc'ing, so the number of messages we can include in a slice is given by slice_end - reverse_stream.signals_end or slice_begin + 50_000 - reverse_stream.signals_end.

This PR adds a new function get_signal_end() that computes an upper limit for how many messages can be included into a stream slice using the logic explained above. Note that this can not lead to a lock scenario, because a case where we can't include anymore messages can always be resolved by moving the begin in the incoming slice, i.e. by garbage collecting on the sending subnet.

@stiegerc stiegerc changed the title msg_limit for application subnets in payload builder feat: Introduce msg_limit for application subnets in payload builder Oct 2, 2024
@github-actions github-actions bot added the feat label Oct 2, 2024
@alin-at-dfinity
Copy link
Contributor

alin-at-dfinity commented Oct 5, 2024

After inducting a stream slice, the number of signals in the outgoing stream is equal to number_of_signals_after_gcing + number_of_messages_inducted. So if we want to cap the number of signals in a stream at 50k, we can induct no more than 50k - number_of_signals_after_gcing number of messages.

Gc'ing signals is done using begin in the stream slice, so number_of_signals_after_gcing = outgoing_stream_signals_end - incoming_slice_begin; the signals_end of the outgoing stream is given by message_index of ExpectedIndices.

I find this very hard to parse. A simpler way of putting it is that if we want to limit the outgoing signals to 50k, then we should not induct messages beyond index incoming_stream_begin + 50k (do note that that's "incoming stream begin", which is different from "incoming slice begin"). How many (or rather which) messages we can induct, directly follows from the above (from wherever we are now to incoming_stream_begin + 50k).

@@ -879,11 +883,24 @@ impl XNetPayloadBuilderImpl {
///
/// In order to prevent mutual stalling, only applies to incoming NNS
/// streams; and to `Application`-subnet-to-`System`-subnet streams.
pub fn get_msg_limit(subnet_id: SubnetId, state: &ReplicatedState) -> Option<usize> {
pub fn get_msg_limit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment needs updating.

On a second thought, I would rather have a separate function (or just inline code) to compute an upper bound for slice.end (which is just stream.begin + 50k, nothing fancy). As is, I'm not 100% sure about this, but we may not even enforce the 50k limit for application subnet streams on system subnets.

It would be a lot easier if we kept this as is; and then, where we call it, simply took the minimum of whatever this returns and stream_begin + 50k - expected.message_index. Then you're obviously under both limits at all times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it its own function get_signal_limit() that follows the logic you suggest here.

slice_begin: StreamIndex,
) -> Option<usize> {
let get_limit = || -> Option<usize> {
const MAX_SIGNALS_STREAM_COUNT: usize = 50_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more than a constant hidden down in some function. If nothing else, it should be a documented constant somewhere at the top of the crate / module.

Also MAX_SIGNALS_STREAM_COUNT is hard to read. MAX_SIGNAL_COUNT? MAX_STREAM_SIGNALS? MAX_STREAM_SIGNAL_COUNT? (Although the latter is kind of redundant).

Copy link
Contributor Author

@stiegerc stiegerc Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be the same as MAX_STREAM_MESSAGES I think. This limit should be respected intrinsically anyway except under circumstances where reject responses are pushed into streams indiscriminately, i.e. reject signals for requests from a migrated canister.

rs/xnet/payload_builder/src/lib.rs Outdated Show resolved Hide resolved
subnet_id: SubnetId,
state: &ReplicatedState,
expected: Option<&ExpectedIndices>,
slice_begin: StreamIndex,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is stream_begin, not slice_begin.

Copy link
Contributor Author

@stiegerc stiegerc Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment I don't understand. I called it slice_begin because that is what is used for garbage collecting signals. I mean this part of the code is removed now, but I kept the name elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you observed on Slack, stream_slice.header().begin() is the beginning of the stream, not of the slice. The header in question is the stream header. A slice consists of the stream header and some range of messages. The stream header would be the same regardless of which messages are included in the slice (or if no messages are included).

I.e. this is the stream begin, as certified by the remote subnet. Not the slice begin.

rs/xnet/payload_builder/src/certified_slice_pool.rs Outdated Show resolved Hide resolved
@stiegerc
Copy link
Contributor Author

stiegerc commented Oct 15, 2024

AFAICT msg_limit in take_prefix() is now always Some(_) so the Option<_> can be removed, but I left it in since I am not sure whether we may want to keep it as it is and also to keep the diff at a minimum.

// TODO: Replace the 50_000 with a constant. This should be the same constant used in the
// stream builder: `MAX_STREAM_MESSAGES`.
begin.map_or(50_000, |begin| {
let slice_end_index = slice_begin + 50_000.into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd call this

Suggested change
let slice_end_index = slice_begin + 50_000.into();
let max_slice_end_index = slice_begin + 50_000.into();

so that it is clear that this is the last index we would possibly want to accept.

pub fn get_signal_limit(begin: Option<&ExpectedIndices>, slice_begin: StreamIndex) -> usize {
// TODO: Replace the 50_000 with a constant. This should be the same constant used in the
// stream builder: `MAX_STREAM_MESSAGES`.
begin.map_or(50_000, |begin| {
Copy link
Contributor

@derlerd-dfinity derlerd-dfinity Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time following why begin is optional here? Shouldn't we always know the begin of the slice we are about to put in a block?

Also looking at the code above it seems that we are passing Some(...) at all call sites.

Copy link
Contributor

@derlerd-dfinity derlerd-dfinity Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that what you say here is similar to this observation. I guess we could change take_prefix to remove the Option as it feels more natural. It is quite hard to reason about the edge case if begin is non here.


// Ensure the signal limit is respected.
let signal_limit = get_signal_limit(Some(expected), slice.header().begin());
if messages.len() > signal_limit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic why this can be done separately from the check above about msg_limit is that we will include the min of signal_limit and msg_limit so for an honestly built block we know that both must pass right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. These are just two independent upper bounds that must be respected by a valid slice.

Comment on lines +1164 to +1168
slice
.payload
.messages
.as_ref()
.map(|messages| messages.begin()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be begin.message_index. Else, you need to move this new block of code after the if let Some(begin) = begin below (where we advance the slice to begin.message_index; or bail out if we're already beyond it.

/// XNet traffic between honest subnets implies `slice_begin >= stream_begin` since the remote
/// subnet can only gc messages once this subnet reports corresponding signals (generated by
/// inducting messages from a stream slice).
pub fn get_signal_limit(stream_begin: StreamIndex, slice_begin: Option<StreamIndex>) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, "signal limit" is a confusing name. As you say in the doc comment, this is an upper bound on the number messages that can be inducted in order to limit the number of signals in our stream.

I believe it would be a lot clearer if you replaced this with a much simpler and more obvious "max message index", which is stream_begin + MAX_STREAM_MESSAGES. That's all this function needs to do.

Then, the two callers either subtract begin.message_index from it to obtain a second, "message limit that enforces signal constraints" bound; or simply compare slice_end with it to check that the slice is valid. No need to check anything (e.g. the if slice_begin < stream_begin test below), less need for in-depth explanations.


// Ensure the signal limit is respected.
let signal_limit = get_signal_limit(Some(expected), slice.header().begin());
if messages.len() > signal_limit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. These are just two independent upper bounds that must be respected by a valid slice.

Comment on lines +141 to +143
if let Some(slice_end) = testing::slice_end(prefix) {
assert!(slice_end <= testing::stream_begin(prefix) + StreamIndex::new(MAX_STREAM_MESSAGES as u64));
}
Copy link
Contributor Author

@stiegerc stiegerc Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more for the sake of completion since it is never called with numbers anywhere near large enough to meaningfully check this. Since it is a proptest, I'd rather not extend this to include 10k messages in every single iteration. A small unit test to check for the signals limit should be enough at least unless we make the message limit in streams dynamically configurable (part of a config or something, not a constant). But that's not for this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants