-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Introduce msg_limit for application subnets in payload builder #1798
base: master
Are you sure you want to change the base?
Conversation
I find this very hard to parse. A simpler way of putting it is that if we want to limit the outgoing signals to 50k, then we should not induct messages beyond index |
rs/xnet/payload_builder/src/lib.rs
Outdated
@@ -879,11 +883,24 @@ impl XNetPayloadBuilderImpl { | |||
/// | |||
/// In order to prevent mutual stalling, only applies to incoming NNS | |||
/// streams; and to `Application`-subnet-to-`System`-subnet streams. | |||
pub fn get_msg_limit(subnet_id: SubnetId, state: &ReplicatedState) -> Option<usize> { | |||
pub fn get_msg_limit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc comment needs updating.
On a second thought, I would rather have a separate function (or just inline code) to compute an upper bound for slice.end
(which is just stream.begin + 50k
, nothing fancy). As is, I'm not 100% sure about this, but we may not even enforce the 50k limit for application subnet streams on system subnets.
It would be a lot easier if we kept this as is; and then, where we call it, simply took the minimum of whatever this returns and stream_begin + 50k - expected.message_index
. Then you're obviously under both limits at all times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it its own function get_signal_limit()
that follows the logic you suggest here.
rs/xnet/payload_builder/src/lib.rs
Outdated
slice_begin: StreamIndex, | ||
) -> Option<usize> { | ||
let get_limit = || -> Option<usize> { | ||
const MAX_SIGNALS_STREAM_COUNT: usize = 50_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more than a constant hidden down in some function. If nothing else, it should be a documented constant somewhere at the top of the crate / module.
Also MAX_SIGNALS_STREAM_COUNT
is hard to read. MAX_SIGNAL_COUNT
? MAX_STREAM_SIGNALS
? MAX_STREAM_SIGNAL_COUNT
? (Although the latter is kind of redundant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be the same as MAX_STREAM_MESSAGES I think. This limit should be respected intrinsically anyway except under circumstances where reject responses are pushed into streams indiscriminately, i.e. reject signals for requests from a migrated canister.
rs/xnet/payload_builder/src/lib.rs
Outdated
subnet_id: SubnetId, | ||
state: &ReplicatedState, | ||
expected: Option<&ExpectedIndices>, | ||
slice_begin: StreamIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is stream_begin
, not slice_begin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment I don't understand. I called it slice_begin
because that is what is used for garbage collecting signals. I mean this part of the code is removed now, but I kept the name elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you observed on Slack, stream_slice.header().begin()
is the beginning of the stream, not of the slice. The header in question is the stream header. A slice consists of the stream header and some range of messages. The stream header would be the same regardless of which messages are included in the slice (or if no messages are included).
I.e. this is the stream begin, as certified by the remote subnet. Not the slice begin.
AFAICT |
rs/xnet/payload_builder/src/lib.rs
Outdated
// TODO: Replace the 50_000 with a constant. This should be the same constant used in the | ||
// stream builder: `MAX_STREAM_MESSAGES`. | ||
begin.map_or(50_000, |begin| { | ||
let slice_end_index = slice_begin + 50_000.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd call this
let slice_end_index = slice_begin + 50_000.into(); | |
let max_slice_end_index = slice_begin + 50_000.into(); |
so that it is clear that this is the last index we would possibly want to accept.
rs/xnet/payload_builder/src/lib.rs
Outdated
pub fn get_signal_limit(begin: Option<&ExpectedIndices>, slice_begin: StreamIndex) -> usize { | ||
// TODO: Replace the 50_000 with a constant. This should be the same constant used in the | ||
// stream builder: `MAX_STREAM_MESSAGES`. | ||
begin.map_or(50_000, |begin| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a hard time following why begin
is optional here? Shouldn't we always know the begin of the slice we are about to put in a block?
Also looking at the code above it seems that we are passing Some(...)
at all call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that what you say here is similar to this observation. I guess we could change take_prefix
to remove the Option
as it feels more natural. It is quite hard to reason about the edge case if begin
is non here.
|
||
// Ensure the signal limit is respected. | ||
let signal_limit = get_signal_limit(Some(expected), slice.header().begin()); | ||
if messages.len() > signal_limit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic why this can be done separately from the check above about msg_limit
is that we will include the min
of signal_limit
and msg_limit
so for an honestly built block we know that both must pass right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. These are just two independent upper bounds that must be respected by a valid slice.
slice | ||
.payload | ||
.messages | ||
.as_ref() | ||
.map(|messages| messages.begin()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be begin.message_index
. Else, you need to move this new block of code after the if let Some(begin) = begin
below (where we advance the slice to begin.message_index
; or bail out if we're already beyond it.
/// XNet traffic between honest subnets implies `slice_begin >= stream_begin` since the remote | ||
/// subnet can only gc messages once this subnet reports corresponding signals (generated by | ||
/// inducting messages from a stream slice). | ||
pub fn get_signal_limit(stream_begin: StreamIndex, slice_begin: Option<StreamIndex>) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, "signal limit" is a confusing name. As you say in the doc comment, this is an upper bound on the number messages that can be inducted in order to limit the number of signals in our stream.
I believe it would be a lot clearer if you replaced this with a much simpler and more obvious "max message index", which is stream_begin + MAX_STREAM_MESSAGES
. That's all this function needs to do.
Then, the two callers either subtract begin.message_index
from it to obtain a second, "message limit that enforces signal constraints" bound; or simply compare slice_end
with it to check that the slice is valid. No need to check anything (e.g. the if slice_begin < stream_begin
test below), less need for in-depth explanations.
|
||
// Ensure the signal limit is respected. | ||
let signal_limit = get_signal_limit(Some(expected), slice.header().begin()); | ||
if messages.len() > signal_limit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. These are just two independent upper bounds that must be respected by a valid slice.
if let Some(slice_end) = testing::slice_end(prefix) { | ||
assert!(slice_end <= testing::stream_begin(prefix) + StreamIndex::new(MAX_STREAM_MESSAGES as u64)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more for the sake of completion since it is never called with numbers anywhere near large enough to meaningfully check this. Since it is a proptest, I'd rather not extend this to include 10k messages in every single iteration. A small unit test to check for the signals limit should be enough at least unless we make the message limit in streams dynamically configurable (part of a config or something, not a constant). But that's not for this change.
The
begin
of a stream slice we induct is used for garbage collecting signals. Therefore if we want at most 50k signals in the reverse stream, we have to make sure the stream index of the last message in the slice is at mostslice_begin + 50_000
.Since
reverse_stream.signals_end >= slice_begin
we may have signals remaining after gc'ing, so the number of messages we can include in a slice is given byslice_end - reverse_stream.signals_end
orslice_begin + 50_000 - reverse_stream.signals_end
.This PR adds a new function
get_signal_end()
that computes an upper limit for how many messages can be included into a stream slice using the logic explained above. Note that this can not lead to a lock scenario, because a case where we can't include anymore messages can always be resolved by moving thebegin
in the incomingslice
, i.e. by garbage collecting on the sending subnet.