Create a physical plan for block querying #2586
Conversation
// adapt the plan to make sure all replicas will deduplicate
if deduplicate {
	for _, hints := range plan {
		hints.Deduplication = deduplicate
	}
}
Does it mean that if we get a query like "last 2 days", we will force deduplication because the plan also includes blocks that have not been compacted yet? I guess this should be mitigated by the query time range split – I'm wondering if we should set it to the block duration by default.
Yes, this is true. The first step is for the query time range split to align roughly with the processing time for the compacted block. So it should be slightly higher than the block duration, because we introduce a fair bit of latency before a block is compacted/deduplicated and available to store-gateways.
Ideally, in a future where the query-frontend is also aware of the plan, we can split according to the block layout.
This PR will reduce the work necessary to query blocks, by planning out which blocks are fetched from which ingester/store-gateway. That way queries will potentially not need profile-by-profile deduplication for blocks that had been deduplicated by the compactor.
Query the head first (heads and flushing) and then the queriers; this makes sure that we do not miss a block moving into flushing.
LGTM 🚀 Looking forward to seeing this in action!
}

// TODO(simonswine): Split profiles per row group and run the MergeByStacktraces in parallel.
merge, err := querier.MergeByStacktraces(ctx, iter.NewSliceIterator(querier.Sort(profiles)))
I think we can avoid sorting and pulling actual labels. This could be big savings.
So you mean all in one step: a SelectAndMergeProfiles that runs one iterator through the data?
* Create a physical plan for block querying

  This PR will reduce the work necessary to query blocks, by planning out which blocks are fetched from which ingester/store-gateway. That way queries will potentially not need profile-by-profile deduplication for blocks that had been deduplicated by the compactor.

* Remove unused field
* Missing `make generate` change
* Signal that there are no profiles streamed
* Generate block metadata correctly
* Improve tracing for block plan
* Request BlockMetadata correctly

  Query the head first (heads and flushing) and then the queriers; this makes sure that we do not miss a block moving into flushing.

* Forward hints correctly to store-gateway
* Sort iterator properly
* Ensure block metadata responses are ts bound for ingesters
* Ignore storeQueryAfter for plan building
* Move the signalling to close connection to the right place
* Ensure we only read replica for the correct instance type
* Handle no result profiles
* Log full plan in debug level
This PR will reduce the work necessary to query blocks, by planning out
which blocks are fetched from which ingester/store-gateway.
That way queries will potentially not need profile-by-profile
deduplication for blocks that had been deduplicated by the compactor.
TODO:
* ingester/store-gateway
* Reuse the compactor block mark deletion algorithm in part to disqualify blocks (it is a bit more nuanced, so keeping it separate for now)