From e81dccf94ed4da87ce2388912c7a45efd3aa1312 Mon Sep 17 00:00:00 2001 From: Yee Hing Tong Date: Mon, 25 Mar 2024 13:32:18 -0700 Subject: [PATCH 1/3] rfc Signed-off-by: Yee Hing Tong --- rfc/system/5101-offloaded-literal.md | 89 ++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 rfc/system/5101-offloaded-literal.md diff --git a/rfc/system/5101-offloaded-literal.md b/rfc/system/5101-offloaded-literal.md new file mode 100644 index 0000000000..ad521f73b0 --- /dev/null +++ b/rfc/system/5101-offloaded-literal.md @@ -0,0 +1,89 @@ +# [RFC Template] Title + +**Authors:** + +- @nickname +- @nickname + +## 1 Executive Summary + +*A short paragraph or bullet list that quickly explains what you're trying to do.* + +## 2 Motivation + +*What motivates this proposal, and why is it important?* + +*Here, we aim to get comfortable articulating the value of our actions.* + +## 3 Proposed Implementation + +*This is the core of your proposal, and its purpose is to help you think through the problem because [writing is thinking](https://medium.learningbyshipping.com/writing-is-thinking-an-annotated-twitter-thread-2a75fe07fade).* + +*Consider:* + +- *using diagrams to help illustrate your ideas.* +- *including code examples if you're proposing an interface or system contract.* +- *linking to project briefs or wireframes that are relevant.* + +## 4 Metrics & Dashboards + +*What are the main metrics we should be measuring? For example, when interacting with an external system, it might be the external system latency. When adding a new table, how fast would it fill up?* + +## 5 Drawbacks + +*Are there any reasons why we should not do this? Here we aim to evaluate risk and check ourselves.* + +## 6 Alternatives + +*What are other ways of achieving the same outcome?* + +## 7 Potential Impact and Dependencies + +*Here, we aim to be mindful of our environment and generate empathy towards others who may be impacted by our decisions.* + +- *What other systems or teams are affected by this proposal?* +- *How could this be exploited by malicious attackers?* + +## 8 Unresolved questions + +*What parts of the proposal are still being defined or not covered by this proposal?* + +## 9 Conclusion + +*Here, we briefly outline why this is the right decision to make at this time and move forward!* + +## 10 RFC Process Guide, remove this section when done + +*By writing an RFC, you're giving insight to your team on the direction you're taking. There may not be a right or better decision in many cases, but we will likely learn from it. By authoring, you're making a decision on where you want us to go and are looking for feedback on this direction from your team members, but ultimately the decision is yours.* + +This document is a: + +- thinking exercise, prototype with words. +- historical record, its value may decrease over time. +- way to broadcast information. +- mechanism to build trust. +- tool to empower. +- communication channel. + +This document is not: + +- a request for permission. +- the most up to date representation of any process or system + +**Checklist:** + +- [ ] Copy template +- [ ] Draft RFC (think of it as a wireframe) +- [ ] Share as WIP with folks you trust to gut-check +- [ ] Send pull request when comfortable +- [ ] Label accordingly +- [ ] Assign reviewers +- [ ] Merge PR + +**Recommendations** + +- Tag RFC title with [WIP] if you're still ironing out details. +- Tag RFC title with [Newbie] if you're trying out something experimental or you're not entirely convinced of what you're proposing. +- Tag RFC title with [RR] if you'd like to schedule a review request to discuss the RFC. +- If there are areas that you're not convinced on, tag people who you consider may know about this and ask for their input. +- If you have doubts, ask on [#feature-discussions](https://slack.com/app_redirect?channel=CPQ3ZFQ84&team=TN89P6GGK) for help moving something forward. From 0577fcfcaf68fd2342298edc426b0a5cc27a67bc Mon Sep 17 00:00:00 2001 From: Yee Hing Tong Date: Mon, 25 Mar 2024 14:56:50 -0700 Subject: [PATCH 2/3] first Signed-off-by: Yee Hing Tong --- rfc/system/5101-offloaded-literal.md | 89 ---------------------------- rfc/system/5103-offloaded-literal.md | 69 +++++++++++++++++++++ 2 files changed, 69 insertions(+), 89 deletions(-) delete mode 100644 rfc/system/5101-offloaded-literal.md create mode 100644 rfc/system/5103-offloaded-literal.md diff --git a/rfc/system/5101-offloaded-literal.md b/rfc/system/5101-offloaded-literal.md deleted file mode 100644 index ad521f73b0..0000000000 --- a/rfc/system/5101-offloaded-literal.md +++ /dev/null @@ -1,89 +0,0 @@ -# [RFC Template] Title - -**Authors:** - -- @nickname -- @nickname - -## 1 Executive Summary - -*A short paragraph or bullet list that quickly explains what you're trying to do.* - -## 2 Motivation - -*What motivates this proposal, and why is it important?* - -*Here, we aim to get comfortable articulating the value of our actions.* - -## 3 Proposed Implementation - -*This is the core of your proposal, and its purpose is to help you think through the problem because [writing is thinking](https://medium.learningbyshipping.com/writing-is-thinking-an-annotated-twitter-thread-2a75fe07fade).* - -*Consider:* - -- *using diagrams to help illustrate your ideas.* -- *including code examples if you're proposing an interface or system contract.* -- *linking to project briefs or wireframes that are relevant.* - -## 4 Metrics & Dashboards - -*What are the main metrics we should be measuring? For example, when interacting with an external system, it might be the external system latency. When adding a new table, how fast would it fill up?* - -## 5 Drawbacks - -*Are there any reasons why we should not do this? Here we aim to evaluate risk and check ourselves.* - -## 6 Alternatives - -*What are other ways of achieving the same outcome?* - -## 7 Potential Impact and Dependencies - -*Here, we aim to be mindful of our environment and generate empathy towards others who may be impacted by our decisions.* - -- *What other systems or teams are affected by this proposal?* -- *How could this be exploited by malicious attackers?* - -## 8 Unresolved questions - -*What parts of the proposal are still being defined or not covered by this proposal?* - -## 9 Conclusion - -*Here, we briefly outline why this is the right decision to make at this time and move forward!* - -## 10 RFC Process Guide, remove this section when done - -*By writing an RFC, you're giving insight to your team on the direction you're taking. There may not be a right or better decision in many cases, but we will likely learn from it. By authoring, you're making a decision on where you want us to go and are looking for feedback on this direction from your team members, but ultimately the decision is yours.* - -This document is a: - -- thinking exercise, prototype with words. -- historical record, its value may decrease over time. -- way to broadcast information. -- mechanism to build trust. -- tool to empower. -- communication channel. - -This document is not: - -- a request for permission. -- the most up to date representation of any process or system - -**Checklist:** - -- [ ] Copy template -- [ ] Draft RFC (think of it as a wireframe) -- [ ] Share as WIP with folks you trust to gut-check -- [ ] Send pull request when comfortable -- [ ] Label accordingly -- [ ] Assign reviewers -- [ ] Merge PR - -**Recommendations** - -- Tag RFC title with [WIP] if you're still ironing out details. -- Tag RFC title with [Newbie] if you're trying out something experimental or you're not entirely convinced of what you're proposing. -- Tag RFC title with [RR] if you'd like to schedule a review request to discuss the RFC. -- If there are areas that you're not convinced on, tag people who you consider may know about this and ask for their input. -- If you have doubts, ask on [#feature-discussions](https://slack.com/app_redirect?channel=CPQ3ZFQ84&team=TN89P6GGK) for help moving something forward. diff --git a/rfc/system/5103-offloaded-literal.md b/rfc/system/5103-offloaded-literal.md new file mode 100644 index 0000000000..eab880af63 --- /dev/null +++ b/rfc/system/5103-offloaded-literal.md @@ -0,0 +1,69 @@ +# [RFC] Offloaded Raw Literals + +**Authors:** + +- @wild-endeavor +- @EngHabu + +## 1 Executive Summary + +Flyte depends on a series of `inputs.pb` and `outputs.pb` files to do communication between nodes. This has typically served us well, except for the occasional map task that produces a large Literal output. We sometimes also run into this issue for large dataclasses. This RFC proposes a mechanism that allows the offloading of any Literal, which would be done only of course for now for size reasons. + +## 2 Motivation +A [cursory search](https://discuss.flyte.org/?threads%5Bquery%5D=LIMIT_EXCEEDED) of Slack history shows a few times that this has come up before (and I remember some other instances, I think that search term just wasn't included). This is something that we've historically addressed by just increasing the size of the grpc message that's allowed, but this is an unsustainable solution. + +## 3 Proposed Implementation + +### 3.1 Offloaded Literal IDL +To the `Literal` [message](https://github.com/flyteorg/flyte/blob/cb6384ac6ea60f8b9421a71cfda4279f3579d3cb/flyteidl/protos/flyteidl/core/literals.proto#L95), add a new field called `starp` that will point to a location in the "metadata" bucket of the Flyte backend. The offloaded bytes should be deserialzable into a `Literal` object. + +Questions: How will things like metadata be handled? Should they be merged? What should be in the `value` field of the main parent Literal? + +### 3.2 Flyte Propeller +* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the +* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals. + +### 3.3 Flytekit & Copilot +Flytekit and Copilot will both need to detect that a Literal has been offloaded and know to download it. + +For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated? + +### 3.4 Other Implications +Does console need to change at all? + +## 4 Metrics & Dashboards + +*What are the main metrics we should be measuring? For example, when interacting with an external system, it might be the external system latency. When adding a new table, how fast would it fill up?* + +## 5 Drawbacks + +*Are there any reasons why we should not do this? Here we aim to evaluate risk and check ourselves.* + +## 6 Alternatives + +Alternate suggestions that were proposed include + +* For map tasks, change the type of the output to a Union of the current user defined List and a new Offloaded type. We felt this would be a bit awkward since it changes the user-facing type itself (like if you were to pull up the map task definition in the API endpoint). It's also not extensible to other types of literals (maps of large dataclasses for example). + +* Build off of the input wrapper construct that's still in PR. The idea was to have the wrapper contain in large cases, a reference to the data, and in small cases, the data itself. We didn't fully like this idea because the entire input set or output set needs to be offloaded. + * If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use. + +## 7 Potential Impact and Dependencies + +There's a couple edge cases that will just not work. + +* If the map task is of an older flytekit version but for some reason the downstream task is of a newer version, Propeller will fail unnecessarily. +* If the map task is a newer version, but the downstream task is an older version, the downstream task will fail correctly. + +Are there concerns about the fact that if we're offloading data once, and then sharing the pointer, we're no longer copying-by-value? Does this break any of the guarantees of Flyte and will we need to be more careful in the future around other changes to avoid issues? + +## 8 Unresolved questions + +Should we create a new oneof that's offloaded? + +Is there anything around sampled data, or automatically computed actual metadata (like number of elements in the list) that we should do? + +## 9 Conclusion + +*Here, we briefly outline why this is the right decision to make at this time and move forward!* + From 2abc1275f0930827d6c6c8314e4152061367f861 Mon Sep 17 00:00:00 2001 From: Katrina Rogan Date: Wed, 14 Aug 2024 17:07:54 +0200 Subject: [PATCH 3/3] Review comments Signed-off-by: Katrina Rogan --- rfc/system/5103-offloaded-literal.md | 72 +++++++++++++++++++++++----- 1 file changed, 61 insertions(+), 11 deletions(-) diff --git a/rfc/system/5103-offloaded-literal.md b/rfc/system/5103-offloaded-literal.md index eab880af63..fb835a6aab 100644 --- a/rfc/system/5103-offloaded-literal.md +++ b/rfc/system/5103-offloaded-literal.md @@ -4,32 +4,79 @@ - @wild-endeavor - @EngHabu +- @katrogan ## 1 Executive Summary -Flyte depends on a series of `inputs.pb` and `outputs.pb` files to do communication between nodes. This has typically served us well, except for the occasional map task that produces a large Literal output. We sometimes also run into this issue for large dataclasses. This RFC proposes a mechanism that allows the offloading of any Literal, which would be done only of course for now for size reasons. +Flyte depends on a series of `inputs.pb` and `outputs.pb` files to do communication between nodes. This has typically served us well, except for the occasional map task that produces a large Literal collection output. This large collection typically exceeds default gRPC and configured storage limits. We sometimes also run into this issue for large dataclasses. This RFC proposes a mechanism that allows the offloading of any Literal, to get around size limitations for passing large Literal protobuf messages in the system. ## 2 Motivation -A [cursory search](https://discuss.flyte.org/?threads%5Bquery%5D=LIMIT_EXCEEDED) of Slack history shows a few times that this has come up before (and I remember some other instances, I think that search term just wasn't included). This is something that we've historically addressed by just increasing the size of the grpc message that's allowed, but this is an unsustainable solution. +A [cursory search](https://discuss.flyte.org/?threads%5Bquery%5D=LIMIT_EXCEEDED) of Slack history shows a few times that this has come up before (and I remember some other instances, I think that search term just wasn't included). This is something that we've historically addressed by just increasing the size of the grpc message that's allowed, but this is an unsustainable solution and severely reduces the utility of large-fan-out map tasks. ## 3 Proposed Implementation +We propose configuring propeller to offload large literal collections, using the following config -### 3.1 Offloaded Literal IDL -To the `Literal` [message](https://github.com/flyteorg/flyte/blob/cb6384ac6ea60f8b9421a71cfda4279f3579d3cb/flyteidl/protos/flyteidl/core/literals.proto#L95), add a new field called `starp` that will point to a location in the "metadata" bucket of the Flyte backend. The offloaded bytes should be deserialzable into a `Literal` object. +```yaml +type LiteralOffloadingConfig struct { + Enabled bool + // Maps flytekit SDK to minimum supported version that can handle reading offloaded literals. + SupportedSDKVersions map[string]string + // Default, 10Mbs. Determines the size of a literal at which to trigger offloading + MinSizeInMBForOffloading uint64 + // Fail fast threshold + MaxSizeInMBForOffloading uint64 +} + +``` -Questions: How will things like metadata be handled? Should they be merged? What should be in the `value` field of the main parent Literal? +### 3.1 Offloaded Literal IDL +Update the `Literal` [message](https://github.com/flyteorg/flyte/blob/4a7c3c0040b1995a43939407b99ca3e87b1eb752/flyteidl/protos/flyteidl/core/literals.proto#L94-L114) +like so + +```protobuf +message Literal { + oneof value { + // A simple value. + Scalar scalar = 1; + // A collection of literals to allow nesting. + LiteralCollection collection = 2; + // A map of strings to literals. + LiteralMap map = 3; + } + ... + // ** new below this line ** + // If this literal is offloaded, this field will contain metadata including the offload location. + string uri = 6; + // Includes information about the size of the literal. + uint64 size_bytes = 7; +} + +``` ### 3.2 Flyte Propeller -* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the -* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals. +Once offloading is enabled in the deployment config, flytepropeller can read from the [RuntimeMetadata](https://github.com/flyteorg/flyte/blob/f448a0358d8706a09b65b96543134f629327d755/flyteidl/protos/flyteidl/core/tasks.proto#L71-L87) in the task config to determine the SDK version. + +When writing outputs in the [remote_file_output_writer](https://github.com/flyteorg/flyte/blob/2ca31119d6b9258661a71f38e450f93b6692402c/flyteplugins/go/tasks/pluginmachinery/ioutils/remote_file_output_writer.go#L56-L84) the source code should detect whether the literal size exceeds the configured minimum and +- if the task is using a newer SDK version that supports reading offloaded literals, offload the literal to the configured storage backend and update the literal with the offload URI and size. +- if the task is using an older SDK version that doesn't support offloaded literals, fail the task with an error message indicating that the task output is too large and the user should update their SDK version. Downstream tasks will need to understand how to consume the offloaded literal and will need to be on newer version of the SDK as well. ### 3.3 Flytekit & Copilot Flytekit and Copilot will both need to detect that a Literal has been offloaded and know to download it. +- in Flytekit this can be done by checking the `uri` field in the Literal message when converting a literal [to_python_value](https://github.com/flyteorg/flytekit/blob/e394af0be9f904fbf8be675eaa8b8cdc24311ced/flytekit/core/type_engine.py#L1134) +- in Copilot, the data downloader [literal handling](https://github.com/flyteorg/flyte/blob/5f4199899922ca63f7690c82dfca42a783db64c3/flytecopilot/data/download.go#L219-L248) should fetch the value -For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated? +As a follow-up, we can also implement literal offloading in the SDK for conventional python tasks. Flytekit should also know how to offload the data. This should be done transparently to the user. + +We should fail fast in the SDK for too-large literals as part of the initial round of changes. + +**Open Question:** How will flytekit know to fail though if propeller hasn't been updated? ### 3.4 Other Implications -Does console need to change at all? +#### Flytekit Remote +Flytekit Remote will need to be updated to handle offloaded literals. In order to fetch offloaded literals by URI, users must now authenticate with their cloud provider on their machines using a role which has read access to the _metadata bucket_. + +#### Console +Console code should show the offloaded literal URI and gracefully handle nil Literal [values](https://github.com/flyteorg/flyte/blob/4a7c3c0040b1995a43939407b99ca3e87b1eb752/flyteidl/protos/flyteidl/core/literals.proto#L96-L105). ## 4 Metrics & Dashboards @@ -45,8 +92,9 @@ Alternate suggestions that were proposed include * For map tasks, change the type of the output to a Union of the current user defined List and a new Offloaded type. We felt this would be a bit awkward since it changes the user-facing type itself (like if you were to pull up the map task definition in the API endpoint). It's also not extensible to other types of literals (maps of large dataclasses for example). -* Build off of the input wrapper construct that's still in PR. The idea was to have the wrapper contain in large cases, a reference to the data, and in small cases, the data itself. We didn't fully like this idea because the entire input set or output set needs to be offloaded. +* Build off of the input wrapper construct that's still in [PR](https://github.com/flyteorg/flyte/pull/4298). The idea was to have the wrapper contain in large cases, a reference to the data, and in small cases, the data itself. We didn't fully like this idea because the entire input set or output set needs to be offloaded. * If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use. +* Modify the workflow CRD to include the offloading bits so that they're respected at execution time, and serialized at registration time. This is a bit heavier handed than just respecting the SDK version ## 7 Potential Impact and Dependencies @@ -54,6 +102,7 @@ There's a couple edge cases that will just not work. * If the map task is of an older flytekit version but for some reason the downstream task is of a newer version, Propeller will fail unnecessarily. * If the map task is a newer version, but the downstream task is an older version, the downstream task will fail correctly. +* If workflow is using an older SDK version and launches a child workflow with a newer SDK version, the parent workflow will fail to resolve the child workflow outputs Are there concerns about the fact that if we're offloading data once, and then sharing the pointer, we're no longer copying-by-value? Does this break any of the guarantees of Flyte and will we need to be more careful in the future around other changes to avoid issues? @@ -65,5 +114,6 @@ Is there anything around sampled data, or automatically computed actual metadata ## 9 Conclusion -*Here, we briefly outline why this is the right decision to make at this time and move forward!* +Moving to literal offloading fixes a common and frustrating pain point around map tasks. It's a relatively simple change that should have a big impact on the usability of Flyte. +```