Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC into RFC #5653

Merged
merged 648 commits into from
Aug 13, 2024
Merged

RFC into RFC #5653

merged 648 commits into from
Aug 13, 2024

Conversation

katrogan
Copy link
Contributor

What changes were proposed in this pull request?

Updates draft RFC here: #5103

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Tom-Newton and others added 30 commits April 3, 2024 20:06
* Remove shard key in admin-launcher

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Don't mutate existing state

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Don't mutate state

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Add a test

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Signed-off-by: YiShengOng <yshengong@gmail.com>
…4986)

* Add tracking for active node and task execution counts in propeller

Signed-off-by: Shardool <shardool.s@gmail.com>

* Update unit tests for task and node execution counts

Signed-off-by: Shardool <shardool.s@gmail.com>

* Fix linter errors

Signed-off-by: Shardool <shardool.s@gmail.com>

* fix linter errors

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

---------

Signed-off-by: Shardool <shardool.s@gmail.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Co-authored-by: Paul Dittamo <pvdittamo@gmail.com>
…ors (#5161)

* include container statuses for all container exit errors

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* add unit test

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

---------

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
* docs(flyte-core): add missing key `adminServer` in authentication guide

Signed-off-by: Julian Einhaus <julian.einhaus@plan.one>

* docs(flyte-core): fix indentation for flyte-core with AzureAD authorization guide

Signed-off-by: Julian Einhaus <julian.einhaus@plan.one>

---------

Signed-off-by: Julian Einhaus <julian.einhaus@plan.one>
Signed-off-by: Daniel Rammer <daniel@union.ai>
* update arraynode proto parallelism field to varint compatible int64

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* have array nodes utilize workflow parallelism

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* return if available parallelism is 0

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* unit test

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

---------

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
* Don't use `defer` for streak length reporting

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Make it work with defer

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

* Fix lint

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>

---------

Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
* Testing agents in the development environment

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* nit

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* nit

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* rename

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* blank

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* rerun build docs ci

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update pingsu's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>

* Update pingsu's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>

* deploying agents in the sandbox

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* rename

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* nit

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Implementing Agent Metadata Service

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* reorganize and copyedit new content

Signed-off-by: nikki everett <nikki@union.ai>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: nikki everett <nikki@union.ai>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: nikki everett <nikki@union.ai>
Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>
Co-authored-by: Ketan Umare <kumare3@users.noreply.github.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Fabio M. Graetz, Ph.D. <fabiograetz@googlemail.com>
Signed-off-by: Fabio Grätz <fabiogratz@googlemail.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
* add cache client read and write otel tracing

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* lint

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

---------

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Signed-off-by: nikki everett <nikki@union.ai>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
…core` deployment (#5208)

Signed-off-by: Future-Outlier <eric901201@gmail.com>
* Add reference to prom operator install guide

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

* Adds info about the three base dashboards

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

* Adds instructions to enable SMs

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

* Incorporate reviews

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

* Minor fixes

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

* Improve format for steps

Signed-off-by: davidmirror-ops <david.espejo@union.ai>

---------

Signed-off-by: davidmirror-ops <david.espejo@union.ai>
* chore: remove obsolete flyte config files

* nit

Signed-off-by: Kevin Su <pingsutw@gmail.com>

---------

Signed-off-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
* enable parallelism to be set to nil for array node

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

* unit test

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>

---------

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: yini.gao@schibsted.com <yini.gao@schibsted.com>
… template (#5215)

* update with link to dockerfile template

Signed-off-by: nikki everett <nikki@union.ai>

* fix 404 error

Signed-off-by: nikki everett <nikki@union.ai>

---------

Signed-off-by: nikki everett <nikki@union.ai>
* copy changes over from flytesnacks#1553

Signed-off-by: nikki everett <nikki@union.ai>

* fix formatting

Signed-off-by: nikki everett <nikki@union.ai>

* fix 404 error

Signed-off-by: nikki everett <nikki@union.ai>

---------

Signed-off-by: nikki everett <nikki@union.ai>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Kevin Su <pingsutw@gmail.com>
Sovietaced and others added 21 commits August 1, 2024 22:19
* Refactor panic handling to middleware

Signed-off-by: Jason Parraga <sovietaced@gmail.com>

* Remove registration of old panicCounter

Signed-off-by: Jason Parraga <sovietaced@gmail.com>

* Add test coverage

Signed-off-by: Jason Parraga <sovietaced@gmail.com>

---------

Signed-off-by: Jason Parraga <sovietaced@gmail.com>
* TEST build

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* remove emphasize-lines

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* test build

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* revert

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
* FlytePropeller Compiler Avoid Crash when Type not found

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update pingsu's error message advices

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: pingsutw  <pingsutw@apache.org>

* fix lint

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: pingsutw <pingsutw@apache.org>
* first version

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: ddl-rliu <140021987+ddl-rliu@users.noreply.github.com>
* add send arg

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* Add acction to remove cache in gh runner

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Use correct checked out path

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Path in strings

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Checkout repo in root

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Use the correct path to new action

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Do not use gh var in path to clear-action-cache

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Remove wrong invocation of clear-action-cache

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* GITHUB_WORKSPACE is implicit in the checkout action

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Refer to local `flyte` directory

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Andrew Dye <andrewwdye@gmail.com>
* Make flyteidl releases go through a manual gh workflow

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Make flytectl releases go through a manual gh workflow

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Rewrite the documentation for `version` and clarify wording in RELEASE.md

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
* fix CHANGELOG-v0.2.0.md

Signed-off-by: Christina <156356273+cratiu222@users.noreply.github.com>

* fix CHANGELOG-v1.0.2-b1.md

Signed-off-by: Christina <156356273+cratiu222@users.noreply.github.com>

* fix CHANGELOG-v1.1.0.md

Signed-off-by: Christina <156356273+cratiu222@users.noreply.github.com>

* fix CHANGELOG-v1.3.0.md

Signed-off-by: Christina <156356273+cratiu222@users.noreply.github.com>

---------

Signed-off-by: Christina <156356273+cratiu222@users.noreply.github.com>
* Fetch all tags in flyteidl-release.yml

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Fix sed expression for npm job

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
* update

Signed-off-by: Desi Hsu <desihsu@gmail.com>

* dco

Signed-off-by: Desi Hsu <desihsu@gmail.com>

* dco

Signed-off-by: Desi Hsu <desihsu@gmail.com>

* typo

Signed-off-by: Desi Hsu <desihsu@gmail.com>

---------

Signed-off-by: Desi Hsu <desihsu@gmail.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
#5648)

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
#5649)

* Don't error when attempting to trigger schedules for inactive projects

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>

* regen

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>

---------

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the updates!

// If this literal is offloaded, this field will contain metadata including the offload location.
string uri = 6;
// Includes information about the size of the literal.
uint64 size_bytes = 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is size important again? I mean there's other metadata as well (etag information). The assumption here is that size is super important so we want to be able to show that without making a head call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general consensus seemed to be this is useful, I guess it's nice for clients who want to decide whether to pull massive datasets?


When writing outputs in the [remote_file_output_writer](https://github.com/flyteorg/flyte/blob/2ca31119d6b9258661a71f38e450f93b6692402c/flyteplugins/go/tasks/pluginmachinery/ioutils/remote_file_output_writer.go#L56-L84) the source code should detect whether the literal size exceeds the configured minimum and
- if the task is using a newer SDK version that supports reading offloaded literals, offload the literal to the configured storage backend and update the literal with the offload URI and size.
- if the task is using an older SDK version that doesn't support offloaded literals, fail the task with an error message indicating that the task output is too large and the user should update their SDK version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a line to say downstream tasks also have to upgraded? that is, if you have a reference/remote task downstream that consumes the map task output, but it hasn't been updated, then it'll fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks


For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated?
As a follow-up, we can also implement literal offloading in the SDK for conventional python tasks. Flytekit should also know how to offload the data. This should be done transparently to the user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we cover that here? or is this only for map tasks for now? for the general case, we were going to go with the solution of propeller setting an environment variable that turns on offloading on the flytekit side?

for erroring if things get too big, i don't know that there's a solution. We should just add a size limit asap in flytekit right @eapolinario? Some env var based setting with a 10MB default. If a literal is more than 10MBs then error. considering we don't know when we'll get to the general case, by the time we do, most users might've already upgraded.

Copy link
Contributor Author

@katrogan katrogan Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're leaning towards not tackling the implementation bits for this proposal but I think it's okay to cover future work here?

Updated to include the bits about failing fast here for too-large literals, thank you!

@katrogan
Copy link
Contributor Author

thanks for the review @wild-endeavor, updated - mind taking another look?

Copy link

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 36.17%. Comparing base (b6bba77) to head (e89f434).

Files Patch % Lines
datacatalog/cmd/entrypoints/serve.go 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b6bba77) and HEAD (e89f434). Click for more details.

HEAD has 13 uploads less than BASE
Flag BASE (b6bba77) HEAD (e89f434)
6 0
unittests 7 0
Additional details and impacted files
@@                    Coverage Diff                     @@
##           rfc/offloaded-literal    #5653       +/-   ##
==========================================================
- Coverage                  58.99%   36.17%   -22.83%     
==========================================================
  Files                        645     1302      +657     
  Lines                      55670   109484    +53814     
==========================================================
+ Hits                       32844    39606     +6762     
- Misses                     20230    65740    +45510     
- Partials                    2596     4138     +1542     
Flag Coverage Δ
unittests ?
unittests-datacatalog 51.37% <0.00%> (?)
unittests-flyteadmin 55.33% <ø> (?)
unittests-flytecopilot 12.17% <ø> (?)
unittests-flytectl 62.28% <ø> (?)
unittests-flyteidl 7.08% <ø> (?)
unittests-flyteplugins 53.31% <ø> (?)
unittests-flytepropeller 41.74% <ø> (?)
unittests-flytestdlib 55.35% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@katrogan katrogan merged commit 59192b7 into rfc/offloaded-literal Aug 13, 2024
47 of 50 checks passed
@katrogan katrogan deleted the rfc/katrina-offloaded-literal branch August 13, 2024 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.