
feat: add checkpoint uds-core slim package #818

Open. Wants to merge 39 commits into main.


Racer159 (Contributor) commented Sep 25, 2024

Description

This adds a way to deploy or reset a full uds-core cluster roughly 75% faster. In theory, the same approach would also work for other preloaded setups, such as testing GitLab Runner against GitLab.

Normal:
(screenshot not preserved)

Checkpoint:
(screenshot not preserved)

Tradeoffs:

  • Requires sudo. I'm not sure of a good way around this without mangling the volume permissions for containerd.
  • May become unwieldy with more permutations (e.g., with the layers work).
  • The cluster is published in its fully deployed state, so all credentials are reused.

Related Issue

Fixes #N/A

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Other (security config, docs update, etc)

Checklist before merging

Racer159 changed the title from "feat: add frozen uds-core slim package" to "feat: add checkpoint uds-core slim package" on Sep 27, 2024
Racer159 marked this pull request as ready for review on September 27, 2024 22:54
Racer159 requested a review from a team as a code owner on September 27, 2024 22:54
Racer159 self-assigned this on Sep 27, 2024
Racer159 (Contributor, Author) commented Sep 28, 2024

The checkpoint task passed in this PR (except for the actual publish task).
(screenshot not preserved)

catsby left a comment

I'm not an approver, but the code looks good to me. I would like to see more information on how to use this package, though, so it's clearer how, why, and when someone would want to use it.

Resolved review threads:
packages/checkpoint-dev/README.md (outdated)
packages/checkpoint-dev/zarf.yaml
tasks.yaml (outdated)
.github/workflows/checkpoint.yaml (outdated)
.github/actions/setup/action.yaml (outdated)
packages/checkpoint-dev/zarf.yaml
packages/checkpoint-dev/checkpoint.sh (outdated)
tasks/deploy.yaml (outdated)
packages/checkpoint-dev/checkpoint.sh (outdated)
Comment on lines +44 to +51
```sh
"/var/lib/kubelet")
  echo "Copying $SOURCE to ${DATA_DIR}/kubelet_data/"
  sudo cp -a "$SOURCE"/. "${DATA_DIR}/kubelet_data/"
  ;;
"/var/lib/rancher/k3s")
  echo "Copying $SOURCE to ${DATA_DIR}/k3s_data/"
  sudo cp -a "$SOURCE"/. "${DATA_DIR}/k3s_data/"
  ;;
```
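The excerpt above contains two branches of a case statement that maps each source path to a subdirectory of DATA_DIR. As a minimal sketch, assuming these two paths are the only ones that matter, the same dispatch can be pulled out into a standalone function. The function name dest_subdir_for and the example loop are hypothetical and not part of the PR; the sketch only reports where each path would go and copies nothing.

```sh
#!/bin/sh
set -eu

# Hypothetical helper (not from the PR): map a node data path to the
# subdirectory of DATA_DIR it is copied into, mirroring the case branches.
dest_subdir_for() {
  case "$1" in
    "/var/lib/kubelet") echo "kubelet_data" ;;
    "/var/lib/rancher/k3s") echo "k3s_data" ;;
    *) return 1 ;;  # any other path is not part of the checkpoint
  esac
}

# Usage example: report where each path would be copied (nothing is copied).
DATA_DIR="${DATA_DIR:-/tmp/checkpoint-data}"
for SOURCE in /var/lib/kubelet /var/lib/rancher/k3s /var/lib/other; do
  if subdir=$(dest_subdir_for "$SOURCE"); then
    echo "Would copy $SOURCE to ${DATA_DIR}/${subdir}/"
  else
    echo "Skipping $SOURCE"
  fi
done
```

Keeping the mapping in one function makes it easy to add a third path later without touching the copy logic.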
Contributor

During creation I see these errors (which cause the deploy to fail later):

```text
cp: /var/lib/docker/volumes/c0d8ea4ead46f3c6649218be409e19d1cd63bfcc68f32d548a116c7924d7a793/_data/.: No such file or directory
cp: /var/lib/docker/volumes/822e843b8cf644f9c4c9118671f6014d32ad84a062d690e69b07d5c6fdfcfbe2/_data/.: No such file or directory
```

I think Docker on macOS pretty much universally runs inside a VM, so these volume paths under /var/lib/docker/volumes don't exist on the host itself. In my case the VM can be accessed with colima ssh, but Docker Desktop, Rancher Desktop, etc. would likely have similar issues and similar ways to access the VM.

I was able to rewrite a portion of this script to use docker cp instead and got further (at least I no longer got errors with the volumes). I think this is a better, more platform-agnostic option, and it simplifies a lot of this logic: no looping through volumes, just explicitly copying the two paths we need. I was hoping it would also remove the need for sudo, but in my case one of the paths still gave permission errors until I added sudo. There is probably some efficiency loss, but since this only happens at create time, I think it's worth it to make this work across distros. Locally it still took less than a minute, which seems reasonably performant (granted, I couldn't get the original version to run successfully, so I can't make a real comparison).

I'd be curious to hear your thoughts on this. I put the script changes into a gist since there were a handful of changes across the entire file: https://gist.github.com/mjnagel/6d681678df83067169c4e652466f704f
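A minimal sketch of the docker cp approach described above, copying the two paths explicitly from the k3d server node. The container name k3d-uds-server-0, the DATA_DIR default, the run helper, and the DRY_RUN switch are assumptions for illustration and are not taken from the gist. The trailing /. on each source path makes docker cp copy the directory's contents rather than the directory itself, matching the cp -a "$SOURCE"/. calls in the original script. Because docker cp goes through the Docker daemon, it reads the files wherever Docker actually runs, including inside a VM on macOS. The sketch prints the commands by default; set DRY_RUN=0 to execute them.

```sh
#!/bin/sh
set -eu

# Assumed values for illustration; adjust to the real node name and output dir.
NODE="${NODE:-k3d-uds-server-0}"
DATA_DIR="${DATA_DIR:-./checkpoint-data}"
# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the two node paths explicitly instead of looping over Docker volumes.
# docker cp goes through the Docker daemon, so it also works when Docker runs
# inside a VM. sudo may still be needed if file ownership causes
# permission errors on the host side.
copy_node_data() {
  run mkdir -p "${DATA_DIR}/kubelet_data" "${DATA_DIR}/k3s_data"
  run docker cp "${NODE}:/var/lib/kubelet/." "${DATA_DIR}/kubelet_data/"
  run docker cp "${NODE}:/var/lib/rancher/k3s/." "${DATA_DIR}/k3s_data/"
}

copy_node_data
```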

I also had to add --no-xattrs to the final tar command; without it I got warnings/errors (I suspect that's a macOS vs. Linux difference). This got me much closer, but I hit an issue with the token:

```text
time="2024-10-02T15:19:18Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
```

I tried tweaking the startup commands (using the k3d --token option rather than the k3s arg) and validated that the token exists after extraction, but couldn't figure this one out. I'd be curious whether you hit the same issue with my modified script and can figure out what's wrong.
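The final tar step with --no-xattrs can be sketched as follows, assuming the copied data sits in kubelet_data/ and k3s_data/ under one data directory, as in the script excerpt. The function name create_checkpoint_archive and the archive name checkpoint.tar.gz are hypothetical. Both GNU tar (1.27 and later) and bsdtar, the default tar on macOS, accept --no-xattrs, which leaves extended attributes out of the archive; whether those attributes are what caused the warnings here is an assumption.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: archive the contents of a data directory without
# extended attributes. --no-xattrs works with GNU tar (1.27+) and bsdtar.
create_checkpoint_archive() {
  data_dir=$1   # directory holding kubelet_data/ and k3s_data/
  archive=$2    # output path, e.g. checkpoint.tar.gz (hypothetical name)
  tar -czf "$archive" --no-xattrs -C "$data_dir" .
}

# Usage example with a throwaway directory standing in for the real data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/data/k3s_data" "$workdir/data/kubelet_data"
echo "example" > "$workdir/data/k3s_data/placeholder.txt"
create_checkpoint_archive "$workdir/data" "$workdir/checkpoint.tar.gz"
tar -tzf "$workdir/checkpoint.tar.gz"
```

Archiving with -C and . stores paths relative to the data directory, so the archive can be extracted into a fresh data directory on another host.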

3 participants