Sparse checkout for git pulls #15185

Open
cpinflux opened this issue Sep 3, 2024 · 2 comments · May be fixed by #15824
Labels
enhancement An improvement of an existing feature

Comments

@cpinflux

cpinflux commented Sep 3, 2024

Describe the current behavior

When a pull step clones a git repository, the entire repository is cloned, which can take longer than the flow run itself.

Describe the proposed behavior

Additional configuration in prefect.yaml (and the equivalent commands) should permit a sparse checkout, for example of just a single directory.

An additional touch would be to make the selected directory the root of the clone, rather than retaining its full path, so that paths in code can simply stay relative.
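A minimal sketch of the "selected directory becomes the root" idea, assuming the sparse clone has already been written to clone_dir. The helper name promote_subdirectory is hypothetical and not part of Prefect; it moves the chosen subdirectory up so that it replaces the clone root.

```python
import shutil
import tempfile
from pathlib import Path


def promote_subdirectory(clone_dir: Path, directory: str) -> Path:
    """Hypothetical helper (not part of Prefect): make clone_dir/directory the
    root of the checkout, so a file at clone_dir/directory/x ends up at clone_dir/x.
    """
    clone_dir = Path(clone_dir).resolve()
    source = (clone_dir / directory).resolve()
    # Only accept real subdirectories of the clone (rejects ".", ".." and outside paths).
    if clone_dir not in source.parents:
        raise ValueError(f"{directory!r} is not a subdirectory of {clone_dir}")
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist or is not a directory")
    # Move the selected directory into a temporary staging area next to the clone...
    staging = Path(tempfile.mkdtemp(dir=clone_dir.parent))
    moved = Path(shutil.move(str(source), str(staging / "promoted")))
    # ...delete everything else in the clone, including .git...
    shutil.rmtree(clone_dir)
    # ...and put the selected directory back where the clone root was.
    shutil.move(str(moved), str(clone_dir))
    staging.rmdir()
    return clone_dir
```

One trade-off of this approach: it discards .git and everything outside the selected directory, so the result is no longer a git working tree. A gentler alternative would be to leave the clone intact and run the flow with its working directory set to clone_dir / directory.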

Example Use

In prefect.yaml:

pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/org/repo.git
    access_token: '{{ prefect.blocks.secret.github-token }}'
    directory: that/one/directory/where/my/code/is
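One way the proposed directory key could map onto plain git commands, sketched under assumptions: the helper sparse_clone_commands is hypothetical, it only builds argument lists (running them, for example with subprocess.run, is left to the caller), access_token handling is omitted, and the --filter=blob:none and --depth 1 flags are extra speed-ups not mentioned in the issue (--filter needs a server that supports partial clone, which GitHub does).

```python
from typing import List, Optional


def sparse_clone_commands(
    repository: str, directory: str, dest: str, branch: Optional[str] = None
) -> List[List[str]]:
    """Hypothetical helper (not part of Prefect): return git argv lists that
    clone `repository` into `dest` with only `directory` (plus top-level files)
    checked out.
    """
    # Blobless, shallow clone; --sparse starts with only top-level files present.
    clone = ["git", "clone", "--filter=blob:none", "--sparse", "--depth", "1"]
    if branch:
        clone += ["--branch", branch]
    clone += [repository, dest]
    # Add the requested directory to the sparse-checkout set; git then fetches
    # and checks out just the files under it.
    sparse_set = ["git", "-C", dest, "sparse-checkout", "set", directory]
    return [clone, sparse_set]
```

Keeping command construction separate from execution makes the mapping easy to inspect and test without network access.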

Additional context

No response

@cpinflux cpinflux added the enhancement An improvement of an existing feature label Sep 3, 2024
@cicdw
Member

cicdw commented Sep 3, 2024

Makes sense, and seems potentially doable using something like https://git-scm.com/docs/git-sparse-checkout.

@tetracionist

tetracionist commented Oct 27, 2024

I had a go at creating a branch for this feature: https://github.com/tetracionist/prefect/tree/sparse-checkout-for-git-pulls
I haven't contributed to Prefect before, so I appreciate this will need more work before I create a PR for it.

It uses git sparse-checkout, but to get it to work I needed to update Git on my laptop to version 2.25.0 or later.
This should be fine on the latest prefect-3 Docker image, which uses Debian's git package 1:2.39.5-0+deb12u1.
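A small sketch of a pre-flight check for this Git version requirement, assuming 2.25.0 as the minimum (the version the comment reports needing). The function names are hypothetical; the check parses the text printed by git --version.

```python
import re
import subprocess
from typing import Tuple

# Minimum version reported in this thread as needed for sparse-checkout.
MIN_SPARSE_CHECKOUT_VERSION = (2, 25, 0)


def parse_git_version(output: str) -> Tuple[int, int, int]:
    """Parse `git --version` output such as 'git version 2.39.5' or
    'git version 2.45.1.windows.1' into a (major, minor, patch) tuple."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognised git version string: {output!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def supports_sparse_checkout(output: str) -> bool:
    # Tuples compare element by element, so (2, 39, 5) >= (2, 25, 0).
    return parse_git_version(output) >= MIN_SPARSE_CHECKOUT_VERSION


def installed_git_supports_sparse_checkout() -> bool:
    # Runs the locally installed git; raises if git is missing or fails.
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return supports_sparse_checkout(result.stdout)
```

A pull step could call such a check first and fall back to a full clone (or fail with a clear message) on older Git installations.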

I tested this using a public repository and a process worker, and it gave me a directory containing only the folders I specified.
I'll add some tests, and I'm happy to provide any screenshots needed.

The config works like this:

pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/tetracionist/prefect.git
    branch: sparse-checkout-for-git-pulls
    access_token:
    directories: [src/integrations/prefect-azure, src/integrations/prefect-dask]
    cone_mode: True # defaults to True; only needs to be included when False
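A sketch of how the directories and cone_mode keys in the config above could be translated into a git sparse-checkout set call. The helper is hypothetical and this mapping is an assumption, not necessarily what the linked fork does; it also assumes a Git version whose sparse-checkout set accepts --cone and --no-cone.

```python
from typing import Any, Dict, List


def sparse_checkout_set_args(step: Dict[str, Any], dest: str) -> List[str]:
    """Hypothetical helper: build a `git sparse-checkout set` argv from the
    `directories` and `cone_mode` keys of a git_clone step config."""
    directories = step.get("directories") or []
    # Allow a single string as shorthand for a one-element list.
    if isinstance(directories, str):
        directories = [directories]
    if not directories:
        raise ValueError("sparse checkout needs at least one entry in 'directories'")
    # Defaults to True, matching the comment in the config above.
    cone_mode = step.get("cone_mode", True)
    # Cone mode: entries are directories; non-cone: gitignore-style patterns.
    mode_flag = "--cone" if cone_mode else "--no-cone"
    return ["git", "-C", dest, "sparse-checkout", "set", mode_flag, *directories]
```

In cone mode, git treats each entry as a directory to include in full; with cone_mode: False the entries are passed as gitignore-style patterns instead, which is more flexible but slower for large repositories.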

Sparsely checked-out repo using my fork: [screenshot]

@tetracionist tetracionist linked a pull request Oct 28, 2024 that will close this issue
@desertaxle desertaxle linked a pull request Oct 28, 2024 that will close this issue

3 participants