Sparse file handling #37

Open
jbd opened this issue Jul 1, 2024 · 3 comments

@jbd

jbd commented Jul 1, 2024

Hello,

Unless I'm mistaken, cpz does not handle sparse files and will copy null bytes over the wire. The xcp project (https://github.com/tarka/xcp) tries to iterate over the holes and data chunks of a sparse file and uses copy_file_range only for the actual data.

I think this would be a nice addition and I hope you will consider this feature.

Sorry this is really an RFE without an associated pull request, but my rust skills are non-existent at the moment.

Thank you for this project. This is a fantastic contribution to the space of data movement.

Jean-Baptiste

@SUPERCILEX
Owner

I looked into it and it seems like a bit of a pain. You'd want to use this statx call to determine if the file is sparse (by seeing if its size on disk, i.e. its number of blocks, is less than its apparent size):

let from_metadata = statx(from_dir, file_name, AtFlags::empty(), StatxFlags::MODE)
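A minimal sketch of that check, using only the standard library rather than the statx wrapper quoted above. The helper names `is_sparse` and `file_is_sparse` are hypothetical and not part of cpz. It assumes a Unix target where `st_blocks` is counted in 512-byte units. Note that the quoted call requests only the mode; a statx-based version would presumably also need the size and block count (the kernel's `STATX_SIZE` and `STATX_BLOCKS` mask bits).

```rust
use std::fs::Metadata;
use std::os::unix::fs::MetadataExt;

/// Hypothetical helper: true when fewer bytes are allocated on disk than the
/// apparent size. `blocks_512` is the block count in 512-byte units, which is
/// what `st_blocks` reports.
fn is_sparse(apparent_size: u64, blocks_512: u64) -> bool {
    blocks_512.saturating_mul(512) < apparent_size
}

/// Same check, driven by metadata that has already been fetched for a file.
fn file_is_sparse(meta: &Metadata) -> bool {
    is_sparse(meta.len(), meta.blocks())
}
```

This is only a heuristic: filesystems that compress data can also report fewer allocated blocks than the apparent size, so a positive result means "worth looking for holes", not "definitely has holes".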

Then, to copy the file sparsely, I believe you can just use lseek (https://man7.org/linux/man-pages/man2/lseek.2.html), alternating between SEEK_HOLE and SEEK_DATA to get the ranges that should be passed into copy_file_range. Then the end of the file might need an ftruncate to finish it off. Or maybe it's better to actually ftruncate the whole file at the beginning? Not sure.
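The loop described above could look roughly like the sketch below. It is not cpz or xcp code: `data_ranges` and `copy_sparse` are hypothetical names, the libc functions are declared by hand so that only std is needed, and it assumes a 64-bit Linux target (where `off_t` is `i64`) and a freshly created, empty destination file. It takes the "ftruncate at the beginning" option, which is one possible answer to the open question rather than a verified best choice.

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

// Linux constant values; hand-declared so the sketch needs no external crate.
const SEEK_DATA: i32 = 3;
const SEEK_HOLE: i32 = 4;
const ENXIO: i32 = 6;

extern "C" {
    // Signatures as on 64-bit Linux (off_t = i64, ssize_t = isize).
    fn lseek(fd: i32, offset: i64, whence: i32) -> i64;
    fn copy_file_range(
        fd_in: i32, off_in: *mut i64,
        fd_out: i32, off_out: *mut i64,
        len: usize, flags: u32,
    ) -> isize;
}

/// Collects the (start, end) byte ranges of `file` that hold data by
/// alternating SEEK_DATA and SEEK_HOLE. Moves the file's offset as a side effect.
fn data_ranges(file: &File) -> io::Result<Vec<(i64, i64)>> {
    let fd = file.as_raw_fd();
    let mut ranges = Vec::new();
    let mut pos = 0i64;
    loop {
        let start = unsafe { lseek(fd, pos, SEEK_DATA) };
        if start < 0 {
            let err = io::Error::last_os_error();
            // ENXIO means there is no data at or after `pos`: we are done.
            if err.raw_os_error() == Some(ENXIO) {
                return Ok(ranges);
            }
            return Err(err);
        }
        // Every file has an implicit hole at EOF, so this finds an end.
        let end = unsafe { lseek(fd, start, SEEK_HOLE) };
        if end < 0 {
            return Err(io::Error::last_os_error());
        }
        ranges.push((start, end));
        pos = end;
    }
}

/// Copies only the data ranges of `from` into `to`, leaving holes unwritten.
fn copy_sparse(from: &File, to: &File) -> io::Result<()> {
    // Set the final length up front so a trailing hole is preserved.
    to.set_len(from.metadata()?.len())?;
    for (start, end) in data_ranges(from)? {
        let (mut off_in, mut off_out) = (start, start);
        while off_in < end {
            // The kernel advances both offsets by the number of bytes copied.
            let n = unsafe {
                copy_file_range(
                    from.as_raw_fd(), &mut off_in,
                    to.as_raw_fd(), &mut off_out,
                    (end - off_in) as usize, 0,
                )
            };
            if n < 0 {
                return Err(io::Error::last_os_error());
            }
            if n == 0 {
                break; // source ended early, e.g. it was truncated meanwhile
            }
        }
    }
    Ok(())
}
```

A caller would open the source for reading, create the destination with `File::create`, and call `copy_sparse(&src, &dst)`. A real implementation would likely also need a fallback for kernels or filesystems where copy_file_range fails (for example across mount points on older kernels), which this sketch does not attempt.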

@jbd
Author

jbd commented Jul 2, 2024

Thank you for your answer !

I don't know about the ftruncate at the beginning. I think that https://github.com/hpc/mpifileutils does it in one of its passes (file and directory creation, data copy, setting permissions).

I agree that the sparse case looks quite tedious to implement:

https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libfs/src/linux.rs#L78
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libxcp/src/operations.rs#L118
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libxcp/src/operations.rs#L82
https://github.com/tarka/xcp/blob/2253d6ffc1ca13e8de395700b7e4f3fd57fa42b5/libfs/src/linux.rs#L206

Copying TB sparse files is not ideal too =)

In the end, this is a gentle request for enhancement, I just wanted to have your thoughts on this. I understand perfectly that you are more focused on interesting development like using io_uring.

Feel free to close this case !

Cheers.

Jean-Baptiste

@SUPERCILEX
Owner

> Copying TB sparse files is not ideal too =)

Lol, fair.

> Feel free to close this case !

No no, I'm just saying that I probably won't implement this myself, but I'd be happy to accept a PR.
