Fix mixup of paths in put function #1179
`put` currently sorts the list of input paths, so when they are not already in order they no longer match their output paths and file names get mixed up. This fixes fsspec/gcsfs#468
Note: In I didn't remove sorting completely to stay backwards compatible and also because |
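The mixup described above can be sketched in plain Python. This is only an illustration of the pairing problem, not fsspec's actual implementation; the helper names `pair_after_sorting_inputs` and `pair_preserving_order` and the example paths are hypothetical.

```python
# Hypothetical illustration of the bug; not fsspec's real code.
lpaths = ["local/c.txt", "local/b.txt"]
rpaths = ["gs://dst_bucket/c_path/c.txt", "gs://dst_bucket/b_path/b.txt"]


def pair_after_sorting_inputs(lpaths, rpaths):
    """Sort only the sources, then pair positionally (the problematic case)."""
    return list(zip(sorted(lpaths), rpaths))


def pair_preserving_order(lpaths, rpaths):
    """Pair positionally in the caller's order, so each source keeps its target."""
    return list(zip(lpaths, rpaths))


mixed = pair_after_sorting_inputs(lpaths, rpaths)
kept = pair_preserving_order(lpaths, rpaths)

for src, dst in mixed:
    print("sorted inputs:", src, "->", dst)
for src, dst in kept:
    print("original order:", src, "->", dst)
```

With the sources sorted on their own, `local/b.txt` ends up paired with the destination meant for `c.txt`; keeping the caller's order avoids that.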
@ianthomas23 , do you have time to review this? |
Yes, I can take a look tomorrow morning, UK time. |
@quassy Thanks for looking at this. I think the solution ends up being more complicated, as is often the case. At the end of To proceed with this I think there are a few options:
Personally I think option 2 is the best, although there might be alternatives that I haven't thought of yet. Are you prepared to try out an |
Issue #897 is related to this too. |
@ianthomas23 Thanks as well for the quick response! Everything is trivial, except... I'm no longer sure whether expand_path for lpath makes sense at all, or what the expected behaviour of put is when you do something like the following, because the order of the glob results is not really known:
# src_directory/c.txt
# src_directory/b.txt
put(lpath=["src_directory/*.txt"], rpath=["gs://dst_bucket/c_path/", "gs://dst_bucket/b_path/"])
# at the moment b.txt would be written to c_path
put(lpath=["src_directory/*.txt"], rpath=["gs://dst_bucket/c_path/", "gs://dst_bucket/b_path/", "gs://dst_bucket/a_path/"])
# this would fail as other_paths() requires the two resulting lists to be the same length
Some ideas, none of which I'm super happy with:
|
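A rough sketch of the two scenarios in the comment above, using only the standard library. The `expand` and `pair` helpers are hypothetical stand-ins (assumed here to match and then sort, and to require equal lengths); they are not fsspec's `expand_path` or `other_paths`, and the listing is invented for illustration.

```python
import fnmatch

# Hypothetical directory listing; a real listing order is not guaranteed.
listing = ["src_directory/c.txt", "src_directory/b.txt"]


def expand(patterns, listing):
    """Toy wildcard expansion: collect matches for each pattern, then sort."""
    matched = []
    for pattern in patterns:
        matched.extend(fnmatch.filter(listing, pattern))
    return sorted(matched)


def pair(lpaths, rpaths):
    """Pair sources with destinations positionally, insisting on equal lengths."""
    if len(lpaths) != len(rpaths):
        raise ValueError(
            f"{len(lpaths)} source paths but {len(rpaths)} destination paths"
        )
    return list(zip(lpaths, rpaths))


sources = expand(["src_directory/*.txt"], listing)

# Two destinations: lengths match, but b.txt lands in c_path.
two = pair(sources, ["gs://dst_bucket/c_path/", "gs://dst_bucket/b_path/"])

# Three destinations: one pattern expanded to two files, so pairing fails.
try:
    pair(
        sources,
        ["gs://dst_bucket/c_path/", "gs://dst_bucket/b_path/", "gs://dst_bucket/a_path/"],
    )
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

The point of the sketch is that a single wildcard entry in lpath expands to an unknown number of files in an unknown order, so a positional match against a list of rpath entries is not well defined.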
This is what I use most commonly, and it does need the wildcard expansion:
fs.put(lpath="source/*", rpath="target")
but the outcome is the same regardless of sorting, as the target filenames are determined one at a time from each of the wildcard-expanded source filenames. Actually this doesn't do what it should in all circumstances (I am looking into this) so what I tend to use is the inferred wildcard for
fs.put(lpath="source/", rpath="target")
Of your 3 options I don't like 1 or 3, so 2 seems the best (or least bad anyway!). What should happen in various scenarios? If a user specifies a list for either I know this needs more thought, but so far I think this brings us back to changing We have made a few changes in this area of the code recently, so if we need to do something backwards-incompatible then this might be a good time to do it. |
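Why sorting does not matter for the single-target case can be sketched as follows. This is a simplified model that assumes each target is just the target directory joined with the source's base name; the helper `targets_for` is hypothetical, and the real behaviour in fsspec (trailing slashes, nested directories) is more involved.

```python
import posixpath


def targets_for(sources, rpath):
    """Derive each target from its own source, one at a time."""
    return {src: posixpath.join(rpath, posixpath.basename(src)) for src in sources}


expanded = ["source/c.txt", "source/b.txt"]  # hypothetical expansion of "source/*"

unsorted_result = targets_for(expanded, "target")
sorted_result = targets_for(sorted(expanded), "target")
```

Because every target is computed from its own source rather than by position in a second list, `unsorted_result` and `sorted_result` contain the same mapping.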
@ianthomas23 , happy to have a live chat about the best way forward here |
Good idea, I have booked a slot in your calendar tomorrow. |
@quassy , can you please check if this is still an issue given @ianthomas23 's work in the area? |