I am currently using HuggingFace Datasets to load, process, and save some data. However, HF Datasets saves the data in the Arrow format and wastes a lot of time converting between Arrow and PyTorch tensors.
I am wondering if I can use memory-mapped TensorDicts for this purpose?
How can I do a `map` over batches of a TensorDict?
Looking through the tutorials, the nearest example I found was using a DataLoader with `collate_fn` as the map function:
This is on our radar!
I like the idea of having a (possibly multiprocessed) map to execute some transform over all the elements of a tensordict.
Stay tuned, I'll ping you once we have a PR for this.
But the `collate_fn` approach does not let me form a pipeline of `map` functions, and I also don't know how to save and load the resulting dataset.