
Parameter persistence with sharding support #338

Open
jonatanklosko opened this issue Feb 15, 2024 · 4 comments
Labels
kind:feature New feature or request note:discussion Details up for discussion

Comments

@jonatanklosko
Member

jonatanklosko commented Feb 15, 2024

Currently, whenever we load a model, we need to convert the parameters' layout from whatever PyTorch uses to whatever Axon uses (mostly transposing dense and conv layers). For smaller models this is quick, however for large models it: (a) introduces loading overhead; (b) consumes a lot of memory (this prevents us from loading params directly onto the GPU, which would make sense in a single-GPU use case) (fixed in #344).
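For reference, a toy illustration of the kind of conversion involved (the exact shapes are an assumption for illustration, not Bumblebee's loading code): PyTorch stores a dense (Linear) kernel as `{out_features, in_features}`, while Axon expects `{in_features, out_features}`, so each such parameter gets transposed, materializing a converted copy.

```elixir
# Toy illustration of the per-parameter conversion cost (assumed shapes):
# a dense kernel in PyTorch's {out, in} layout transposed to Axon's {in, out}.
pytorch_kernel = Nx.iota({4, 3}, type: :f32)
axon_kernel = Nx.transpose(pytorch_kernel)
# axon_kernel has shape {3, 4} and is a new tensor, hence the extra memory
```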

Ideally we would have an easy way to persist the loaded parameters into multiple files (in case of large parameters). With that, the user could call Bumblebee.load_model/2 once, persist the parameters into a file, and then in production load the parameters directly without the conversion overhead (possibly straight onto the GPU), as in the sketch below.
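A minimal sketch of that workflow, assuming the converted params are persisted with `Nx.serialize/2` (term-to-binary based) until a dedicated API exists; the repository name and file path are just examples:

```elixir
# One-off: load with Bumblebee (this performs the PyTorch -> Axon layout
# conversion) and persist the already-converted parameters to disk.
{:ok, %{params: params}} = Bumblebee.load_model({:hf, "openai-community/gpt2"})
File.write!("params.nx", Nx.serialize(params))

# In production: read the converted parameters back directly, skipping
# the conversion overhead entirely.
params = Nx.deserialize(File.read!("params.nx"))
```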

This probably belongs in Axon directly, but we may as well track it here given the use case. I also wonder if we should be using Safetensors rather than term-to-binary for better portability. One issue with Safetensors is that it only supports a flat map of tensors, while Axon parameters can be any Nx.Container (e.g. LSTM uses tuples), so unless we make Axon parameters more strict we can't really do it.

This also depends on elixir-nx/axon#553, which changes params into a struct, and we likely want to persist the whole struct.

@jonatanklosko jonatanklosko added kind:feature New feature or request note:discussion Details up for discussion labels Feb 15, 2024
@josevalim
Contributor

The flat parameters should not really be a problem, should they? You could convert a nested map of keys “foo” and “bar” into a special flattened key, such as “foo——bar”, no?
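A rough sketch of that idea (the `flatten_params` helper and the `"--"` separator are hypothetical, not an existing API):

```elixir
defmodule ParamsFlattening do
  # Recursively flattens nested string-keyed maps of tensors into a single
  # flat map, joining keys with a separator, which fits Safetensors' model.
  def flatten_params(params, prefix \\ nil) do
    params
    |> Enum.flat_map(fn {key, value} ->
      key = if prefix, do: prefix <> "--" <> key, else: key

      case value do
        %Nx.Tensor{} -> [{key, value}]
        %{} -> flatten_params(value, key)
      end
    end)
    |> Map.new()
  end
end

# %{"foo" => %{"bar" => tensor}} becomes %{"foo--bar" => tensor}
ParamsFlattening.flatten_params(%{"foo" => %{"bar" => Nx.iota({2})}})
```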

@jonatanklosko
Member Author

@josevalim the nested map is not a problem; the problem is other Nx.Container values (currently tuples), so it may make sense to restrict Axon parameters to tensors.
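For example (the parameter names below are illustrative, not Axon's exact layout), the LSTM kernels are a tuple, which is a valid Nx.Container but has no string keys that a flattening scheme could use:

```elixir
t = fn -> Nx.iota({2, 2}, type: :f32) end

# A tuple container like this cannot be mapped onto Safetensors' flat,
# string-keyed layout without inventing a naming convention for its entries.
lstm_params = %{
  "input_kernel" => {t.(), t.(), t.(), t.()},
  "hidden_kernel" => {t.(), t.(), t.(), t.()}
}
```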

@jonatanklosko
Member Author

Sidenote: sharding is a nice-to-have, but with elixir-nx/safetensors#8 we should be able to write all parameters into a single file efficiently.

@jonatanklosko
Member Author

With #344 the main motivation (excessive memory usage) is addressed, so this is less of a priority. It would still reduce the time overhead of transforming the params. Either way, we should have a good way of persisting large parameters (again, preferably in Axon).
