Currently, whenever we load a model, we need to convert its parameter layout from whatever PyTorch uses to whatever Axon uses (mostly transposition of dense and conv layers). For smaller models this is quick, but for large models this: (a) introduces loading overhead; (b) consumes a lot of memory (this prevents loading params directly onto the GPU, which would make sense in a single-GPU use case) (fixed in #344).
Ideally we would have an easy way to persist the loaded parameters, possibly across multiple files (for large parameters). With that, the user could call Bumblebee.load_model/2, persist the parameters into a file, and then in production load the parameters directly, without the conversion overhead (possibly straight onto the GPU), as sketched below.
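A minimal sketch of that workflow, using Nx.serialize/2 and Nx.deserialize/1 (which round-trip any Nx.Container via term-to-binary) as stand-in persistence helpers; the model repo and file name are just placeholders:

```elixir
# One-off, e.g. at build time: load with the PyTorch -> Axon
# conversion, then dump the already-converted params to disk.
{:ok, %{params: params}} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
File.write!("params.nx", Nx.serialize(params))

# In production: read the converted params back directly, skipping
# the conversion step entirely.
params = "params.nx" |> File.read!() |> Nx.deserialize()
```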
This probably belongs in Axon directly, but we may as well track it here given the use case. I also wonder if we should be using Safetensors rather than term-to-binary, for better portability. One issue with Safetensors is that it only supports a flat map of tensors, while Axon parameters can be any Nx.Container (e.g. LSTM uses tuples), so unless we make Axon parameters more strict, we can't really do it.
This also depends on elixir-nx/axon#553, which changes params into a struct, and we likely want to persist the whole struct.
The flat parameters should not really be a problem, should they? You could convert a nested map with keys "foo" and "bar" into a special flattened key, such as "foo--bar", no?
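A rough sketch of that flattening, assuming two-level Axon params of the shape `%{"layer" => %{"param" => tensor}}` and a `"--"` delimiter that never occurs in names (the module name is hypothetical); tuple-valued params, the LSTM case above, would still need a separate encoding:

```elixir
defmodule FlatParams do
  # Hypothetical helper: flatten nested params into Safetensors-friendly
  # flat keys, assuming "--" never appears in layer or param names.
  def flatten(params) do
    for {layer, tensors} <- params,
        {name, tensor} <- tensors,
        into: %{},
        do: {"#{layer}--#{name}", tensor}
  end

  # Rebuild the nested map by splitting each flat key on the delimiter.
  def unflatten(flat) do
    for {key, tensor} <- flat, reduce: %{} do
      acc ->
        [layer, name] = String.split(key, "--", parts: 2)
        Map.update(acc, layer, %{name => tensor}, &Map.put(&1, name, tensor))
    end
  end
end
```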
With #344 the main motivation (excessive memory usage) is addressed, so this is less of a priority. It would still remove the time overhead of transforming the params. Either way, we should have a good way of persisting large parameters (again, likely in Axon).