Many models have an option to share parameters between the input embedding layer and the output dense layer. We need a solution for that in Axon, but I'm opening this issue here, since we already have a lot of TODOs in the code for this specific issue.
The reason loading currently works is that the PyTorch `.bin` export includes both layers, both pointing to the same tensor. In the case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255). For additional discussion see #263.
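To make the gap concrete, here is a minimal sketch of the kind of fix-up a loader could apply when the safetensors file stores only the embedding tensor. This is not Bumblebee's actual loading code; the parameter names `"embedding.kernel"` and `"lm_head.kernel"` are hypothetical stand-ins:

```elixir
# A minimal sketch, not Bumblebee's actual loading code. Assumes a plain
# map of Nx tensors (as a safetensors reader would produce) and the
# hypothetical parameter names "embedding.kernel" and "lm_head.kernel".
defmodule TiedParams do
  # If the output layer kernel is missing, materialize it from the
  # embedding kernel. For a {vocab_size, hidden_size} embedding matrix,
  # the dense kernel is typically its transpose, {hidden_size, vocab_size}.
  def maybe_tie(params) do
    case params do
      %{"lm_head.kernel" => _} ->
        params

      %{"embedding.kernel" => kernel} ->
        Map.put(params, "lm_head.kernel", Nx.transpose(kernel))

      _ ->
        params
    end
  end
end
```

Note that this only duplicates the tensor at load time, which matches what the PyTorch `.bin` export happens to give us; it does not actually tie the buffers, so gradient updates during training would still make the two copies diverge.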
I actually think this is something we can do with the model state struct: since we can store metadata, we can also tell when two parameters are tied. It's just a matter of determining an API to tie the buffers. If we know from safetensors that they are tied on load, then it should be easier; I'm just thinking of how it would be done when declaring the model.
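One possible shape for such an API, sketched with a plain struct rather than the real model state internals (the `tie/3` function and the `:tied` field are made up for illustration):

```elixir
# Hypothetical sketch of tying metadata on a model state struct;
# the field and function names here are invented for illustration
# and are not an existing Axon API.
defmodule SketchState do
  defstruct data: %{}, tied: %{}

  # Record that `alias_path` should resolve to the buffer at `source_path`.
  def tie(%__MODULE__{} = state, alias_path, source_path) do
    %{state | tied: Map.put(state.tied, alias_path, source_path)}
  end

  # Resolve a parameter, following tie metadata so both layers always
  # see the same underlying tensor.
  def fetch!(%__MODULE__{} = state, path) do
    path = Map.get(state.tied, path, path)
    Map.fetch!(state.data, path)
  end
end
```

With this shape, `SketchState.tie(state, "lm_head.kernel", "embedding.kernel")` would make `fetch!(state, "lm_head.kernel")` return the embedding kernel. Keeping the tie as metadata means only the source buffer is ever stored and updated, so the two layers cannot drift apart, and a serializer can persist the single source tensor the way safetensors expects.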