
Tied word embeddings #339

Open
jonatanklosko opened this issue Feb 21, 2024 · 2 comments
Labels
kind:chore Internal improvements note:upstream The issue must be tackled upstream

Comments

@jonatanklosko
Member

jonatanklosko commented Feb 21, 2024

Many models have an option to share parameters between the input embedding layer and an output dense layer. We need a solution for that in Axon, but I'm opening this issue since we already have a lot of TODOs in the code pointing to this specific problem.
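For context, a minimal sketch of the idea in plain Nx (not the actual Axon API; shapes and names are illustrative): the output projection reuses the embedding kernel transposed, so both layers read from a single tensor.

```elixir
# One tensor backs both the input embedding and the output projection
# (weight tying). Shapes are illustrative.
vocab_size = 8
hidden_size = 4
kernel = Nx.iota({vocab_size, hidden_size}, type: :f32)

# Input embedding: row lookup by token id.
token_ids = Nx.tensor([1, 3])
embedded = Nx.take(kernel, token_ids)

# Output layer: project hidden states back to vocabulary logits with
# the same kernel, transposed.
logits = Nx.dot(embedded, Nx.transpose(kernel))
```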

The reason loading currently works is that the PyTorch .bin export includes both layers and both point to the same tensor. In the case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255).
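To make the difference concrete (key names follow the usual Hugging Face convention but are illustrative):

```elixir
kernel = Nx.broadcast(0.0, {8, 4})

# PyTorch .bin export: both keys present, backed by the same tensor.
bin_params = %{
  "model.embed_tokens.weight" => kernel,
  "lm_head.weight" => kernel
}

# safetensors export with tied weights: the output layer key may be
# missing entirely, so the loader has to reconstruct the alias.
safetensors_params = %{
  "model.embed_tokens.weight" => kernel
}
```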

For additional discussion see #263.

@jonatanklosko added the note:upstream and kind:chore labels on Feb 21, 2024
@seanmor5
Contributor

I actually think this is something we can do with the model state struct: since we can store metadata, we can also tell when two parameters are tied. It's just a matter of determining an API to tie the buffers. If we know from safetensors that they are tied on load, then it should be easier; I'm just thinking of how it would be done when declaring the model.
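Purely as a sketch of that direction (the field and parameter names are invented, not an existing Axon structure): metadata on the state could map a tied parameter to its source buffer, and lookups resolve through it.

```elixir
# Hypothetical tie metadata on a model state.
model_state = %{
  data: %{"embedder.kernel" => Nx.broadcast(0.0, {8, 4})},
  tied: %{"lm_head.kernel" => "embedder.kernel"}
}

# A tied parameter resolves to the buffer it aliases.
fetch_param = fn state, name ->
  state.data[state.tied[name] || name]
end

fetch_param.(model_state, "lm_head.kernel")
```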

@jonatanklosko
Member Author

> If we know from safetensors that they are tied on load then it should be easier

We will know if they are tied based on the spec attribute, as in `spec.tie_word_embeddings`.
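E.g. a hypothetical load-time fix-up (the key names and params map shape are illustrative, not Bumblebee internals): when the spec says the weights are tied and the checkpoint only stores the embedding, alias it into the output layer's slot.

```elixir
maybe_tie = fn params, spec ->
  if spec.tie_word_embeddings and not Map.has_key?(params, "lm_head.weight") do
    Map.put(params, "lm_head.weight", params["model.embed_tokens.weight"])
  else
    params
  end
end

spec = %{tie_word_embeddings: true}
params = %{"model.embed_tokens.weight" => Nx.broadcast(0.0, {8, 4})}
maybe_tie.(params, spec)
```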
