
Tied word embeddings #339

Open
jonatanklosko opened this issue Feb 21, 2024 · 2 comments
Labels
kind:chore Internal improvements note:upstream The issue must be tackled upstream

Comments

@jonatanklosko
Member

jonatanklosko commented Feb 21, 2024

Many models have an option to share parameters between the input embedding layer and an output dense layer. We need a solution for that in Axon, but I'm opening this issue since we already have a lot of TODOs in the code pointing to this specific problem.
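For context, a minimal sketch of the idea in plain Nx (not the actual Axon API; shapes and names are illustrative): the output projection reuses the embedding kernel transposed, so both layers read from a single tensor.

```elixir
# One tensor backs both the input embedding and the output projection
# (weight tying). Shapes are illustrative.
vocab_size = 8
hidden_size = 4
kernel = Nx.iota({vocab_size, hidden_size}, type: :f32)

# Input embedding: row lookup by token id.
token_ids = Nx.tensor([1, 3])
embedded = Nx.take(kernel, token_ids)

# Output layer: project hidden states back to vocabulary logits with
# the same kernel, transposed.
logits = Nx.dot(embedded, Nx.transpose(kernel))
```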

The reason loading currently works is that the PyTorch .bin export includes both layers and both point to the same tensor. In the case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255).
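To make the difference concrete (key names follow the usual Hugging Face convention but are illustrative):

```elixir
kernel = Nx.broadcast(0.0, {8, 4})

# PyTorch .bin export: both keys present, backed by the same tensor.
bin_params = %{
  "model.embed_tokens.weight" => kernel,
  "lm_head.weight" => kernel
}

# safetensors export with tied weights: the output layer key may be
# missing entirely, so the loader has to reconstruct the alias.
safetensors_params = %{
  "model.embed_tokens.weight" => kernel
}
```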

For additional discussion see #263.

@jonatanklosko added the note:upstream and kind:chore labels on Feb 21, 2024
@seanmor5
Contributor

I actually think this is something we can do with the model state struct: since we can store metadata, we can also tell when two parameters are tied. It's just a matter of determining an API to tie the buffers. If we know from safetensors that they are tied on load, then it should be easier; I'm just thinking of how it would be done when declaring the model.
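Purely as a sketch of that direction (the field and parameter names are invented, not an existing Axon structure): metadata on the state could map a tied parameter to its source buffer, and lookups resolve through it.

```elixir
# Hypothetical tie metadata on a model state.
model_state = %{
  data: %{"embedder.kernel" => Nx.broadcast(0.0, {8, 4})},
  tied: %{"lm_head.kernel" => "embedder.kernel"}
}

# A tied parameter resolves to the buffer it aliases.
fetch_param = fn state, name ->
  state.data[state.tied[name] || name]
end

fetch_param.(model_state, "lm_head.kernel")
```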

@jonatanklosko
Member Author

> If we know from safetensors that they are tied on load then it should be easier

We will know if they are tied based on the spec attribute, as in `spec.tie_word_embeddings`.
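E.g. a hypothetical load-time fix-up (the key names and params map shape are illustrative, not Bumblebee internals): when the spec says the weights are tied and the checkpoint only stores the embedding, alias it into the output layer's slot.

```elixir
maybe_tie = fn params, spec ->
  if spec.tie_word_embeddings and not Map.has_key?(params, "lm_head.weight") do
    Map.put(params, "lm_head.weight", params["model.embed_tokens.weight"])
  else
    params
  end
end

spec = %{tie_word_embeddings: true}
params = %{"model.embed_tokens.weight" => Nx.broadcast(0.0, {8, 4})}
maybe_tie.(params, spec)
```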
