
My question #195

Open
Messiz opened this issue Mar 21, 2022 · 2 comments

Comments


Messiz commented Mar 21, 2022

if trg_emb_prj_weight_sharing:
    # Share the weight between target word embedding & last dense layer
    self.trg_word_prj.weight = self.decoder.trg_word_emb.weight

if emb_src_trg_weight_sharing:
    self.encoder.src_word_emb.weight = self.decoder.trg_word_emb.weight

The code above is meant to implement weight sharing, but I'm confused: the embedding layer and the linear layer seem to have weights of different shapes. How can this assignment work?

@yingying123321

I just found the relevant information in the PyTorch docs (see the attached picture). It turns out that for fc = nn.Linear(d_model, n_trg_vocab), the shape of fc's weight is actually (n_trg_vocab, d_model)!

[screenshot of the torch.nn.Linear documentation]
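For anyone else who lands here, a minimal sketch of why the assignment works (the vocabulary size and d_model below are made-up values, not taken from this repo): nn.Linear stores its weight as (out_features, in_features), so both tensors end up with the same (n_trg_vocab, d_model) shape and can be tied by plain attribute assignment.

import torch.nn as nn

n_trg_vocab, d_model = 10000, 512  # hypothetical sizes, for illustration only

emb = nn.Embedding(n_trg_vocab, d_model)           # weight shape: (n_trg_vocab, d_model)
prj = nn.Linear(d_model, n_trg_vocab, bias=False)  # weight shape: (n_trg_vocab, d_model), i.e. (out_features, in_features)

print(emb.weight.shape)  # torch.Size([10000, 512])
print(prj.weight.shape)  # torch.Size([10000, 512])

# Shapes match, so tying is a plain attribute assignment, as in the snippet above
prj.weight = emb.weight
assert prj.weight.data_ptr() == emb.weight.data_ptr()  # both modules now share one parameter tensor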


Messiz commented Aug 1, 2022

Thank you for your answer, but I actually figured it out myself a few days after posting. Thanks anyway! 😂
