
The loss depends on the max sequence length in a batch, which should not be the case. #78

Open
harshit158 opened this issue Jan 30, 2019 · 1 comment

Comments

@harshit158

loss = tf.reduce_mean(-log_likelihood)

Here log_likelihood is the per-sample log-likelihood, which is not normalized by sequence length, so its magnitude depends on the number of timesteps in the batch.
For example:
Batch 1: max sequence length (timesteps) = 100
Batch 2: max sequence length (timesteps) = 1000
The scale of the loss values will be considerably different for these two batches.

Shouldn't the loss be:
loss = tf.reduce_mean(-log_likelihood / tf.cast(tf.shape(self.logits)[1], tf.float32))
Here tf.shape(self.logits)[1] is the max number of timesteps (sequence length) in that batch, cast to float32 so the division type-checks; dividing by it makes the loss independent of the sequence length.
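Below is a minimal sketch of the current loss and the proposed normalization in context, assuming the TensorFlow 1.x tf.contrib.crf.crf_log_likelihood API that this repo builds on; the placeholder and variable names are illustrative, not the repo's actual ones.

import tensorflow as tf

num_tags = 9  # assumed tag-set size, for illustration only
logits = tf.placeholder(tf.float32, [None, None, num_tags])  # [batch, max_len, num_tags]
tags = tf.placeholder(tf.int32, [None, None])                # [batch, max_len]
sequence_lengths = tf.placeholder(tf.int32, [None])          # true length of each sequence

# log_likelihood has shape [batch]; its magnitude grows with sequence length.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, tags, sequence_lengths)

# Current loss: its scale depends on how long the sequences in the batch are.
loss = tf.reduce_mean(-log_likelihood)

# Proposed loss: divide by the batch's padded length, tf.shape(logits)[1],
# cast to float32 because tf.shape returns int32.
max_len = tf.cast(tf.shape(logits)[1], tf.float32)
normalized_loss = tf.reduce_mean(-log_likelihood / max_len)

With this normalization the loss is roughly a per-timestep negative log-likelihood, so its scale stays comparable across batches with different max lengths.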

@guillaumegenthial
Owner

See my comment on the blog post and feel free to open a PR (with re-training results to prove that it really helps).
