This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Label smoothing for LF-MMI #179

Open
danpovey opened this issue Apr 28, 2021 · 14 comments


@danpovey
Contributor

Perhaps @zhu-han can try this:
we should be able to implement label smoothing for our LF-MMI system by adding some small constant times -nnet_output.mean(1).sum() / (len(texts) * accum_grad) to the loss function [assuming we are still normalizing by len(texts), which IMO is less optimal than normalizing by the total num-frames, but that's a separate issue].
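The proposal above might be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not snowfall's actual API: `nnet_output` is assumed to be `(N, T, C)` log-probabilities and `tot_score` the LF-MMI score being maximized; the function name and signature are illustrative.

```python
import torch

def mmi_loss_with_smoothing(nnet_output, tot_score, texts,
                            accum_grad, smooth_scale=1e-4):
    # nnet_output: (N, T, C) log-probs; tot_score: LF-MMI score (maximized).
    # The smoothing term is minus the mean log-prob over classes, i.e. the
    # cross-entropy against a uniform label distribution, summed over all
    # frames in the batch.
    smooth_term = -nnet_output.mean(dim=2).sum()
    # Negate tot_score to get a loss and normalize by len(texts) * accum_grad,
    # as in the comment above.
    return (-tot_score + smooth_scale * smooth_term) / (len(texts) * accum_grad)
```

Since `nnet_output` holds log-probabilities (all non-positive), the smoothing term is non-negative and a positive `smooth_scale` can only increase the loss.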

@zhu-han
Contributor

zhu-han commented Apr 28, 2021

Will try this.

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

In -nnet_output.mean(1), is 1 the dimension of output classes?
I did it this way, by adding

```python
smooth_score = nnet_output.mean(-1).sum()
tot_score += self.smooth_scale * smooth_score
```

in the loss computation.
The results are:

| smooth scale | test-clean | test-other |
|---|---|---|
| 0 | 6.79 | 17.8 |
| 0.01 | 6.8 | 18.5 |
| 0.001 | 6.94 | 17.97 |
| 0.0001 | 6.73 | 18.05 |
| 0.00001 | 6.86 | 17.66 |

There is no clear improvement so far.

@csukuangfj
Collaborator

If the shape of nnet_output is (N, T, C), should it be

```python
-nnet_output.mean(2).sum()
```

? It assigns equal probability to each class per frame and takes the sum of the weighted log-probs per frame, which is what mean(2) computes.

Also, the denominator (len(texts) * accum_grad) seems to be missing.
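The equivalence described here can be checked numerically: per frame, the cross-entropy against a uniform target distribution equals the negated mean over the class dimension. A small self-contained check (shapes are illustrative):

```python
import torch

torch.manual_seed(0)
logp = torch.randn(2, 3, 4).log_softmax(dim=2)       # (N, T, C) log-probs
uniform = torch.full_like(logp, 1.0 / logp.size(2))  # uniform target: 1/C each
ce_uniform = -(uniform * logp).sum(dim=2)            # per-frame CE vs uniform
assert torch.allclose(ce_uniform, -logp.mean(dim=2))
# For a 3-D tensor, dim=2 and dim=-1 address the same (class) axis:
assert torch.equal(logp.mean(2), logp.mean(-1))
```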

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

Yes, the shape is (N, T, C). I think mean(2) and mean(-1) are the same here. I added this smooth_score to tot_score, so it will be normalized by (len(texts) * accum_grad) together with the original tot_score.

@danpovey
Contributor Author

danpovey commented Apr 30, 2021 via email

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

Will try it.
If we print the original mmi loss (i.e. the total loss - weighted smooth loss) like before as the objf, the validation average objf is not affected much. Take the last validation result of training for example:

| smooth scale | validation average objf |
|---|---|
| 0 | 0.204217 |
| 0.01 | 0.199484 |
| 0.001 | 0.217441 |
| 0.0001 | 0.20684 |
| 0.00001 | 0.206871 |

With smooth scale 0.01, the validation objf is actually better.

As for the loss value, with smooth scale 0.01 the weighted smooth loss is about 30%-150% of the original MMI loss (i.e., the total loss minus the weighted smooth loss).

@danpovey
Contributor Author

danpovey commented Apr 30, 2021 via email

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

I disabled it for validation.

@zhu-han
Contributor

zhu-han commented May 1, 2021

Here are the results with smooth scales 1 and 0.1.

| smooth scale | test-clean | test-other |
|---|---|---|
| 0 | 6.79 | 17.8 |
| 1 | 11.42 | 27.1 |
| 0.1 | 7.63 | 19.71 |
| 0.01 | 6.8 | 18.5 |

The results are clearly worse than without the smoothing loss.

@pzelasko
Collaborator

pzelasko commented May 1, 2021

Since we have an ali model, maybe another option is to add a frame-wise cross-entropy loss using that alignment and apply the label smoothing there?
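That suggestion might look something like the sketch below, which is hypothetical: the function name, the manual smoothing formulation, and the assumption that `ali` is a `(N, T)` tensor of frame-level labels from the ali model are all mine, not the repo's actual code.

```python
import torch
import torch.nn.functional as F

def framewise_smoothed_ce(nnet_output, ali, smoothing=0.1):
    # nnet_output: (N, T, C) log-probs; ali: (N, T) frame-level labels.
    # Label smoothing mixes the one-hot targets with a uniform
    # distribution over the C classes.
    N, T, C = nnet_output.shape
    logp = nnet_output.reshape(N * T, C)
    nll = F.nll_loss(logp, ali.reshape(N * T))  # CE vs one-hot targets
    uniform_ce = -logp.mean(dim=1).mean()       # CE vs uniform targets
    return (1.0 - smoothing) * nll + smoothing * uniform_ce
```

With `smoothing=0` this reduces to plain frame-wise cross-entropy against the alignment. (Newer PyTorch versions also accept a `label_smoothing` argument to `F.cross_entropy` directly, which would avoid the manual mixing.)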

@danpovey
Contributor Author

danpovey commented May 1, 2021 via email

@danpovey
Contributor Author

danpovey commented May 1, 2021

@zhu-han do you think you could read about "iterated loss" here
https://arxiv.org/pdf/2005.09150v2.pdf
and its references, and try to implement something like that?
You could try actually using the LF-MMI loss at earlier layers, e.g. randomly choosing the layer to avoid excessive computation; and/or use the cross-entropy type of loss that we previously tried (it didn't work) based on the posteriors of the LF-MMI system.

@zhu-han
Contributor

zhu-han commented May 1, 2021

I'll try it.

@danpovey
Contributor Author

danpovey commented May 1, 2021 via email
