This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Label smoothing for LF-MMI #179

Open
danpovey opened this issue Apr 28, 2021 · 14 comments


@danpovey
Contributor

Perhaps @zhu-han can try this:
we should be able to implement label smoothing for our LF-MMI system by adding some small constant times -nnet_output.mean(1).sum() / (len(texts) * accum_grad) to the loss function [assuming we are still normalizing by len(texts), which IMO is less optimal than normalizing by the total num-frames, but that's a separate issue].
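The proposal above might be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not snowfall's actual API: `nnet_output` is assumed to be `(N, T, C)` log-probabilities and `tot_score` the LF-MMI score being maximized; the function name and signature are illustrative.

```python
import torch

def mmi_loss_with_smoothing(nnet_output, tot_score, texts,
                            accum_grad, smooth_scale=1e-4):
    # nnet_output: (N, T, C) log-probs; tot_score: LF-MMI score (maximized).
    # The smoothing term is minus the mean log-prob over classes, i.e. the
    # cross-entropy against a uniform label distribution, summed over all
    # frames in the batch.
    smooth_term = -nnet_output.mean(dim=2).sum()
    # Negate tot_score to get a loss and normalize by len(texts) * accum_grad,
    # as in the comment above.
    return (-tot_score + smooth_scale * smooth_term) / (len(texts) * accum_grad)
```

Since `nnet_output` holds log-probabilities (all non-positive), the smoothing term is non-negative and a positive `smooth_scale` can only increase the loss.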

@zhu-han
Contributor

zhu-han commented Apr 28, 2021

Will try this.

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

In -nnet_output.mean(1), is 1 the dimension of output classes?
I did it this way, by adding

```python
smooth_score = nnet_output.mean(-1).sum()
tot_score += self.smooth_scale * smooth_score
```

in the loss computation.
The results are:

| smooth scale | test-clean | test-other |
|---|---|---|
| 0 | 6.79 | 17.8 |
| 0.01 | 6.8 | 18.5 |
| 0.001 | 6.94 | 17.97 |
| 0.0001 | 6.73 | 18.05 |
| 0.00001 | 6.86 | 17.66 |

There is no clear improvement so far.

@csukuangfj
Collaborator

If the shape of nnet_output is (N, T, C), should it be

```python
-nnet_output.mean(2).sum()
```

? It assigns equal probability to each class per frame and takes the sum of the weighted log-probs per frame, which is what mean(2) computes.

Also, the denominator (len(texts) * accum_grad) seems to be missing.
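The equivalence described here can be checked numerically: per frame, the cross-entropy against a uniform target distribution equals the negated mean over the class dimension. A small self-contained check (shapes are illustrative):

```python
import torch

torch.manual_seed(0)
logp = torch.randn(2, 3, 4).log_softmax(dim=2)       # (N, T, C) log-probs
uniform = torch.full_like(logp, 1.0 / logp.size(2))  # uniform target: 1/C each
ce_uniform = -(uniform * logp).sum(dim=2)            # per-frame CE vs uniform
assert torch.allclose(ce_uniform, -logp.mean(dim=2))
# For a 3-D tensor, dim=2 and dim=-1 address the same (class) axis:
assert torch.equal(logp.mean(2), logp.mean(-1))
```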

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

Yes, the shape is (N, T, C). I think mean(2) and mean(-1) are the same here. I added this smooth_score to tot_score, so it will be normalized by (len(texts) * accum_grad) together with the original tot_score.

@danpovey
Contributor Author

danpovey commented Apr 30, 2021 via email

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

Will try it.
If we print the original mmi loss (i.e. the total loss - weighted smooth loss) like before as the objf, the validation average objf is not affected much. Take the last validation result of training for example:

| smooth scale | validation average objf |
|---|---|
| 0 | 0.204217 |
| 0.01 | 0.199484 |
| 0.001 | 0.217441 |
| 0.0001 | 0.20684 |
| 0.00001 | 0.206871 |

With smooth scale 0.01, the validation objf is actually better.

As for the loss value, with smooth scale 0.01 the weighted smooth loss is about 30%-150% of the original MMI loss (i.e., the total loss minus the weighted smooth loss).

@danpovey
Contributor Author

danpovey commented Apr 30, 2021 via email

@zhu-han
Contributor

zhu-han commented Apr 30, 2021

I disabled it for validation.

@zhu-han
Contributor

zhu-han commented May 1, 2021

Here are the results with smooth scales 1 and 0.1.

| smooth scale | test-clean | test-other |
|---|---|---|
| 0 | 6.79 | 17.8 |
| 1 | 11.42 | 27.1 |
| 0.1 | 7.63 | 19.71 |
| 0.01 | 6.8 | 18.5 |

The results are clearly worse than without the smoothing loss.

@pzelasko
Collaborator

pzelasko commented May 1, 2021

Since we have an ali model, maybe another option is to add a frame-wise cross-entropy loss using that alignment and apply the label smoothing there?
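That suggestion might look something like the sketch below, which is hypothetical: the function name, the manual smoothing formulation, and the assumption that `ali` is a `(N, T)` tensor of frame-level labels from the ali model are all mine, not the repo's actual code.

```python
import torch
import torch.nn.functional as F

def framewise_smoothed_ce(nnet_output, ali, smoothing=0.1):
    # nnet_output: (N, T, C) log-probs; ali: (N, T) frame-level labels.
    # Label smoothing mixes the one-hot targets with a uniform
    # distribution over the C classes.
    N, T, C = nnet_output.shape
    logp = nnet_output.reshape(N * T, C)
    nll = F.nll_loss(logp, ali.reshape(N * T))  # CE vs one-hot targets
    uniform_ce = -logp.mean(dim=1).mean()       # CE vs uniform targets
    return (1.0 - smoothing) * nll + smoothing * uniform_ce
```

With `smoothing=0` this reduces to plain frame-wise cross-entropy against the alignment. (Newer PyTorch versions also accept a `label_smoothing` argument to `F.cross_entropy` directly, which would avoid the manual mixing.)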

@danpovey
Contributor Author

danpovey commented May 1, 2021 via email

@danpovey
Contributor Author

danpovey commented May 1, 2021

@zhu-han do you think you could read about "iterated loss" here
https://arxiv.org/pdf/2005.09150v2.pdf
and its references, and try to implement something like that?
You could try actually using the LF-MMI loss at earlier layers, e.g. randomly choosing the layer to avoid excessive computation; and/or use the cross-entropy type of loss that we previously tried (it didn't work) based on the posteriors of the LF-MMI system.

@zhu-han
Contributor

zhu-han commented May 1, 2021

I'll try it.

@danpovey
Contributor Author

danpovey commented May 1, 2021 via email
