
Conformer & convergence #169

Open
danpovey opened this issue Apr 21, 2021 · 0 comments

Comments


danpovey commented Apr 21, 2021

Got this via email from @zhu-han ...

I got the first reasonable result:
2021-04-21 02:28:00,831 INFO [common.py:365] [test-clean] %WER 13.39% [7041 / 52576, 887 ins, 965 del, 5189 sub ]
2021-04-21 02:29:42,636 INFO [common.py:365] [test-other] %WER 35.57% [18619 / 52343, 1327 ins, 3534 del, 13758 sub ]
The training log is in the attachment.
The code for this version is at https://github.com/zhu-han/snowfall/commit/4d4a0c42c175571e396736c757ceb6698afc9b18
The differences from the original version in the paper are:
# this version vs. original version
1) 4× subsampling vs. 8× subsampling;
2) kernel size 3 vs. kernel size 5.
The model could not converge without these two changes,
but the performance gap between this and the original Conformer is still large.
Do you have some advice on that?
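For reference, here is a minimal PyTorch sketch of what the two changes in the message above amount to; the class and argument names are illustrative, not the ones in the linked commit. Each stride-2 Conv2d in the front end halves the frame rate, so two layers give 4× subsampling and three give 8×; the kernel-size change refers to the depthwise convolution inside the Conformer convolution module.

```python
import torch
import torch.nn as nn


class ConvSubsampling(nn.Module):
    """Stacked stride-2 Conv2d front end: num_layers=2 gives roughly 4x
    subsampling of the time axis, num_layers=3 gives roughly 8x."""

    def __init__(self, num_features: int, d_model: int, num_layers: int = 2):
        super().__init__()
        layers = []
        in_channels = 1
        for _ in range(num_layers):
            layers += [
                nn.Conv2d(in_channels, d_model, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
            ]
            in_channels = d_model
        self.conv = nn.Sequential(*layers)
        # After num_layers stride-2 convs the feature axis shrinks by 2**num_layers.
        self.out = nn.Linear(d_model * (num_features // (2 ** num_layers)), d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_features)
        x = self.conv(x.unsqueeze(1))               # (batch, d_model, time', feat')
        b, c, t, f = x.shape
        return self.out(x.permute(0, 2, 1, 3).reshape(b, t, c * f))


x = torch.randn(4, 160, 80)                         # (batch, frames, mel bins)
print(ConvSubsampling(80, 256, num_layers=2)(x).shape)  # ~4x fewer frames
print(ConvSubsampling(80, 256, num_layers=3)(x).shape)  # ~8x fewer frames

# The second change is the kernel size of the depthwise convolution inside the
# Conformer convolution module (again, hypothetical dimensions):
depthwise = nn.Conv1d(256, 256, kernel_size=3, padding=1, groups=256)  # vs. kernel_size=5
```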

For the convergence problems: we are working on a way to make things converge much more easily; @csukuangfj was going to commit it. It involves using a simpler model as an alignment model and adding its output to that of the model being trained, with a scale less than one, near the start of training.
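A minimal sketch of that idea, assuming a pretrained, frozen alignment model and a hypothetical `combined_log_probs` helper (the function name, `scale` value, and schedule are illustrative, not the committed implementation):

```python
import torch
import torch.nn as nn


def combined_log_probs(main_model: nn.Module,
                       align_model: nn.Module,
                       features: torch.Tensor,
                       scale: float = 0.3) -> torch.Tensor:
    """Add the simpler alignment model's output to the main model's output
    with a weight `scale` < 1, giving the main model a rough alignment
    signal before its own outputs are informative."""
    with torch.no_grad():
        align_out = align_model(features)   # simpler model, not being trained
    main_out = main_model(features)         # model being trained (e.g. the Conformer)
    return main_out + scale * align_out
```

As described above, this mixing is only used near the start of training; presumably `scale` is reduced to zero after the first part of training so the main model's own output takes over.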
