Accelerated training with floating point fp16 #3

Open
milliema opened this issue Jan 10, 2021 · 2 comments

Comments

@milliema

Thanks for the work!
I'd like to know whether the SAM optimizer is also applicable to accelerated training, i.e., automatic mixed precision with fp16. I tried to adopt SAM in my own training code with fp16 in PyTorch, but the loss becomes NaN and the computed gradient norm is NaN. Regular training with SGD gives no error. So I'm wondering whether this is caused by a bug in the PyTorch reimplementation or by a limitation of SAM itself.
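For reference, below is a minimal sketch of one way to combine SAM's two-step update with torch.cuda.amp. It assumes a SAM wrapper exposing first_step()/second_step() (as in the common PyTorch reimplementation) and that model, criterion, optimizer, and loader are already defined. The key point is to call scaler.unscale_() before each SAM step, so the gradient norm is computed on unscaled fp32 gradients rather than on scaled fp16 values that can overflow to inf/NaN. This sketch omits the inf/NaN step-skipping that scaler.step() would normally perform.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Assumed to exist already: model, criterion, loader, and a SAM `optimizer`
# that exposes first_step()/second_step().
scaler = GradScaler()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()

    # First SAM pass: gradients at the current weights.
    with autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)            # back to true gradient scale before the norm
    optimizer.first_step(zero_grad=True)  # perturb weights toward the local maximum
    scaler.update()                       # reset scaler state before the second unscale_

    # Second SAM pass: gradients at the perturbed weights.
    with autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)
    optimizer.second_step(zero_grad=True) # descend with the base optimizer
    scaler.update()
```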

@shuo-ouyang

Maybe we should add a small number (such as 1e-9) to the denominator to avoid divide-by-zero errors when computing the perturbation.
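As an illustration of that suggestion, here is a hypothetical sketch of SAM's first step with an epsilon added to the denominator (names and structure are illustrative, not the repository's actual code):

```python
import torch

def sam_first_step(params, rho=0.05, eps=1e-12):
    # Hypothetical sketch: compute the overall gradient norm and perturb the
    # weights by rho * g / (||g|| + eps); the eps keeps the scale finite when
    # the norm underflows to zero, which is easy to hit in fp16.
    params = [p for p in params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    scale = rho / (grad_norm + eps)
    with torch.no_grad():
        for p in params:
            p.add_(p.grad * scale)  # move to the perturbed point w + e(w)
```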

@celidos

celidos commented Aug 19, 2022

Interested in this topic too
