FP8Linear saves new parameters in ckpt and I cannot load the saved ckpt #651
Hi @goldhuang, could you share your training config for delayed scaling? I have not prioritized delayed scaling, but I want to take this chance to make it right.
I tried to use DYNAMIC at the very beginning, but found an issue with HSDP and filed pytorch/ao#1086 in the torchao repo. Then I switched to DELAYED. This issue is only happening with [...]. BTW, I find [...]
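For context, switching between dynamic and delayed scaling is typically done in the training config. The sketch below shows what a torchtitan-style TOML fragment for delayed scaling might look like; the section and option names (`enable_float8_linear`, `scaling_type_*`) are assumptions and should be checked against the torchtitan/torchao version actually in use.

```toml
# Hypothetical float8 config sketch -- option names are assumptions,
# not confirmed against any specific torchtitan release.
[float8]
enable_float8_linear = true        # swap nn.Linear for Float8Linear
# Delayed scaling keeps per-layer amax-history and scale buffers,
# which is why the saved checkpoint gains new parameters/buffers
# compared to a bf16 run of the same model.
scaling_type_input = "delayed"
scaling_type_weight = "delayed"
scaling_type_grad_output = "delayed"
```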
Thanks for sharing your recipe for 128 GPUs; I am giving it a try.
@weifengpy Did you guys try with more than 128 GPUs? Like 1024 GPUs?
For 1D FSDP, 128 GPUs is my largest test. I have not tested 1D FSDP on 1024 GPUs; at that scale you would probably need HSDP.
I'm using [...] to load the distributed ckpt.
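A common shape for loading a torchtitan distributed checkpoint is via `torch.distributed.checkpoint` (DCP). The sketch below is an assumption about the workflow, not the reporter's actual code: `build_model`, `convert_to_float8`, and `CKPT_DIR` are hypothetical placeholders. The key point for this issue is that delayed-scaling `Float8Linear` registers extra buffers, so the model must be converted to float8 before loading, or the checkpoint keys will not match the model's `state_dict`.

```python
# Sketch only: loading a distributed ckpt saved from a Float8Linear model.
# build_model, convert_to_float8, and CKPT_DIR are hypothetical placeholders.
import torch.distributed.checkpoint as dcp

model = build_model()        # plain model definition
convert_to_float8(model)     # apply the SAME float8 conversion used at save
                             # time, so the amax/scale buffers exist before load

# DCP loads in place into a state_dict that already has the right keys.
state_dict = {"model": model.state_dict()}
dcp.load(state_dict, checkpoint_id=CKPT_DIR)
model.load_state_dict(state_dict["model"])
```

If the goal is instead to load an fp8-trained checkpoint into a plain (non-fp8) model, the extra float8 buffer keys would need to be filtered out of the loaded state dict, or the load done with `strict=False`.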