Make new optimizer more extensible, easier to integrate downstream for FSDP #181

Open · wants to merge 19 commits into base: main
Conversation

@muellerzr commented Aug 15, 2024

Description
This PR makes it easier for users to use FSDP with MS-AMP from their existing optimizers. This is especially beneficial for library authors: currently we have to go through quite a bit of work to get the FSDP version of these optimizers working when a user passes in optim.Adam.

Instead, we delegate FSDPAdamW to an OptimWrapper, which calls the underlying optimizer as a passthrough. This makes it easier to add any logic that should run before/after the underlying optimizer's step, and the wrapper takes in an already-constructed Optimizer rather than relying on inheritance.
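
For readers unfamiliar with the pattern, here is a minimal sketch of a passthrough optimizer wrapper of the kind described above. It is illustrative only and is not the PR's actual OptimWrapper/FSDPAdamW code; the class name OptimWrapperSketch and the placement of the pre/post-step hooks are assumptions.

import torch


class OptimWrapperSketch:
    def __init__(self, optimizer: torch.optim.Optimizer):
        # Takes an already-constructed optimizer instead of inheriting from it.
        self.optimizer = optimizer

    def step(self, closure=None):
        # Pre-step logic (e.g. FP8/FSDP-specific gradient handling) would go here.
        loss = self.optimizer.step(closure)
        # Post-step logic (e.g. syncing updated master weights) would go here.
        return loss

    def zero_grad(self, set_to_none: bool = True):
        self.optimizer.zero_grad(set_to_none=set_to_none)

    def __getattr__(self, name):
        # Fall through to the wrapped optimizer for everything else
        # (param_groups, state_dict, etc.).
        return getattr(self.optimizer, name)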

Let me know what you think about this. I'm currently integrating FSDP and DeepSpeed with MS-AMP into Accelerate and found this to be a critical pain point, as our users pass in normal PyTorch optimizers and don't create special versions themselves.

@tocean @wkcn let me know what you two think :)

New working FSDP:

model, optimizer = ...  # any model plus a plain PyTorch optimizer
model, optimizer = msamp.initialize(model, optimizer, use_fsdp=True, weight_qtype=Dtypes.kfloat8_e4m3)
model = FP8FullyShardedDataParallel(model, use_orig_params=True, auto_wrap_policy=my_auto_wrap_policy)
optimizer = FSDPAdamW(optimizer)  # wraps the existing optimizer instead of replacing it
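
For the library-author scenario described above (a user hands over a plain PyTorch optimizer and the library does the rest), here is a hedged sketch of how a downstream integration could package these steps. The helper name prepare_for_fp8_fsdp is hypothetical, and the import paths for Dtypes, FP8FullyShardedDataParallel, and FSDPAdamW are assumptions inferred from this snippet rather than confirmed API.

import msamp
from msamp.common.dtype import Dtypes               # import path assumed
from msamp.fsdp import FP8FullyShardedDataParallel  # import path assumed
from msamp.optim import FSDPAdamW                   # import path assumed (see PR diff)


def prepare_for_fp8_fsdp(model, optimizer, auto_wrap_policy):
    """Upgrade a user-supplied model/optimizer pair for FP8 FSDP training."""
    model, optimizer = msamp.initialize(
        model, optimizer, use_fsdp=True, weight_qtype=Dtypes.kfloat8_e4m3
    )
    model = FP8FullyShardedDataParallel(
        model, use_orig_params=True, auto_wrap_policy=auto_wrap_policy
    )
    # Wrap the user's existing optimizer instead of constructing a special one.
    optimizer = FSDPAdamW(optimizer)
    return model, optimizer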

@muellerzr (Author)

@microsoft-github-policy-service agree company="Hugging Face"

@muellerzr muellerzr marked this pull request as draft August 15, 2024 15:41
@muellerzr (Author) commented Aug 15, 2024

Sorry for the extraneous pushes while I was figuring something out. Good to go now :)

@muellerzr muellerzr marked this pull request as ready for review August 15, 2024 16:23
@muellerzr (Author)
You can see our new accelerate benchmarking scripts here: https://github.com/huggingface/accelerate/tree/muellerzr-msamp-ds-fsdp/benchmarks/fp8/ms_amp

@muellerzr (Author)
@tocean @wkcn any particular issues with this? :)

(Ideally it'd be great to include this in the next accelerate release on the 1st :) )

@wkcn (Contributor) commented Aug 22, 2024

@muellerzr Thanks for your contribution!

The PR looks good to me.
Sorry, I am not at Microsoft and do not have the authorization to review or merge the pull request.

@muellerzr (Author) commented Aug 22, 2024

Ack, okay. I suppose we'll have to wait for @tocean / @abuccts / @guoshzhao to take a look. Thanks for the flag 🤗
