Add 1F1B schedule #844

Merged
merged 2 commits into from
Jul 31, 2023


kwen2501
Contributor

@kwen2501 kwen2501 commented Jul 23, 2023

Description

Adds a c10d version of the 1F1B (one-forward-one-backward) schedule.
The schedule uses nstages chunks to warm up the pipeline, then enters a steady 1F1B phase, and finally uses nstages chunks to cool down the pipeline.
Ref: https://arxiv.org/pdf/2104.04473.pdf.
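
As a rough illustration of the three phases, here is a minimal sketch that lists the order of forward (F) and backward (B) chunks on a single stage. The function name one_f_one_b_order is hypothetical, and the per-stage warm-up count of nstages - stage_index - 1 follows the common formulation in the referenced paper; it is an assumption here, and the code in this PR may count warm-up chunks differently.

```python
def one_f_one_b_order(nstages, stage_index, nchunks):
    """Return the ("F" | "B", chunk_id) sequence for one stage (illustrative only)."""
    # Assumption: later stages need fewer warm-up forwards than earlier ones.
    warmup = min(nstages - stage_index - 1, nchunks)

    # Warm-up phase: forwards only, to fill the pipeline.
    ops = [("F", c) for c in range(warmup)]

    # Steady phase: alternate one forward with one backward.
    fwd, bwd = warmup, 0
    while fwd < nchunks:
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1

    # Cool-down phase: drain the remaining backwards.
    while bwd < nchunks:
        ops.append(("B", bwd))
        bwd += 1
    return ops


if __name__ == "__main__":
    for stage in range(4):
        print(stage, one_f_one_b_order(nstages=4, stage_index=stage, nchunks=8))
```

Under these assumptions, a stage never holds more than nstages - stage_index chunks whose backward pass is still pending, which is the memory benefit usually attributed to 1F1B over a fill-drain order.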

API

stage = compile_stage(
    model,
    ... ,
    schedule="1F1B",
)

Implementation Details

To avoid duplicating code, the original FillDrain implementation is split into two helpers, forward_one_chunk and backward_one_chunk. Both schedules share these helpers, so each schedule only has to define the order in which chunks run.
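
The sharing described above could look roughly like the sketch below. Only the helper names forward_one_chunk and backward_one_chunk come from this PR; ToyStage, run_fill_drain, run_1f1b, the warmup parameter, the SCHEDULES table and its "FillDrain" key are hypothetical stand-ins, and the real helpers also handle communication and autograd, which this toy version replaces with a call log.

```python
class ToyStage:
    """Hypothetical stand-in for a pipeline stage; it only records calls."""

    def __init__(self):
        self.log = []

    def forward_one_chunk(self, chunk):
        # The real helper would receive activations, run the submodule, and send outputs.
        self.log.append(("F", chunk))

    def backward_one_chunk(self, chunk):
        # The real helper would receive gradients, run backward, and send input grads.
        self.log.append(("B", chunk))


def run_fill_drain(stage, nchunks):
    # All forwards first, then all backwards (backward order is an assumption).
    for c in range(nchunks):
        stage.forward_one_chunk(c)
    for c in range(nchunks):
        stage.backward_one_chunk(c)


def run_1f1b(stage, nchunks, warmup):
    warmup = min(warmup, nchunks)
    for c in range(warmup):                      # warm-up: forwards only
        stage.forward_one_chunk(c)
    for c in range(warmup, nchunks):             # steady: one forward, one backward
        stage.forward_one_chunk(c)
        stage.backward_one_chunk(c - warmup)
    for c in range(nchunks - warmup, nchunks):   # cool-down: remaining backwards
        stage.backward_one_chunk(c)


# Illustrative dispatch table; the key strings are assumptions.
SCHEDULES = {"FillDrain": run_fill_drain, "1F1B": run_1f1b}

if __name__ == "__main__":
    stage = ToyStage()
    SCHEDULES["1F1B"](stage, nchunks=4, warmup=2)
    print(stage.log)
```

Keeping the per-chunk work in two helpers means a new schedule is just another function that calls them in a different order.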

Test

Added a "schedule" option to the test script:

$ torchrun --nproc-per-node 4 local_test_c10d_bwd.py --schedule=1F1B --chunks=16

@kwen2501 kwen2501 requested a review from fegin July 24, 2023 13:16
@kwen2501 kwen2501 merged commit a1ee78d into main Jul 31, 2023
21 of 25 checks passed