selective 2d api/example added for fine-grained tp/pp demo #830
Conversation
Thanks for making all this happen!!
examples/selective2d/2d_train.py
Outdated
from pippy.microbatch import TensorChunkSpec, sum_reducer

pp_dim, tp_dim = 0, 1
pp_rank, tp_rank = args.local_rank // args.tp_size, args.local_rank % args.tp_size
What if we are doing this on multiple hosts?
In the multi-host case, we should use args.rank instead of args.local_rank. I'll change the line to handle multiple hosts in the next commit.
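A sketch of the multi-host mapping being discussed, assuming world_size == pp_size * tp_size; the helper name below is illustrative, not from the PR:

```python
def rank_to_2d_coords(rank, tp_size):
    """Map a global rank to (pp_rank, tp_rank) on a pp_size x tp_size mesh.

    Using the global rank (args.rank) instead of args.local_rank keeps the
    mapping correct when ranks span multiple hosts, since local ranks
    restart at 0 on every host.
    """
    return rank // tp_size, rank % tp_size

# With tp_size=4, global rank 9 sits on pipeline stage 2, TP rank 1.
print(rank_to_2d_coords(9, 4))
```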
def __init__(self, mesh, config):
    super().__init__()
    assert config.n_embd % config.n_head == 0
Let's also add an assert for self.n_head % tp_size == 0?
okay!
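A minimal sketch of the combined checks, factored into a standalone helper for illustration (how tp_size reaches the module is an assumption; adapt to however the config carries it):

```python
def check_head_sharding(n_embd, n_head, tp_size):
    """Validate divisibility for TP head sharding; return heads per TP rank."""
    assert n_embd % n_head == 0, "embedding dim must divide evenly across heads"
    assert n_head % tp_size == 0, "heads must divide evenly across TP ranks"
    return n_head // tp_size

# e.g. 12 heads sharded over 4 TP ranks -> 3 heads per rank
print(check_head_sharding(768, 12, 4))
```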
examples/selective2d/2d_train.py
Outdated
# PP
cut_fn(model, args, args.pp_size)
stage = compile_stage(model, pp_rank, args.world_size, args.pp_size, args.device, pp_groups,
For my learning purposes, does it work if we first do TP and then call compile_stage? So DTensor can already be traced by torch.fx?
For now we apply TP first and then PP, so DTensor is traced by torch.fx.
If we do PP first and then TP, the result does not change, but PP changes layer names after tracing
(e.g., transformer.block.i.attn --> transformer_block_i_attn), so we would have to change the names that we pass to TP.
I see. Good to know that DTensor is traceable by torch.fx. I have a n00b question here: what's the difference between applying TP first vs applying PP first?
Performance-wise, there is no difference. I think the only difference is the API: PiPPy's compile_stage breaks higher-level classes (e.g., Block/Transformer/MLP/Attention) into low-level layers (Linear), so we just need to be careful with layer names.
Can you format the file by using ufmt?
examples/selective2d/2d_train.py
Outdated
from pippy.IR import annotate_split_points, PipeSplitWrapper
from pippy import split_into_equal_size
from pippy.compile import compile_stage
from pippy.microbatch import TensorChunkSpec, sum_reducer
Imports should always live at the top of the file.
examples/selective2d/2d_train.py
Outdated
    return model, stage

def even_cut(model, args, pp_size, cut={}):
Should add a one line docstring to describe the function.
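As a sketch of what such a docstring and the cut computation might look like, here is the pure-Python index math in standalone form (illustrative, not the PR's exact implementation):

```python
def even_cut_points(n_layer, pp_size):
    """Return the block indices at which to split n_layer transformer
    blocks into pp_size pipeline stages of (roughly) equal depth."""
    per_stage = (n_layer + pp_size - 1) // pp_size  # ceil division
    return list(range(per_stage, n_layer, per_stage))

# 12 blocks over 4 stages -> cut before blocks 3, 6, and 9
print(even_cut_points(12, 4))
```

In the actual example, each returned index would become an annotate_split_points entry on the corresponding transformer block.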
examples/selective2d/2d_train.py
Outdated
    annotate_split_points(model, cut)

def after_ar_cut(model, args, pp_size, cut={}):
Should add a one line docstring to describe the function.
examples/selective2d/2d_train.py
Outdated
    annotate_split_points(model, cut)

def pp_and_tp_fg(model, mesh, args, tp_attn_layers=None, tp_mlp_layers=None, cut_fn=even_cut):
The naming is pretty confusing. What is fg?
Should add a one line docstring to describe the function.
resolved
examples/selective2d/2d_train.py
Outdated
def pp(model, pp_device_mesh, args):
    from pippy.IR import annotate_split_points, PipeSplitWrapper
    from pippy import split_into_equal_size
Any reason why this is imported but not used?
My bad. I used it in my draft and then moved on to my own cut. Will remove.
examples/selective2d/2d_train.py
Outdated
    return model, stage

def even_cut(model, args, pp_size, cut={}):
    from pippy.IR import annotate_split_points, PipeSplitWrapper
Move the import to the top of the file.
examples/selective2d/2d_train.py
Outdated
    return model, stage

def even_cut(model, args, pp_size, cut={}):
Why is cut passed as an argument? And using {} as the default value is never good.
I was thinking of mixing two algorithms, but at this point we may not need it. Removed :)
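For reference, the mutable-default pitfall the reviewer is flagging, with the usual fix (standalone illustration, not code from the PR):

```python
def cut_with_shared_default(layer, cut={}):  # anti-pattern
    # The default dict is created once at function definition time,
    # so the same object is mutated and reused across every call.
    cut[layer] = True
    return cut

def cut_with_fresh_default(layer, cut=None):  # preferred
    if cut is None:
        cut = {}  # a new dict per call
    cut[layer] = True
    return cut

print(cut_with_shared_default("a"))  # {'a': True}
print(cut_with_shared_default("b"))  # {'a': True, 'b': True}  <- surprise
print(cut_with_fresh_default("a"))   # {'a': True}
print(cut_with_fresh_default("b"))   # {'b': True}
```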
examples/selective2d/2d_train.py
Outdated
    return local_iter_num, iter_time

def tp_train():
    local_iter_num = 0
The inconsistent indentation is never good.
Description
Added 2D parallelism (TP+PP) API and example for fine-grained TP/PP.
Checklist:
- [x] Has code been commented, particularly in hard-to-understand areas?
- [x] Have you made corresponding changes to the documentation?