Are there any plans to support Gemma2 in torchtitan? I tried to use torchtitan to finetune a Gemma2 model, but got stuck on the following problem: how do you parallelize the tied layers in the Gemma2 model? Maybe somebody knows the solution to this problem 😄
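For context, a minimal sketch of the tied-weight setup in question (class and attribute names here are illustrative, not the actual Gemma2 module names): the input embedding and the final projection share one parameter, so any parallelism plan has to account for both modules pointing at the same tensor.

```python
import torch.nn as nn

class TinyTiedLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # weight tying: the lm head reuses the embedding matrix
        self.output.weight = self.tok_embeddings.weight
```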
If you apply `fully_shard` to each transformer block and then to the root module, this should work for the tied embedding and final linear: the root module will manage both.
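A minimal sketch of that suggestion, assuming an FSDP2-style `fully_shard` API (the exact import path differs across recent PyTorch releases, and the `layers` attribute name is just an assumption): shard each transformer block individually, then shard the root module so that the parameters not covered by a block, including the tied embedding / output projection, end up in the root's group.

```python
from torch.distributed._composable.fsdp import fully_shard

def apply_fsdp(model, mesh):
    for block in model.layers:
        # each transformer block becomes its own FSDP parameter group
        fully_shard(block, mesh=mesh)
    # the root call picks up everything not already sharded,
    # including the tied embedding and final linear
    fully_shard(model, mesh=mesh)
    return model
```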
I want to shard the output embedding layer. I use the same strategy as in Llama, but training gets stuck after the first batch: `ColwiseParallel(input_layouts=Shard(1), output_layouts=Shard(-1) if loss_parallel else Replicate(), use_local_output=not loss_parallel)`
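For reference, a sketch of the Llama-style tensor-parallel plan that snippet comes from (module names `tok_embeddings` and `output` follow torchtitan's Llama model; a Gemma2 port may name them differently, and with tied weights both entries below would target the same underlying parameter, which is exactly the conflict being asked about):

```python
from torch.distributed.tensor import Replicate, Shard
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

def apply_tp(model, tp_mesh, loss_parallel: bool):
    parallelize_module(
        model,
        tp_mesh,
        {
            # embedding sharded row-wise over the vocab dimension
            "tok_embeddings": RowwiseParallel(
                input_layouts=Replicate(),
                output_layouts=Shard(1),
            ),
            # final projection sharded column-wise; keep logits sharded
            # only when loss parallelism is enabled
            "output": ColwiseParallel(
                input_layouts=Shard(1),
                output_layouts=Shard(-1) if loss_parallel else Replicate(),
                use_local_output=not loss_parallel,
            ),
        },
    )
    return model
```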