Fix Gemma 7B checkpoint save #1169

ebsmothers · 2024-07-12T16:58:46Z

Fixes #1122. See point (2) here.

Test:

tune run --nnodes 1 --nproc_per_node 4 lora_finetune_distributed --config \
gemma/7B_lora max_steps_per_epoch=5 epochs=1
...
INFO:torchtune.utils.logging:Model checkpoint of size 5.00 GB saved to /tmp/gemma-7b/hf_model_0001_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 4.98 GB saved to /tmp/gemma-7b/hf_model_0002_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 4.98 GB saved to /tmp/gemma-7b/hf_model_0003_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 2.11 GB saved to /tmp/gemma-7b/hf_model_0004_0.pt
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.37 GB saved to /tmp/gemma-7b/adapter_0.pt
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.37 GB saved to /tmp/gemma-7b/adapter_model.bin
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.00 GB saved to /tmp/gemma-7b/adapter_config.json
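
As a quick sanity check on the output above, the reported sizes can be totalled per checkpoint kind. This is only a sketch: the regular expression and the `summarize` helper are hypothetical, written for the log format shown here, and are not part of torchtune.

```python
import re

# Log lines copied verbatim from the test output above.
LOG = """\
INFO:torchtune.utils.logging:Model checkpoint of size 5.00 GB saved to /tmp/gemma-7b/hf_model_0001_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 4.98 GB saved to /tmp/gemma-7b/hf_model_0002_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 4.98 GB saved to /tmp/gemma-7b/hf_model_0003_0.pt
INFO:torchtune.utils.logging:Model checkpoint of size 2.11 GB saved to /tmp/gemma-7b/hf_model_0004_0.pt
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.37 GB saved to /tmp/gemma-7b/adapter_0.pt
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.37 GB saved to /tmp/gemma-7b/adapter_model.bin
INFO:torchtune.utils.logging:Adapter checkpoint of size 0.00 GB saved to /tmp/gemma-7b/adapter_config.json
"""

# Hypothetical pattern: matches only the "<kind> checkpoint of size X GB saved to <path>" lines.
PATTERN = re.compile(r"(Model|Adapter) checkpoint of size ([\d.]+) GB saved to (\S+)")


def summarize(log: str) -> dict:
    """Return the number of files and the summed size in GB for each checkpoint kind."""
    summary = {}
    for kind, size, _path in PATTERN.findall(log):
        entry = summary.setdefault(kind, {"files": 0, "total_gb": 0.0})
        entry["files"] += 1
        entry["total_gb"] = round(entry["total_gb"] + float(size), 2)
    return summary


summary = summarize(LOG)
for kind, entry in summary.items():
    print(f"{kind}: {entry['files']} files, {entry['total_gb']:.2f} GB")
```

On the log above this counts four model shards and three adapter files, so a missing shard would show up as a lower file count.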

pytorch-bot · 2024-07-12T16:58:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1169

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7033610 with merge base 1af5135:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

joecummings

show me the receipts

joecummings · 2024-07-12T17:04:02Z

> show me the receipts

(Screenshot of successful model + adapter saving)

joecummings · 2024-07-12T17:04:16Z

> show me the receipts
>
> (Screenshot of successful model + adapter saving)

Wasn't sure if I was being clear

joecummings · 2024-07-12T17:04:24Z

> show me the receipts
>
> (Screenshot of successful model + adapter saving)
>
> Wasn't sure if I was being clear

Does it make sense?

ebsmothers · 2024-07-12T17:07:40Z

> show me the receipts
>
> (Screenshot of successful model + adapter saving)
>
> Wasn't sure if I was being clear
>
> Does it make sense?

No

ebsmothers · 2024-07-12T17:07:49Z

> show me the receipts
>
> (Screenshot of successful model + adapter saving)
>
> Wasn't sure if I was being clear
>
> Does it make sense?
>
> No

But I updated the test plan

Fix Gemma 7B checkpoint save

7033610

facebook-github-bot added the CLA Signed label Jul 12, 2024

ebsmothers requested a review from joecummings July 12, 2024 16:59

joecummings approved these changes Jul 12, 2024

joecummings mentioned this pull request Jul 12, 2024

Gemma2B missing lm head weight? #1062

Closed

joecummings merged commit cc92fa0 into pytorch:main Jul 12, 2024
29 checks passed

maximegmd pushed a commit to maximegmd/torchtune that referenced this pull request Jul 13, 2024

Fix Gemma 7B LoRA checkpoint save (pytorch#1169)

ce315ce

pbontrager pushed a commit that referenced this pull request Jul 15, 2024

Fix Gemma 7B LoRA checkpoint save (#1169)

c24f55c

SalmanMohammadi mentioned this pull request Jul 31, 2024

Add support to Qwen2-0.5B and Qwen2-1.5B. #1247

Merged

11 tasks
