
[bug] DoRA is broken #1903

Open
ebsmothers opened this issue Oct 25, 2024 · 0 comments

Labels
bug Something isn't working

ebsmothers (Contributor) commented:

Two separate DoRA bugs I just noticed:

(1) Llama 3.2 1B config with DoRA errors on state dict load. Repro:

tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device \
gradient_accumulation_steps=1 max_steps_per_epoch=5 model.use_dora=True
...
Exception: Error converting the state dict. Found unexpected key: "layers.0.attn.q_proj.magnitude". Please make sure you're loading a checkpoint with the right format.
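
For context, DoRA augments each LoRA-adapted linear with a learned per-output-channel magnitude parameter, which is why the checkpoint carries extra keys like layers.0.attn.q_proj.magnitude alongside the usual LoRA A/B keys. A plausible reading of the error is that the state dict conversion only recognizes the plain LoRA keys. Below is a minimal sketch of accepting the extra key during conversion; the function name and suffix list are hypothetical, not torchtune's actual internals:

# Sketch: accept DoRA's extra "magnitude" parameter during adapter
# state dict conversion instead of rejecting it as unexpected.
# The suffix list and function below are illustrative only.
from typing import Dict

import torch

# Adapter parameter suffixes the converter should accept. DoRA adds
# "magnitude" on top of the usual LoRA A/B matrices.
_ADAPTER_KEY_SUFFIXES = ("lora_a.weight", "lora_b.weight", "magnitude")

def convert_adapter_state_dict(
    state_dict: Dict[str, torch.Tensor],
) -> Dict[str, torch.Tensor]:
    converted = {}
    for key, value in state_dict.items():
        if key.endswith(_ADAPTER_KEY_SUFFIXES):
            # Known adapter parameter: keep (or remap) it here.
            converted[key] = value
        else:
            raise RuntimeError(
                f"Error converting the state dict. Found unexpected key: {key!r}."
            )
    return converted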

(2) Llama 3.2 Vision 11B model with DoRA has NaN loss. Repro:

tune run lora_finetune_single_device --config llama3_2_vision/11B_lora_single_device \
max_steps_per_epoch=5 gradient_accumulation_steps=1 model.use_dora=True
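
For reference, DoRA reparameterizes the adapted weight as magnitude * (W0 + scaling * B @ A) / ||W0 + scaling * B @ A||, with the norm taken per output channel, so a magnitude that was never initialized from the loaded weights (or a norm that hits zero) will turn the loss into NaN on the first step. A minimal sketch of that math, following the standard DoRA formulation rather than torchtune's exact implementation:

# Sketch of the DoRA reparameterization (standard DoRA math, not
# torchtune's exact code). NaNs appear if magnitude is left
# uninitialized or the per-channel norm underflows to zero.
import torch

def dora_weight(
    base: torch.Tensor,       # (out_features, in_features), frozen
    lora_a: torch.Tensor,     # (rank, in_features)
    lora_b: torch.Tensor,     # (out_features, rank)
    magnitude: torch.Tensor,  # (out_features,), should be initialized to
                              # the per-channel norm of the adapted weight
    scaling: float,           # alpha / rank
    eps: float = 1e-6,
) -> torch.Tensor:
    adapted = base + scaling * (lora_b @ lora_a)
    # Per-output-channel L2 norm; clamp to avoid division by zero.
    norm = adapted.norm(p=2, dim=1, keepdim=True).clamp_min(eps)
    return (magnitude.unsqueeze(1) / norm) * adapted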

Once we fix them, we should add recipe test cases that set model.use_dora=True so regressions like these are caught in the future (a rough sketch follows below), cc @felipemello1.
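
Something along these lines might work as a smoke test; the subprocess harness is illustrative only, not torchtune's actual recipe-test utilities:

# Illustrative smoke test: run each LoRA recipe config for a few steps
# with DoRA enabled and fail on any error or a NaN in the logged loss.
# This is a generic subprocess sketch, not torchtune's test framework.
import subprocess

import pytest

CONFIGS = [
    "llama3_2/1B_lora_single_device",
    "llama3_2_vision/11B_lora_single_device",
]

@pytest.mark.parametrize("config", CONFIGS)
def test_lora_recipe_with_dora(config):
    cmd = [
        "tune", "run", "lora_finetune_single_device",
        "--config", config,
        "gradient_accumulation_steps=1",
        "max_steps_per_epoch=5",
        "model.use_dora=True",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0, result.stderr
    # A NaN loss typically shows up as "nan" in the logged output.
    assert "nan" not in result.stdout.lower()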

ebsmothers added the bug label on Oct 25, 2024