[WIP] Llama 3.2 Vision - 90B #1880

Draft

felipemello1 wants to merge 3 commits into pytorch:main from felipemello1:90b_llamav

Contributor

felipemello1 commented Oct 22, 2024

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?
*

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

I did not change any public API
I have added an example to docs or docstrings


          first commit

f335bf3

pytorch-bot bot commented Oct 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1880

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label

felipemello1 commented

torchtune/models/llama3_2_vision/_model_builders.py

		from torchtune.modules.tokenizers import parse_hf_tokenizer_json


		def llama3_2_vision_transform(

Contributor Author

felipemello1 Oct 24, 2024

No changes here; the function was just moved to the top.

torchtune/models/llama3_2_vision/_model_builders.py Outdated

-                  image_size: int = 560
-                  ) -> DeepFusionModel:
-                  """ Llama 3.2 Vision 11B model
+                  image_size: int = 560,

Contributor Author

felipemello1 Oct 24, 2024

No changes; this is just the pre-commit hook reformatting.

torchtune/models/llama3_2_vision/_model_builders.py

-              def llama3_2_vision_transform(
-                      path: str, max_seq_len: int = 8192, image_size: int = 560, special_tokens_path: Optional[str] = None, prompt_template: Optional[_TemplateType] = None
-                  ) -> Llama3VisionTransform:
+              def lora_llama3_2_vision_11b(

Contributor Author

felipemello1 Oct 24, 2024

No changes, just a reordering of functions. The diff makes it look as if I am rewriting it, but llama3_2_vision_transform is now at the top.
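For anyone wondering why a pure reorder shows up as a rewrite, here is a minimal sketch using Python's standard `difflib`. The two toy functions `a` and `b` are hypothetical stand-ins, not the real builders, and git's diff algorithm is not identical to `difflib`, so treat this only as an illustration of the general effect: a line-based diff can anchor on one block at a time, so the other block is reported as removed in one place and added in another.

```python
import difflib

# Hypothetical stand-ins for two builder functions; only their order differs.
before = [
    "def a():",
    "    return 1",
    "",
    "def b():",
    "    return 2",
    "",
]
after = [
    "def b():",
    "    return 2",
    "",
    "def a():",
    "    return 1",
    "",
]

# lineterm="" keeps the header lines free of trailing newlines.
diff = list(difflib.unified_diff(before, after, "old.py", "new.py", lineterm=""))

# A body line of the unchanged function b appears both as removed and as added.
moved_shown_as_removed = "-def b():" in diff
moved_shown_as_added = "+def b():" in diff

if __name__ == "__main__":
    print("\n".join(diff))
```

Nothing inside either function changed, yet `b` is listed as both deleted and inserted, which is the same kind of noise the review view shows here.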

torchtune/models/llama3_2_vision/_model_builders.py

		)


		def llama3_2_vision_90b(

Contributor Author

felipemello1 Oct 24, 2024

Copied from 11b, with an updated docstring and a couple of changed parameters.

The 11b decoder is:

decoder = llama3_2_vision_decoder(
    vocab_size=128_256,
    num_layers=32,
    fusion_interval=4,
    num_special_tokens=8,
    num_heads=32,
    num_kv_heads=8,
    embed_dim=4096,
    max_seq_len=131_072,
    encoder_max_seq_len=128_080,  # 20*6404
    rope_base=500000.0,
    intermediate_dim=14336,
)

The 90b decoder is:

decoder = llama3_2_vision_decoder(
    vocab_size=128_256,
    num_layers=100,
    fusion_interval=4,
    num_special_tokens=8,
    num_heads=64,
    num_kv_heads=8,
    embed_dim=8192,
    max_seq_len=131_072,
    encoder_max_seq_len=128_080,  # 20*6404
    rope_base=500000.0,
    intermediate_dim=28672,
)

The encoder is the same, except for decoder_embed_dim, which is 8192 instead of 4096.

Values are taken from the Hugging Face config: https://huggingface.co/meta-llama/Llama-3.2-90B-Vision/blob/main/config.json
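To make the 11b vs. 90b delta easier to review, here is a small sketch that diffs the two decoder configs quoted above and derives a few sanity-check quantities. The dictionaries only restate the keyword arguments from this comment; `changed_fields` and `derived` are hypothetical helpers written for this illustration, not torchtune functions, and the derived numbers are plain arithmetic rather than anything read back from the model.

```python
# Keyword arguments copied from the 11b and 90b decoder calls quoted in this comment.
CFG_11B = dict(
    vocab_size=128_256, num_layers=32, fusion_interval=4, num_special_tokens=8,
    num_heads=32, num_kv_heads=8, embed_dim=4096, max_seq_len=131_072,
    encoder_max_seq_len=128_080, rope_base=500000.0, intermediate_dim=14336,
)
CFG_90B = dict(
    vocab_size=128_256, num_layers=100, fusion_interval=4, num_special_tokens=8,
    num_heads=64, num_kv_heads=8, embed_dim=8192, max_seq_len=131_072,
    encoder_max_seq_len=128_080, rope_base=500000.0, intermediate_dim=28672,
)


def changed_fields(a: dict, b: dict) -> dict:
    """Hypothetical helper: map each differing key to its (a, b) pair of values."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


def derived(cfg: dict) -> dict:
    """Hypothetical helper: quantities implied by the config via simple arithmetic."""
    return {
        # Per-head width of the attention projections.
        "head_dim": cfg["embed_dim"] // cfg["num_heads"],
        # Number of query heads sharing each key/value head.
        "q_heads_per_kv_head": cfg["num_heads"] // cfg["num_kv_heads"],
        # Width of the MLP hidden layer relative to the embedding width.
        "mlp_ratio": cfg["intermediate_dim"] / cfg["embed_dim"],
    }


if __name__ == "__main__":
    for name, pair in changed_fields(CFG_11B, CFG_90B).items():
        print(f"{name}: {pair[0]} -> {pair[1]}")
    print("11b:", derived(CFG_11B))
    print("90b:", derived(CFG_90B))
```

Only four fields differ (num_layers, num_heads, embed_dim, intermediate_dim). The per-head width and the MLP ratio stay the same across both sizes, while the number of query heads per key/value head doubles because num_kv_heads stays at 8.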

torchtune/models/llama3_2_vision/_model_builders.py

                   )
-              def lora_llama3_2_vision_11b(
+              def lora_llama3_2_vision_90b(

Contributor Author

felipemello1 Oct 24, 2024

Git is confused :(

lora_llama3_2_vision_11b is still at the top and was not replaced.

This function is a copy of lora_llama3_2_vision_11b.

Felipe Mello added 2 commits

October 24, 2024 14:14


          Merge branch 'main' into 90b_llamav

ff4541d


          it works
