[WIP] Llama 3.2 Vision - 90B #1880
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1880
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
from torchtune.modules.tokenizers import parse_hf_tokenizer_json


def llama3_2_vision_transform(
```
No changes here, just moved to the top.
```diff
-    image_size: int = 560
+    image_size: int = 560,
 ) -> DeepFusionModel:
     """ Llama 3.2 Vision 11B model
```
No changes, just the pre-commit hook.
```diff
-def llama3_2_vision_transform(
-    path: str, max_seq_len: int = 8192, image_size: int = 560, special_tokens_path: Optional[str] = None, prompt_template: Optional[_TemplateType] = None
-) -> Llama3VisionTransform:
+def lora_llama3_2_vision_11b(
```
No changes, just a reordering of functions; the diff thinks I'm rewriting it. `llama3_2_vision_transform` is now at the top. A usage sketch follows below.
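For context, here is a usage sketch of the builder whose signature appears in this hunk, assuming it is exported from `torchtune.models.llama3_2_vision`; the tokenizer path is hypothetical:

```python
from torchtune.models.llama3_2_vision import llama3_2_vision_transform

# Hypothetical tokenizer path; point this at your own downloaded checkpoint.
transform = llama3_2_vision_transform(
    path="/tmp/Llama-3.2-11B-Vision/original/tokenizer.model",
    max_seq_len=8192,
    image_size=560,
)
```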
```python
)


def llama3_2_vision_90b(
```
Copied from 11B; updated the docstring plus a couple of parameters.
11B is this:

```python
decoder = llama3_2_vision_decoder(
    vocab_size=128_256,
    num_layers=32,
    fusion_interval=4,
    num_special_tokens=8,
    num_heads=32,
    num_kv_heads=8,
    embed_dim=4096,
    max_seq_len=131_072,
    encoder_max_seq_len=128_080,  # 20*6404
    rope_base=500000.0,
    intermediate_dim=14336,
)
```

90B is this:

```python
decoder = llama3_2_vision_decoder(
    vocab_size=128_256,
    num_layers=100,
    fusion_interval=4,
    num_special_tokens=8,
    num_heads=64,
    num_kv_heads=8,
    embed_dim=8192,
    max_seq_len=131_072,
    encoder_max_seq_len=128_080,  # 20*6404
    rope_base=500000.0,
    intermediate_dim=28672,
)
```

The encoder is the same, except for decoder_embed_dim, which is 8192 instead of 4096.
Values taken from here: https://huggingface.co/meta-llama/Llama-3.2-90B-Vision/blob/main/config.json
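For illustration, here is a minimal sketch of how the 90B builder could wire these pieces together, mirroring the 11B builder. The encoder kwargs other than `decoder_embed_dim` are assumptions carried over from the 11B builder (per the note above), and the import paths assume torchtune's existing module layout:

```python
from torchtune.models.llama3_2_vision._component_builders import (
    llama3_2_vision_decoder,
    llama3_2_vision_encoder,
)
from torchtune.modules.model_fusion import DeepFusionModel


def llama3_2_vision_90b_sketch(image_size: int = 560) -> DeepFusionModel:
    """Hedged sketch of the 90B builder, not the PR's final code."""
    # Encoder: assumed identical to the 11B builder except decoder_embed_dim,
    # which is 8192 instead of 4096 (see the comment above).
    encoder = llama3_2_vision_encoder(
        patch_size=14,
        num_heads=16,
        clip_embed_dim=1280,
        clip_num_layers=32,
        clip_hidden_states=[3, 7, 15, 23, 30],
        decoder_embed_dim=8192,
        num_layers_projection=8,
        tile_size=image_size,
        max_num_tiles=4,
        in_channels=3,
    )
    # Decoder: values from the 90B config linked above.
    decoder = llama3_2_vision_decoder(
        vocab_size=128_256,
        num_layers=100,
        fusion_interval=4,
        num_special_tokens=8,
        num_heads=64,
        num_kv_heads=8,
        embed_dim=8192,
        max_seq_len=131_072,
        encoder_max_seq_len=128_080,  # 20*6404
        rope_base=500000.0,
        intermediate_dim=28672,
    )
    # Fuse encoder and decoder, as in the 11B builder.
    return DeepFusionModel(encoder=encoder, decoder=decoder)
```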
```diff
 )


-def lora_llama3_2_vision_11b(
+def lora_llama3_2_vision_90b(
```
Git is confused :( `lora_llama3_2_vision_11b` is still at the top and was not replaced. This function is a copy of `lora_llama3_2_vision_11b`.
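Since it is a copy, usage should mirror the 11B LoRA builder. A hedged usage sketch, with parameter names assumed from torchtune's other LoRA builders (verify against the final PR):

```python
from torchtune.models.llama3_2_vision import lora_llama3_2_vision_90b

# Assumed LoRA knobs (lora_attn_modules / lora_rank / lora_alpha),
# mirroring torchtune's other LoRA builders; not confirmed by this PR.
model = lora_llama3_2_vision_90b(
    lora_attn_modules=["q_proj", "v_proj"],
    lora_rank=8,
    lora_alpha=16,
)
```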
Context
What is the purpose of this PR? Is it to add a new feature, fix a bug, or update tests and/or documentation?
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
*
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.

- run pre-commit hooks and linters (install first via `pre-commit install`)
- run `pytest tests`
- run `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it. Here is a docstring example and a tutorial example.