Miscellaneous CI, dependency, and version fixes #1151
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1151
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9b94fa1 with merge base 26b54b2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Codecov Report
Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main    #1151      +/-   ##
===========================================
+ Coverage   26.76%   68.01%   +41.25%
===========================================
  Files         205      213        +8
  Lines        9301     9633      +332
===========================================
+ Hits         2489     6552     +4063
+ Misses       6812     3081     -3731
===========================================
```

☔ View full report in Codecov by Sentry.
torchtune/utils/quantization.py (Outdated)
```python
from torchtune.modules.low_precision._utils import _get_torchao_version

ao_version, is_nightly = _get_torchao_version()
print(ao_version, is_nightly)
```
nit: remove this before landing?
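For context, a hypothetical sketch of what a helper like this might compute — an illustration only, not torchtune's actual `_get_torchao_version` implementation:

```python
# Hypothetical sketch (not torchtune's actual code) of a helper returning the
# installed torchao version and whether it is a nightly build.
import torchao


def get_torchao_version() -> tuple[str, bool]:
    ao_version = torchao.__version__   # e.g. "0.3.1" or "0.4.0.dev20240624"
    is_nightly = "dev" in ao_version   # nightly wheels include a .dev segment
    return ao_version, is_nightly
```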
```diff
@@ -10,8 +10,6 @@ authors = [
 ]
 keywords = ["pytorch", "finetuning", "llm"]
 dependencies = [
-    # multimodality
-    "torchvision",
```
I wonder if there is a way to keep torchvision without causing issues with torch nightlies. Do you know if it's worth researching, or is there no way to make it work?
Just chatted with @NicolasHug on this and he confirmed it's not possible, since there's no way to point pyproject.toml to a specific conda channel or PyPI repo.
Got it! I'm just afraid that in the long run it may cause issues, since we may want to pin the torchvision version. For example, older torchvision versions will break ClipTransforms.
They should be installed in a single command, which is why I updated the README to clarify this. If the user runs `pip install torch torchvision`, they will get stable versions of both torch and torchvision; if they run `pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121`, they will get torch nightly and torchvision 0.20.
I can't find the link to the original comment, but I believe it was demonstrating what happens when we install using the existing pyproject.toml. For reference, here is my pip list after running the first command, and here is my pip list after running the second command. You can see that the versions are as expected.
Btw, regarding pinning versions -- we do not test on anything older than the latest stable version of PyTorch, and I don't think we want to worry about breaking folks on versions older than that. By the same logic, I don't think we should be pinning to older versions of torchvision. The best way to keep things in sync is just to install the two together using these commands.
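(A quick way to sanity-check this locally — a sketch, not part of the PR — is to confirm both packages carry the same channel marker after a joint install:)

```python
# Sketch (not from the PR): verify torch and torchvision came from the same
# channel. Nightly builds of both carry a ".dev" segment in their version
# strings; stable releases do not.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)
assert ("dev" in torch.__version__) == ("dev" in torchvision.__version__), (
    "torch and torchvision appear to come from different channels; "
    "reinstall them together in a single pip command"
)
```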
thanks for the links and explanation!
```diff
@@ -262,6 +262,10 @@ def world_size(self) -> int:
         return 2

+    @gpu_test(gpu_count=2)
+    @pytest.mark.skipif(
+        version.parse(torch.__version__).base_version < "2.4.0",
+        reason="torch >= 2.4 required",
```
Any particular reason we don't want to test this with torch 2.3.x?
Some of the DTensor APIs we use in load_from_full_model_state_dict were not stable prior to 2.4. We already addressed this for the QLoRA state dict test in #1087. In this case it's OK because we are testing FSDP2 functionality, which is not available until 2.4 anyway. cc @weifengpy in case I'm missing any important points here.
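(Aside: comparing `base_version` as a raw string is lexicographic, so e.g. "2.10.0" sorts before "2.4.0". A minimal sketch of a parse-based gate — illustrative names, not the PR's exact code:)

```python
# Sketch of a reusable torch-version gate for tests. Parsing base_version back
# into a Version object gives a proper semantic comparison instead of a
# lexicographic string compare.
import pytest
import torch
from packaging import version

_TORCH_VERSION = version.parse(version.parse(torch.__version__).base_version)

requires_torch_2_4 = pytest.mark.skipif(
    _TORCH_VERSION < version.parse("2.4.0"),
    reason="torch >= 2.4 required for FSDP2 and stable DTensor APIs",
)


@requires_torch_2_4
def test_load_from_full_model_state_dict():
    ...  # FSDP2-dependent test body
```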
```diff
@@ -1,4 +1,4 @@
-name: Multi-GPU Recipe Tests
```
Why the rename here?
It's no longer just a recipe test, right? Now it's recipe + unit test
```diff
@@ -4,6 +4,7 @@ on:
   schedule:
     # Runs at midnight every day
     - cron: '0 0 * * *'
+  workflow_dispatch:
```
yay
README.md (Outdated)

```
# Install stable version of PyTorch using pip
pip3 install torch torchvision
```
I don't think we need to explicitly say `pip3` anymore.
Did we align on AO being an optional dependency? If so, why not do what we do with the BnB example and ask users to manually install?
README.md (Outdated)

```diff
@@ -156,7 +156,16 @@ You can find a full list of all our Llama3 configs [here.](recipes/configs/llama

 ## Installation

-**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version.
+**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
```
Suggested change:

```diff
-**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
+**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For fine-tuning the multimodal LLMs available in the repo, you'll need to install torchvision as well
```
@kartikayk I think we aligned on actually testing ao nightlies, but not on having it as an optional dependency. So I'm doing the former here and not the latter. But cc @msaroufim @joecummings if either of you have thoughts on this.
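(For illustration, a minimal sketch of the manual-install pattern referenced for BnB, using torchao as the example package — an assumption about the shape of such a guard, not code from this PR:)

```python
# Sketch of an optional-dependency import guard: fail with an actionable
# message rather than a bare ImportError when the optional package is missing.
try:
    import torchao  # noqa: F401
except ImportError as e:
    raise ImportError(
        "torchao is not installed. Install it with `pip install torchao` "
        "to use quantization features."
    ) from e
```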
Consolidating a bunch of small changes into this PR:
CI should be green