
Miscellaneous CI, dependency, and version fixes #1151

Merged
ebsmothers merged 10 commits into pytorch:main from ao-updates on Jul 9, 2024

Conversation

@ebsmothers (Contributor) commented Jul 9, 2024

Consolidating a bunch of small changes into this PR:

  • run unit tests on GPUs in our CI
  • define a util for ao version checks (this will come in handy once ao#485, "Make version check equal to or more than instead of more than", lands); see the sketch after this list
  • remove torchvision from pyproject.toml and add separate install instructions to fix this issue with forced downgrade of PyTorch nightlies
  • run CI on the combo of ao + PyTorch nightlies
    • Another thing we could consider is to remove ao from our pyproject.toml, similar to what I've added here for torchvision, then give the separate install instructions depending on whether users want nightly or stable versions. Otherwise it's a bit hacky because we technically install both stable and nightly versions of ao in our install flow (but nightly will supersede stable). Punting this for now since I don't think it'll break anything
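
For reference, here is a rough sketch of what such a version-check util could look like (simplified and illustrative only; the actual helper added in this PR is `_get_torchao_version` in `torchtune/modules/low_precision/_utils.py`, and its exact nightly-detection logic may differ):

```python
# Simplified, illustrative sketch of an ao version check; not the exact implementation in this PR.
from importlib.metadata import version as _pkg_version


def _get_torchao_version() -> tuple[str, bool]:
    """Return the installed torchao version string and whether it looks like a nightly build."""
    ao_version = _pkg_version("torchao")  # raises PackageNotFoundError if ao is not installed
    # Nightly wheels typically carry a date-stamped dev suffix, e.g. "0.3.0.dev20240708"
    is_nightly = "dev" in ao_version
    return ao_version, is_nightly
```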

CI should be green


pytorch-bot bot commented Jul 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1151

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9b94fa1 with merge base 26b54b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Jul 9, 2024
@ebsmothers changed the title from "[wip] CI: drop 3.8, run on ao nightly, better ao version checks" to "Miscellaneous CI, dependency, and version fixes" on Jul 9, 2024
@codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 72.97297% with 10 lines in your changes missing coverage. Please review.

Project coverage is 68.01%. Comparing base (06a125e) to head (189c409).
Report is 2 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| torchtune/modules/low_precision/_utils.py | 64.00% | 9 Missing ⚠️ |
| torchtune/utils/quantization.py | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff             @@
##             main    #1151       +/-   ##
===========================================
+ Coverage   26.76%   68.01%   +41.25%     
===========================================
  Files         205      213        +8     
  Lines        9301     9633      +332     
===========================================
+ Hits         2489     6552     +4063     
+ Misses       6812     3081     -3731     
```


@ebsmothers ebsmothers marked this pull request as ready for review July 9, 2024 05:29
```python
from torchtune.modules.low_precision._utils import _get_torchao_version

ao_version, is_nightly = _get_torchao_version()
print(ao_version, is_nightly)
```
Member commented:

nit: remove this before landing?

```diff
@@ -10,8 +10,6 @@ authors = [
 ]
 keywords = ["pytorch", "finetuning", "llm"]
 dependencies = [
-    # multimodality
-    "torchvision",
```
@felipemello1 (Contributor) commented Jul 9, 2024:

I wonder if there is a way to keep torchvision without causing issues with torch nightlies. Do you know if it's worth researching, or is there no way to make it work?

@ebsmothers (Contributor Author) replied:

Just chatted with @NicolasHug on this and he confirmed it's not possible, since there's no way to point `pyproject.toml` at a specific conda channel or PyPI index.

@felipemello1 (Contributor) replied Jul 9, 2024:

Got it! I am just afraid that in the long run it may cause issues, since we may want to pin the torchvision version. For example, older torchvision versions will break ClipTransforms.

@ebsmothers (Contributor Author) replied:

They should be installed in a single command, which is why I updated the README to clarify this. If the user runs `pip install torch torchvision` they will get stable versions of both torch and torchvision; if they run `pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121` they will get torch nightly and torchvision 0.20.

I can't find the link to the original comment, but I believe it was demonstrating what happens when we install using the existing pyproject.toml. For reference, here is my `pip list` after running the first command, and here is my `pip list` after running the second command. You can see that the versions are as expected.

Btw, regarding pinning versions: we do not test on anything older than the latest stable version of PyTorch, and I don't think we want to worry about breaking folks on versions older than that. By the same logic, I don't think we should be pinning to older versions of torchvision. The best way to keep things in sync is then just to install the two together using these commands.
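
As a quick sanity check after either install path, you can confirm which builds you actually ended up with (illustrative snippet, not part of this PR):

```python
# Illustrative sanity check: print the installed torch / torchvision versions.
import torch
import torchvision

print(torch.__version__)        # a stable release string, or a dev/nightly build string
print(torchvision.__version__)  # should come from the same channel (stable vs. nightly) as torch
```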

Contributor replied:

thanks for the links and explanation!

```diff
@@ -262,6 +262,10 @@ def world_size(self) -> int:
         return 2
 
     @gpu_test(gpu_count=2)
+    @pytest.mark.skipif(
+        version.parse(torch.__version__).base_version < "2.4.0",
+        reason="torch >= 2.4 required",
```
Contributor commented:

Any particular reason we don't want to test this with torch 2.3.x?

@ebsmothers (Contributor Author) replied:

Some of the DTensor APIs we use in `load_from_full_model_state_dict` were not stable prior to 2.4. We already addressed this for the QLoRA state dict test in #1087. In this case it's OK because we are testing FSDP2 functionality, which is not available until 2.4 anyway. cc @weifengpy in case I'm missing any important points here.

```
@@ -1,4 +1,4 @@
name: Multi-GPU Recipe Tests
```
Contributor commented:

Why the rename here?

@ebsmothers (Contributor Author) replied:

It's no longer just a recipe test, right? Now it's recipe + unit test

```diff
@@ -4,6 +4,7 @@ on:
   schedule:
     # Runs at midnight every day
     - cron: '0 0 * * *'
+  workflow_dispatch:
```
Contributor commented:

yay

README.md Outdated

```
# Install stable version of PyTorch using pip
pip3 install torch torchvision
```
Contributor commented:

I don't think we need to explicitly say `pip3` anymore.

@kartikayk (Contributor) left a comment:

Did we align on AO being an optional dependency? If so, why not do what we do with the BnB example and ask users to install it manually?

README.md Outdated
```diff
@@ -156,7 +156,16 @@ You can find a full list of all our Llama3 configs [here.](recipes/configs/llama
 
 ## Installation
 
-**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version.
+**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
```
Contributor commented:

Suggested change:
```diff
-**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
+**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For fine-tuning the multimodal LLMs available in the repo, you'll need to install torchvision as well
```

@ebsmothers (Contributor Author) commented:

> Did we align on AO being an optional dependency? If so, why not do what we do with the BnB example and ask users to install it manually?

@kartikayk I think we aligned on actually testing ao nightlies but not on having it as an optional dependency. So I'm doing the former here and not the latter. But cc @msaroufim @joecummings if either of you have thoughts on this
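
(For reference, if we ever did make ao optional down the line, a BnB-style guard could look roughly like the sketch below. This is illustrative only and not something this PR adds; the helper name is hypothetical.)

```python
# Illustrative sketch only (not added in this PR): a BnB-style optional-import guard for torchao.
# The helper name below is hypothetical.
try:
    import torchao  # noqa: F401

    _TORCHAO_AVAILABLE = True
except ImportError:
    _TORCHAO_AVAILABLE = False


def _assert_torchao_installed() -> None:
    """Raise a helpful error if a quantization feature is used without torchao installed."""
    if not _TORCHAO_AVAILABLE:
        raise ImportError(
            "torchao is required for this feature. Install it with `pip install torchao` "
            "(stable) or a torchao nightly build."
        )
```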

@ebsmothers ebsmothers merged commit 37636a8 into pytorch:main Jul 9, 2024
29 checks passed
@ebsmothers ebsmothers deleted the ao-updates branch July 9, 2024 19:22
maximegmd pushed a commit to maximegmd/torchtune that referenced this pull request Jul 13, 2024