Miscellaneous CI, dependency, and version fixes #1151

Merged
merged 10 commits into from
Jul 9, 2024
Changes from 9 commits
12 changes: 6 additions & 6 deletions .github/workflows/build_docs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install dependencies
run: |
python -m pip install torch
python -m pip install torch torchvision
python -m pip install -e .
cd docs
python -m pip install -r requirements.txt
Expand Down Expand Up @@ -108,21 +108,21 @@ jobs:
run: |
git remote set-url origin https://pytorchbot:${GITHUB_PYTORCHBOT_TOKEN}@github.com/pytorch/torchtune.git
set -euo pipefail
# Convert refs/tags/v1.12.0rc3 into 1.12.

# Convert refs/tags/v1.12.0rc3 into 1.12.
# Adopted from https://github.com/pytorch/pytorch/blob/main/.github/workflows/_docs.yml#L150C11-L155C13
GITHUB_REF=${{ github.ref }}
GITHUB_REF=${{ github.ref }}
if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\.[0-9]+)\.* ]]; then
TARGET_FOLDER="${BASH_REMATCH[1]}"
else
TARGET_FOLDER="main"
fi

echo "Target Folder: ${TARGET_FOLDER}"
mkdir -p "${TARGET_FOLDER}"
rm -rf "${TARGET_FOLDER}"/*
mv docs/* "${TARGET_FOLDER}"

git config user.name 'pytorchbot'
git config user.email 'soumith+bot@pytorch.org'
git add "${TARGET_FOLDER}" || true
Expand Down
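The tag-to-folder mapping in the script above can be checked outside CI. The following is a minimal Python sketch of the same regular expression; the function name `target_folder` is hypothetical and is not part of the workflow.

```python
import re

# Same pattern as the bash test in the workflow: capture "major.minor"
# from a release tag ref such as refs/tags/v1.12.0rc3.
_TAG_RE = re.compile(r"^refs/tags/v([0-9]+\.[0-9]+)\.*")


def target_folder(github_ref: str) -> str:
    """Return the docs folder for a ref: 'X.Y' for a version tag, else 'main'."""
    match = _TAG_RE.match(github_ref)
    return match.group(1) if match else "main"


print(target_folder("refs/tags/v1.12.0rc3"))  # 1.12
print(target_folder("refs/heads/main"))       # main
```

Any ref that is not a `v`-prefixed version tag, including pull request refs, falls through to `main`.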
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Multi-GPU Recipe Tests
name: GPU tests

on:
push:
Expand All @@ -7,7 +7,7 @@ on:
workflow_dispatch:

concurrency:
group: recipe-test-multi-gpu-${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
group: gpu-test-${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
cancel-in-progress: true

permissions:
Expand All @@ -19,7 +19,7 @@ defaults:
shell: bash -l -eo pipefail {0}

jobs:
recipe_test_multi_gpu:
gpu_test:
runs-on: linux.8xlarge.nvidia.gpu
strategy:
matrix:
Expand All @@ -39,15 +39,15 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install torch nightly
if: ${{ matrix.torch-version == 'nightly' }}
run: python -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
run: python -m pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu118
- name: Install torch stable
if: ${{ matrix.torch-version == 'stable' }}
run: python -m pip install torch
run: python -m pip install torch torchvision
- name: Install remaining dependencies
run: |
python -m pip install -e ".[dev]"
python -m pip install lm-eval==0.4.*
- name: Run recipe tests with coverage
run: pytest tests -m integration_test --cov=. --cov-report=xml --durations=20 -vv
- name: Run recipe and unit tests with coverage
run: pytest tests --with-integration --cov=. --cov-report=xml --durations=20 -vv
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v3
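The `concurrency.group` expression in this workflow uses the `&& ... ||` idiom as a ternary. As a rough illustration, the Python sketch below mirrors that expression; the function name `concurrency_group` is hypothetical, and the sketch assumes `run_number` is always a positive integer, which is what makes the idiom behave like a ternary.

```python
def concurrency_group(workflow: str, ref: str, run_number: int) -> str:
    """Mirror of: gpu-test-${{ github.workflow }}-${{ ref == main && run_number || ref }}."""
    # On main, every run gets its own group, so runs never cancel each other.
    # On any other ref, runs share a group keyed by the ref, so with
    # cancel-in-progress enabled a newer run cancels the older one.
    suffix = run_number if ref == "refs/heads/main" else ref
    return f"gpu-test-{workflow}-{suffix}"


print(concurrency_group("GPU tests", "refs/heads/main", 42))
print(concurrency_group("GPU tests", "refs/pull/1151/merge", 42))
```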
2 changes: 1 addition & 1 deletion .github/workflows/recipe_test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install dependencies
run: |
python -m pip install torch
python -m pip install torch torchvision
python -m pip install -e ".[dev]"
python -m pip install lm-eval==0.4.*
- name: Run recipe tests with coverage
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/recipe_test_nightly.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ on:
schedule:
# Runs at midnight every day
- cron: '0 0 * * *'
workflow_dispatch:
Contributor

yay


concurrency:
group: recipe-test-nightly-${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
Expand Down Expand Up @@ -38,14 +39,17 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install torch nightly
if: ${{ matrix.torch-version == 'nightly' }}
run: python -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
run: python -m pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121
- name: Install torch stable
if: ${{ matrix.torch-version == 'stable' }}
run: python -m pip install torch
run: python -m pip install torch torchvision
- name: Install remaining dependencies
run: |
python -m pip install -e ".[dev]"
python -m pip install lm-eval==0.4.*
- name: Install torchao nightly
if: ${{ matrix.torch-version == 'nightly' }}
run: pip install --pre torchao-nightly --index-url https://download.pytorch.org/whl/nightly/cu121
- name: Run recipe tests with coverage
run: pytest tests -m integration_test --cov=. --cov-report=xml --durations=20 -vv
- name: Upload Coverage to Codecov
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/regression_test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -47,10 +47,10 @@ jobs:
python3 -m pip install awscli==1.32.6
- name: Install torch nightly
if: ${{ matrix.torch-version == 'nightly' }}
run: python -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
run: python -m pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu118
- name: Install torch stable
if: ${{ matrix.torch-version == 'stable' }}
run: python -m pip install torch
run: python -m pip install torch torchvision
- name: Install remaining dependencies
run: |
python -m pip install -e ".[dev]"
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/unit_test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ jobs:
run: python -m pip install --upgrade pip
- name: Install dependencies
run: |
python -m pip install torch
python -m pip install torch torchvision
python -m pip install -e ".[dev]"
- name: Run unit tests with coverage
run: pytest tests --cov=. --cov-report=xml --durations=20 -vv
Expand Down
11 changes: 10 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,16 @@ You can find a full list of all our Llama3 configs [here.](recipes/configs/llama

## Installation

**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version.
**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
Contributor

Suggested change
**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For multimodality
**Step 1:** [Install PyTorch](https://pytorch.org/get-started/locally/). torchtune is tested with the latest stable PyTorch release as well as the preview nightly version. For fine-tuning the multimodal LLMs available in the repo, you'll need to install torchvision as well

be sure to also install torchvision.

```
# Install stable version of PyTorch using pip
pip3 install torch torchvision
Contributor

I don't think we need to explicitly say pip3 anymore


# Nightly install for latest features
pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121
```

**Step 2:** The latest stable version of torchtune is hosted on PyPI and can be downloaded with the following command:

Expand Down
2 changes: 0 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,6 @@ authors = [
]
keywords = ["pytorch", "finetuning", "llm"]
dependencies = [
# multimodality
"torchvision",
Contributor

@felipemello1, Jul 9, 2024

I wonder if there is a way to keep torchvision without causing issues with torch nightlies. Do you know if it's worth researching, or is there no way to make it work?

Contributor Author

Just chatted with @NicolasHug on this and he confirmed it's not possible, since there's no way to point pyproject.toml to a specific conda channel or PyPI repo.

Contributor

@felipemello1, Jul 9, 2024

Got it! I am just afraid that in the long run it may cause issues, since we may want to pin the torchvision version. For example, older versions will break in ClipTransforms.

Contributor Author

torchvision should be installed in the same command as torch, which is why I updated the README to clarify this. If the user runs `pip install torch torchvision`, they will get stable versions of both torch and torchvision; if they run `pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu121`, they will get torch nightly and torchvision 0.20.

I can't find the link to the original comment but I believe that was demonstrating what happens when we install using the existing pyproject.toml. For reference, here is my pip list after running the first command; here is my pip list after running the second command. You can see that the versions are as expected.

Btw regarding pinning versions -- we do not test on anything older than the latest stable version of PyTorch and I don't think we want to worry about breaking folks on older versions than that. By the same logic, I don't think we should be pinning to older versions of torchvision. Then the best way to keep things in sync is just to install the two together using these commands.

Contributor

thanks for the links and explanation!


# Hugging Face integrations
"datasets",
Expand Down
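Since torchvision is no longer declared in `pyproject.toml`, a user can end up with torch and torchvision from different channels. The sketch below shows one way to check for that mismatch locally. It assumes that nightly wheels carry a `.dev<date>` segment in their version string (for example `2.5.0.dev20240709+cu121`); the helper names are hypothetical and nothing here is part of torchtune.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_nightly(version_string: str) -> bool:
    """Assumption: nightly builds contain '.dev', e.g. '2.5.0.dev20240709+cu121'."""
    return ".dev" in version_string


def same_channel(torch_version: str, torchvision_version: str) -> bool:
    """True when both packages are stable or both are nightly."""
    return is_nightly(torch_version) == is_nightly(torchvision_version)


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    vision_v = installed_version("torchvision")
    if torch_v is None or vision_v is None:
        print("torch and/or torchvision is not installed")
    else:
        ok = same_channel(torch_v, vision_v)
        print(f"torch={torch_v} torchvision={vision_v} same channel: {ok}")
```

This only detects a stable/nightly mix; it does not verify that a given torchvision release was built against the installed torch release.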
4 changes: 4 additions & 0 deletions tests/torchtune/utils/test_distributed.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,10 @@ def world_size(self) -> int:
return 2

@gpu_test(gpu_count=2)
@pytest.mark.skipif(
version.parse(torch.__version__).base_version < "2.4.0",
reason="torch >= 2.4 required",
Contributor

Any particular reason we don't want to test this with torch 2.3.x?

Contributor Author

Some of the DTensor APIs we use in load_from_full_model_state_dict were not stable prior to 2.4. We already addressed this for the QLoRA state dict test in #1087. In this case it's OK because we are testing FSDP2 functionality which is not available until 2.4 anyways. cc @weifengpy in case I'm missing any important points here.

)
def test_lora_state_dict(self):
rank = self.rank
is_rank_zero = rank == 0
Expand Down
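One caveat about the `skipif` condition in this hunk: `base_version` is a plain string, so `<` compares it lexicographically. That gives the intended answer for the versions discussed here, but it would misorder a two-digit minor version such as a hypothetical `2.10.0`. The sketch below compares numerically using only the standard library; `release_tuple` and `torch_at_least` are hypothetical helpers, not part of the PR. Comparing `packaging.version.parse(...)` objects directly is another option.

```python
import re
from typing import Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Parse the leading dotted integers, padded to three parts.

    '2.4.0.dev20240709+cu121' -> (2, 4, 0); '2.4' -> (2, 4, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"not a version string: {version_string!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def torch_at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison of release segments; pre-release tags are ignored."""
    return release_tuple(installed) >= release_tuple(minimum)


# Lexicographic comparison treats "2.10.0" as older than "2.4.0".
print("2.10.0" < "2.4.0")                # True
print(torch_at_least("2.10.0", "2.4.0"))  # True
print(torch_at_least("2.3.1", "2.4.0"))   # False
```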
24 changes: 7 additions & 17 deletions torchtune/modules/low_precision/_register_nf4_dispatch_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,9 @@
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from importlib.metadata import PackageNotFoundError, version

import torch
from torchao.dtypes.nf4tensor import implements as nf4_tensor_impl, to_nf4


def is_fbcode():
return not hasattr(torch.version, "git_version")
from torchtune.modules.low_precision._utils import _get_torchao_version


@nf4_tensor_impl([torch.ops.aten.clone.default])
Expand All @@ -26,17 +21,12 @@ def clone(func, *args, **kwargs):


should_define_inplace_copy = True
if not is_fbcode():
try:
ao_version = version("torchao")
should_define_inplace_copy = ao_version < "0.2.0"
# For importlib metadata, need to check nightly separately
except PackageNotFoundError:
ao_version = version("torchao-nightly")
should_define_inplace_copy = ao_version < "2024.5.20"
except Exception as e:
raise PackageNotFoundError("Could not find torchao version") from e

ao_version, is_nightly = _get_torchao_version()
if ao_version:
if (is_nightly and ao_version >= "2024.5.20") or (
not is_nightly and ao_version >= "0.2.0"
):
should_define_inplace_copy = False

if should_define_inplace_copy:
# TorchAO have `NF4.copy_` starting from `0.2.0`
Expand Down
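The gating logic in this hunk reduces to a single threshold per channel: `0.2.0` for stable torchao releases and `2024.5.20` for date-stamped nightlies. A minimal sketch of that decision as a pure function follows. The function names are hypothetical, and the versions are compared as integer tuples rather than strings, which is a deliberate departure from the diff so that a nightly such as `2024.10.1` sorts after `2024.5.20`.

```python
import re
from typing import Optional, Tuple


def _nums(version_string: str) -> Tuple[int, ...]:
    """Leading dotted integers of a version: '2024.5.20' -> (2024, 5, 20)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"not a version string: {version_string!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def should_define_inplace_copy(
    ao_version: Optional[str], is_nightly: Optional[bool]
) -> bool:
    """Define the fallback copy_ op only when torchao predates its own NF4 copy_."""
    if not ao_version:
        # No version information (the fbcode case): keep the default of True.
        return True
    threshold = "2024.5.20" if is_nightly else "0.2.0"
    return _nums(ao_version) < _nums(threshold)


print(should_define_inplace_copy("0.1.0", False))     # True
print(should_define_inplace_copy("2024.10.1", True))  # False
```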
53 changes: 53 additions & 0 deletions torchtune/modules/low_precision/_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

import torch

import torchao


def _is_fbcode():
return not hasattr(torch.version, "git_version")


def _get_torchao_version() -> Tuple[Optional[str], Optional[bool]]:
"""
Get torchao version. Returns a tuple of two elements, the first element
is the version string, the second element is whether it's a nightly version.
For fbcode usage, return None, None.

Checks:
1) is_fbcode, then
2) importlib's version(torchao-nightly) for nightlies, then
3) torchao.__version__ (only defined for torchao >= 0.3.0), then
4) importlib's version(torchao) for non-nightly


If none of these work, raise an error.

"""
if _is_fbcode():
return None, None
# Check for nightly install first
try:
ao_version = version("torchao-nightly")
is_nightly = True
except PackageNotFoundError:
try:
ao_version = torchao.__version__
is_nightly = False
except AttributeError:
ao_version = "unknown"
if ao_version == "unknown":
try:
ao_version = version("torchao")
is_nightly = False
except Exception as e:
raise PackageNotFoundError("Could not find torchao version") from e
return ao_version, is_nightly
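The lookup order in `_get_torchao_version` is hard to unit test because it depends on what is installed. As an illustration only, the sketch below expresses the same precedence (nightly distribution, then module `__version__`, then stable distribution) with the metadata lookup passed in as a parameter. The names `resolve_ao_version` and `fake_lookup` are hypothetical, and the fbcode check is left out.

```python
from importlib.metadata import PackageNotFoundError
from typing import Callable, Dict, Optional, Tuple


def resolve_ao_version(
    lookup: Callable[[str], str],
    module_version: Optional[str] = None,
) -> Tuple[str, bool]:
    """Return (version, is_nightly) using the precedence described above."""
    try:
        return lookup("torchao-nightly"), True
    except PackageNotFoundError:
        pass
    if module_version is not None:
        return module_version, False
    try:
        return lookup("torchao"), False
    except PackageNotFoundError as e:
        raise PackageNotFoundError("Could not find torchao version") from e


def fake_lookup(installed: Dict[str, str]) -> Callable[[str], str]:
    """Test stand-in for importlib.metadata.version."""

    def lookup(name: str) -> str:
        if name not in installed:
            raise PackageNotFoundError(name)
        return installed[name]

    return lookup


print(resolve_ao_version(fake_lookup({"torchao-nightly": "2024.7.3"})))
print(resolve_ao_version(fake_lookup({"torchao": "0.1"})))
```

In real use, `lookup` would be `importlib.metadata.version` and `module_version` would be `getattr(torchao, "__version__", None)`.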
9 changes: 8 additions & 1 deletion torchtune/utils/quantization.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,17 @@
from torchao.quantization.quant_api import (
Int4WeightOnlyGPTQQuantizer,
Int4WeightOnlyQuantizer,
quantize,
Quantizer,
)

from torchtune.modules.low_precision._utils import _get_torchao_version

ao_version, is_nightly = _get_torchao_version()
if is_nightly and (ao_version >= "2024.7.3"):
from torchao.quantization.quant_api import quantize_ as quantize
else:
from torchao.quantization.quant_api import quantize

# importing TORCH_VERSION_AFTER_2_3 because `Int8DynActInt4WeightQuantizer`
# is only available after 2.3 so we have to guard the pytorch versions to decide
# the list of supported quantizers
Expand Down
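The conditional import above selects `quantize_` by nightly date. An alternative, shown here only as a sketch and not as what the PR does, is to detect the attribute instead of comparing versions. The `old_api` and `new_api` objects are stand-ins built with `SimpleNamespace`, not real torchao modules, and `pick_quantize` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Callable


def pick_quantize(quant_api: Any) -> Callable:
    """Prefer `quantize_` when the module exposes it, else fall back to `quantize`."""
    fn = getattr(quant_api, "quantize_", None)
    return fn if fn is not None else getattr(quant_api, "quantize")


# Stand-ins for two API generations; the lambdas only label which one ran.
old_api = SimpleNamespace(quantize=lambda model: "quantize")
new_api = SimpleNamespace(quantize_=lambda model: "quantize_")

print(pick_quantize(old_api)(None))  # quantize
print(pick_quantize(new_api)(None))  # quantize_
```

Unlike the version check in the diff, which applies only to nightlies, attribute detection would also switch any stable release that exposes `quantize_`, which may or may not be the intended behavior.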