[WIP] PTQ for generate_v2
#1866
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1866
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Cancelled Jobs as of commit eafd3b2 with merge base f8073ed.
NEW FAILURE - The following job has failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
self._device = utils.get_device(device=cfg.device)
self._dtype = training.get_dtype(dtype=cfg.dtype, device=self._device)
self._logger = utils.get_logger(cfg.log_level)
self.device = utils.get_device(device=cfg.device)
It's a public recipe, so there's no need for a "private" variable.
cc @pbontrager
# Quantize the model if specified
if cfg.get("quantization_method") is not None:
    from torchao.quantization.quant_api import quantize_
Lazily import the torchao API.
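The suggested pattern can be sketched in isolation. Note this is a hedged, stdlib-only illustration: `quantize_if_requested` is a hypothetical helper (not recipe code), and `colorsys` stands in for the torchao module so the snippet runs without torchao installed.

```python
import sys

def quantize_if_requested(model, method=None):
    """Lazy-import pattern: the heavy dependency is only imported on the
    branch that needs it, so the recipe still runs without it installed.
    In the real recipe this would be
    `from torchao.quantization.quant_api import quantize_`."""
    if method is None:
        return model
    import colorsys  # stand-in for the torchao import in this sketch
    return model

quantize_if_requested("model")          # backend never imported
quantize_if_requested("model", "int8")  # lazy import fires here
assert "colorsys" in sys.modules
```

Keeping the import inside the branch means users who never set `quantization_method` don't need torchao on their machine at all.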
quantization_method = config.instantiate(cfg.quantization_method)
compile_model(model)
Compiling the model is necessary for quantization to be really worth it
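As a hedged illustration of the point (this is generic `torch.compile` usage, not the recipe's actual `compile_model` helper), compilation wraps the module without changing its numerics; `backend="eager"` is used below only to keep the sketch cheap, while the recipe would rely on the default inductor backend for real speedups.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the recipe compiles the full LLM instead.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(2, 8)

eager_out = model(x)
# backend="eager" skips codegen so this demo stays cheap.
compiled_model = torch.compile(model, backend="eager")
compiled_out = compiled_model(x)

# Compilation changes performance, not numerics.
assert torch.allclose(eager_out, compiled_out)
assert tuple(compiled_out.shape) == (2, 4)
```

Quantized kernels introduce per-op overhead that compilation can fuse away, which is why the two are paired here.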
# 6. Prefill step
generated_tokens = []
t0 = time.perf_counter()
logits = self.model(prompt, **batch)[:, -1]
token = sample(logits, temperature=cfg.temperature, top_k=cfg.top_k)
t1 = time.perf_counter()
Now that we might have a warmup run, we log this differently so the user can see how much quantization / compilation helps.
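The `perf_counter` bracketing above is a standard timing pattern; a minimal stdlib sketch of it (`timed` is a hypothetical helper, not recipe code):

```python
import time

def timed(fn, *args):
    # Same perf_counter bracketing as the prefill step above.
    t0 = time.perf_counter()
    out = fn(*args)
    t1 = time.perf_counter()
    return out, t1 - t0

# Logging the first (warmup) call separately from later calls is what
# lets a user see how much compilation / quantization helps.
warmup_result, warmup_s = timed(sum, range(1000))
steady_result, steady_s = timed(sum, range(1000))
assert warmup_result == steady_result == 499500
assert warmup_s >= 0.0 and steady_s >= 0.0
```

With `torch.compile`, the first call pays the compilation cost, so reporting warmup and steady-state timings separately keeps the speedup numbers honest.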
@@ -9,6 +9,10 @@
# Model arguments
model:
  _component_: torchtune.models.llama2.llama2_7b
# You can uncomment the following lines to enable quantization for faster inference and potentially lower VRAM
Leave this commented out until the user wants to do something with it.
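A sketch of what the commented-out block might look like in the config; the quantizer component path below is an assumption for illustration, not necessarily what the PR uses.

```yaml
# Model arguments
model:
  _component_: torchtune.models.llama2.llama2_7b

# You can uncomment the following lines to enable quantization
# for faster inference and potentially lower VRAM
# (component path below is illustrative only):
# quantization_method:
#   _component_: torchao.quantization.quant_api.int8_weight_only
```

Shipping it commented out keeps the default config dependency-free while still advertising the option.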
generate_v2
prompt = torch.tensor(
    model_inputs["tokens"], device=self._device
).unsqueeze(0)
prompt = torch.tensor(model_inputs["tokens"], device=self.device)[None, :]
I wanted this to fit on one line lol
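The two spellings in the diff are equivalent; a quick standalone check (not recipe code):

```python
import torch

tokens = torch.tensor([1, 2, 3, 4])

# Original: explicit unsqueeze call to add a leading batch dimension.
a = tokens.unsqueeze(0)
# One-liner from the diff: indexing with None inserts the same dim.
b = tokens[None, :]

assert a.shape == b.shape == (1, 4)
assert torch.equal(a, b)
```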
@@ -18,6 +19,13 @@
CACHE_ARTIFACTS_SCRIPT_PATH = root + "/tests/cache_artifacts.sh"

def pytest_sessionfinish():
Compile tries to log a bunch of stuff using the atexit decorator. However, pytest closes these logs before they finish, so it throws an I/O error. This disables logging exceptions. Not sure if it's the right way to do it.
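One stdlib way to do what's described (a sketch that may not match the PR's exact hook body): the `logging` module exposes a module-level `raiseExceptions` flag that controls whether handler errors are reported.

```python
import logging

def pytest_sessionfinish():
    # atexit-time log writes hit streams pytest has already closed;
    # turning off logging's internal error reporting hides the
    # resulting I/O errors. (Sketch only; may differ from the PR.)
    logging.raiseExceptions = False

pytest_sessionfinish()
assert logging.raiseExceptions is False
```

pytest calls `pytest_sessionfinish` after the whole test run, which is early enough to win the race against torch.compile's atexit handlers.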
Context
What is the purpose of this PR? Is it to
This does NOT enable PTQ for QAT finetuned models
Changelog
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
pre-commit install
pytest tests
pytest tests -m integration_test
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example