accelerator.prepare() gets OOM, but works on a single GPU #3182

Open · 2 of 4 tasks
lqf0624 opened this issue Oct 21, 2024 · 1 comment

@lqf0624 commented Oct 21, 2024

System Info

- `Accelerate` version: 1.0.1
- Platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /opt/conda/bin/accelerate
- Python version: 3.10.14
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 2015.00 GB
- GPU type: NVIDIA A800-SXM4-40GB
- `Accelerate` default config:
        Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

import accelerate
from accelerate import DistributedDataParallelKwargs
from peft import LoraConfig, get_peft_model  # imports not shown in the original snippet
from transformers import GPT2Model

# `ddp_kwargs` is used but never defined in the snippet; a typical definition would be:
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)  # exact settings unknown

accelerator = accelerate.Accelerator(kwargs_handlers=[ddp_kwargs])

# `args` comes from the script's argument parser (not shown)
model = GPT2Model.from_pretrained(args.model_dir, output_hidden_states=True)

if args.pretrain == 1 and args.freeze == 1:
    peft_config = LoraConfig(
        r=128,
        lora_alpha=256,
        lora_dropout=0.1,
    )
model = get_peft_model(model, peft_config)  # note: runs even if the branch above was skipped
model = accelerator.prepare(model)
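
To see what each process is actually given before prepare() runs, a small diagnostic like the one below can be launched the same way as the training script (e.g. via accelerate launch); the file name check_devices.py is just a placeholder. It prints the device Accelerate assigned to each rank and the free memory reported for that device:

# check_devices.py (placeholder name): print each process's assigned device and its free memory
import torch
from accelerate import Accelerator

accelerator = Accelerator()
free, total = torch.cuda.mem_get_info(accelerator.device)
print(
    f"[rank {accelerator.process_index}/{accelerator.num_processes}] "
    f"device={accelerator.device}, free={free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB"
)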

Expected behavior

Here is the traceback:

Traceback (most recent call last):
  File "/workspace/Graph-Network/main.py", line 174, in <module>
    model = accelerator.prepare(model)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1350, in prepare
    result = tuple(
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1351, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1226, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1460, in prepare_model
    model = model.to(self.device)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

It's confusing that CUDA raises OOM but, unlike a typical OOM, it did not even report trying to allocate any GPU memory. In fact, my GPUs are empty according to nvidia-smi.
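
For what it's worth, PyTorch's caching-allocator OOM usually reads "CUDA out of memory. Tried to allocate ...", while a bare "CUDA error: out of memory" like the one above tends to come from the CUDA driver itself, e.g. when a context cannot be created on the device. A rough probe over the visible GPUs (sketch below) can show whether a context can be opened on each of them at all:

# rough probe: try to create a CUDA context and a tiny tensor on every visible GPU
import torch

for i in range(torch.cuda.device_count()):
    try:
        torch.zeros(1, device=f"cuda:{i}")
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}: context OK, {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
    except RuntimeError as err:
        print(f"cuda:{i}: failed ({err})")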

@BenjaminBossan (Member)

Thanks for reporting. Could you please:

  1. Share the output of accelerate env
  2. Tell us how you run the script
  3. Tell us what PEFT version you're using
  4. What is the model in args.model_dir?
  5. If you comment out model = get_peft_model(model, peft_config), do you get the same error?
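
For point 5, a stripped-down check along these lines (the model path is a placeholder for args.model_dir, launched the same way as the full script) would show whether the error still occurs without PEFT:

# minimal check without PEFT; "path/to/model_dir" is a placeholder for args.model_dir
import accelerate
from transformers import GPT2Model

accelerator = accelerate.Accelerator()
model = GPT2Model.from_pretrained("path/to/model_dir", output_hidden_states=True)
model = accelerator.prepare(model)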
