One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Traceback (most recent call last):
  File "/workspace/Graph-Network/main.py", line 174, in <module>
    model = accelerator.prepare(model)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1350, in prepare
    result = tuple(
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1351, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1226, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1460, in prepare_model
    model = model.to(self.device)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
It's confusing that CUDA raises an OOM error here: unlike other OOM reports, it did not even try to allocate any GPU memory. In fact, my GPUs are empty according to nvidia-smi.
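One way to narrow this down is to compare the nvidia-smi output with what PyTorch sees from inside the training process. The snippet below is only a minimal sketch, assuming PyTorch is installed; `report_gpu_memory` is a hypothetical helper name and not part of Accelerate or PyTorch. It prints `CUDA_VISIBLE_DEVICES` and queries free and total memory for each visible device, recording any error instead of crashing.

```python
import os


def report_gpu_memory():
    """Return per-device free/total memory as seen by PyTorch in this process.

    Hypothetical diagnostic helper; returns an empty list when PyTorch is
    missing or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed, nothing to report
    if not torch.cuda.is_available():
        return []

    rows = []
    for index in range(torch.cuda.device_count()):
        try:
            # mem_get_info returns (free_bytes, total_bytes) for the device
            free_bytes, total_bytes = torch.cuda.mem_get_info(index)
        except RuntimeError as err:
            # Keep going so one failing device does not hide the others
            rows.append({"device": index, "error": str(err)})
            continue
        rows.append(
            {
                "device": index,
                "free_gib": free_bytes / 2**30,
                "total_gib": total_bytes / 2**30,
            }
        )
    return rows


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
    for row in report_gpu_memory():
        print(row)
```

If a device that looks empty in nvidia-smi shows an error or very little free memory here, the process may be mapped to a different physical GPU than expected, or the failure may be happening while the CUDA context is created rather than during a tensor allocation.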
System Info
Information
Tasks
An officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Expected behavior
Here is the information: