Checklist
The issue exists on a clean installation of Fooocus
The issue exists in the current version of Fooocus
The issue has not been reported before recently
The issue has been reported before but has not been fixed yet
What happened?
When I use Fooocus in Colab with Juggernaut as the base model and realisticStockPhoto as the refiner, I get a CUDA out of memory error.
Steps to reproduce the problem
Use Fooocus in Colab.
Select Juggernaut as the base model.
Select realisticStockPhoto as the refiner.
What should have happened?
The image should generate without a CUDA out-of-memory error.
What browsers do you use to access Fooocus?
Google Chrome
Where are you running Fooocus?
Cloud (Google Colab)
What operating system are you using?
No response
Console logs
[System ARGV] ['entry_with_update.py', '--share', '--always-high-vram', '--preset', 'realistic']
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
Fooocus version: 2.5.5
Loaded preset: /content/Fooocus/presets/realistic.json
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Downloading: "https://huggingface.co/lllyasviel/fav_models/resolve/main/fav/realisticStockPhoto_v20.safetensors" to /content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
100% 6.46G/6.46G [00:31<00:00, 220MB/s]
Downloading: "https://huggingface.co/mashb1t/fav_models/resolve/main/fav/SDXL_FILM_PHOTOGRAPHY_STYLE_V1.safetensors" to /content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_V1.safetensors
100% 870M/870M [00:06<00:00, 142MB/s]
Total VRAM 15102 MB, total RAM 12979 MB
Set vram state to: HIGH_VRAM
Always offload VRAM
Device: cuda:0 Tesla T4 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
IMPORTANT: You are using gradio version 3.41.2, however version 4.29.0 is available, please upgrade.
--------
Running on local URL: http://127.0.0.1:7865
Running on public URL: https://f388f72c3afb356bea.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
Base model loaded: /content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
VAE loaded: None
Request to load LoRAs [('SDXL_FILM_PHOTOGRAPHY_STYLE_V1.safetensors', 0.25)] for model [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors].
Loaded LoRA [/content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_V1.safetensors] for UNet [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 722 keys at weight 0.25.
Loaded LoRA [/content/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_V1.safetensors] for CLIP [/content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.49 seconds
2024-09-17 16:40:47.944739: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-17 16:40:48.210000: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-17 16:40:48.281258: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-17 16:40:48.704490: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-17 16:40:51.178811: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Started worker with PID 1604
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865 or https://f388f72c3afb356bea.gradio.live
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 31060369711237223
[Parameters] CFG = 3
[Fooocus] Loading control models ...
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.71 seconds
Refiner model loaded: /content/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
Traceback (most recent call last):
File "/content/Fooocus/modules/async_worker.py", line 1471, in worker
handler(task)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/content/Fooocus/modules/async_worker.py", line 1160, in handler
tasks, use_expansion, loras, current_progress = process_prompt(async_task, async_task.prompt, async_task.negative_prompt,
File "/content/Fooocus/modules/async_worker.py", line 661, in process_prompt
pipeline.refresh_everything(refiner_model_name=async_task.refiner_model_name,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/content/Fooocus/modules/default_pipeline.py", line 250, in refresh_everything
refresh_base_model(base_model_name, vae_name)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/content/Fooocus/modules/default_pipeline.py", line 74, in refresh_base_model
model_base = core.load_model(filename, vae_filename)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/content/Fooocus/modules/core.py", line 147, in load_model
unet, clip, vae, vae_filename, clip_vision = load_checkpoint_guess_config(ckpt_filename, embedding_directory=path_embeddings,
File "/content/Fooocus/ldm_patched/modules/sd.py", line 462, in load_checkpoint_guess_config
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
File "/content/Fooocus/ldm_patched/modules/supported_models.py", line 173, in get_model
out = model_base.SDXL(self, model_type=self.model_type(state_dict, prefix), device=device)
File "/content/Fooocus/ldm_patched/modules/model_base.py", line 293, in __init__
super().__init__(model_config, model_type, device=device)
File "/content/Fooocus/ldm_patched/modules/model_base.py", line 51, in __init__
self.diffusion_model = UNetModel(**unet_config, device=device, operations=operations)
File "/content/Fooocus/ldm_patched/ldm/modules/diffusionmodules/openaimodel.py", line 773, in __init__
get_attention_layer(
File "/content/Fooocus/ldm_patched/ldm/modules/diffusionmodules/openaimodel.py", line 556, in get_attention_layer
return SpatialTransformer(
File "/content/Fooocus/ldm_patched/ldm/modules/attention.py", line 586, in __init__
[BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
File "/content/Fooocus/ldm_patched/ldm/modules/attention.py", line 586, in<listcomp>
[BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
File "/content/Fooocus/ldm_patched/ldm/modules/attention.py", line 414, in __init__
self.attn1 = CrossAttention(query_dim=inner_dim, heads=n_heads, dim_head=d_head, dropout=dropout,
File "/content/Fooocus/ldm_patched/ldm/modules/attention.py", line 379, in __init__
self.to_out = nn.Sequential(operations.Linear(inner_dim, query_dim, dtype=dtype, device=device), nn.Dropout(dropout))
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 99, in __init__
self.weight = Parameter(torch.empty((out_features, in_features),**factory_kwargs))
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 11.06 MiB is free. Process 17512 has 14.73 GiB memory in use. Of the allocated memory 14.12 GiB is allocated by PyTorch, and 501.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Total time: 39.47 seconds
Additional information
No response
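The allocator hint at the end of the traceback points to one mitigation. Below is a minimal sketch of applying it before launch; it assumes the standard Fooocus Colab cell, and the note about dropping `--always-high-vram` is an assumption based on the log (two SDXL checkpoints on a 15 GB T4 with offloading disabled), not a verified fix:

```python
import os

# Suggested by the PyTorch OOM message: allow expandable segments to
# reduce fragmentation. Must be set before torch initializes CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# The [System ARGV] line shows --always-high-vram, which keeps models
# resident on the GPU. Removing it lets Fooocus offload the idle model
# between base and refiner passes, which may avoid the OOM:
launch_args = ["entry_with_update.py", "--share", "--preset", "realistic"]
print(launch_args)
```

Whether this leaves enough headroom for both checkpoints on a T4 would still need to be confirmed on Colab.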