[Feature]: Improve bf16 test capabilities to better detect invalid gpu implementations #3442

Open
LeptonWu opened this issue Sep 19, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@LeptonWu

LeptonWu commented Sep 19, 2024

Issue Description

On Debian 12, I installed the Intel oneAPI toolkit and then launched webui.sh with --use-ipex, hoping to use the CPU's iGPU. It looked like it was working: while it generated images, intel_gpu_top showed load on the GPU. But the generated image was a mess. I attached two images to show what I mean (the bad one was generated with the default setup, the good one after I changed the device type to FP16 and restarted the server). I noticed in the log that PyTorch was being told to use bf16, but as far as I know bf16 is fairly new and my old iGPU may not support it. After I changed the system settings and set the device type to FP16, it worked.

I did see there is a test_bf16 function in modules/devices.py that runs before bf16 is enabled by default, but it seems to just call some API without checking any output. This could actually be a bug in some version of Intel oneAPI, but I am wondering: should we do a basic correctness check in test_bf16?

It's quite confusing that there is no warning at all while I get a garbled image.
Attached images: 00044-realisticVisionV60B1_v51HyperVAE-A black cat-failed (bad), 00041-realisticVisionV60B1_v51HyperVAE-A black cat (good)

BTW, it takes around 2.5 minutes to generate a 512x512 image with the iGPU of this G4620.

Version Platform Description

No response

Relevant log output

No response

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

This could actually be a bug of some version of intel oneapi, but I am wondering, should we do some basic correctness check in test_bf16?

the test_bf16 function as-is performs a basic compatibility check: it fails if the gpu is not bf16 compatible, but it does not validate the output.
anything more than that - well, contributions are welcome.
i'm changing this to a feature request.
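
for illustration, a correctness check could compare the bf16 result against an fp32 reference instead of only checking that the op runs. a rough sketch, not the actual modules/devices.py code (the device string and the threshold are assumptions):

import torch

def test_bf16_output(device: str = "xpu") -> bool:
    # run the same matmul in bf16 on the target device and in fp32 on the cpu;
    # a broken bf16 path returns values far outside normal bf16 rounding error
    a = torch.randn(64, 64)
    b = torch.randn(64, 64)
    ref = a @ b  # fp32 reference on cpu
    out = (a.to(device, dtype=torch.bfloat16) @ b.to(device, dtype=torch.bfloat16)).float().cpu()
    return bool((ref - out).abs().max() < 1.0)  # loose threshold; NaN or garbage fails it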

@vladmandic vladmandic added the enhancement New feature or request label Sep 19, 2024
@vladmandic vladmandic changed the title [Issue]: Bad image generated on intel G4620 with integrated gpu (Intel(R) HD Graphics 630) because of usage of bf16 [Feature]: Improve bf16 test capabilities to better detect invalid gpu implementations Sep 19, 2024
@Disty0
Collaborator

Disty0 commented Sep 20, 2024

iGPUs are not officially supported. Every device officially supported by IPEX also supports BF16.
There is a test done by SDNext, but IPEX seems to compute nonsense instead of raising a "BF16 not supported" error like CUDA and ROCm do.

FYI, IPEX doesn't have full support for FP16, and a lot of ops will upcast to FP32 instead.
BF16 is the default dtype set by IPEX itself and the preferred dtype for IPEX.
OpenVINO is the better option for unsupported devices like old iGPUs.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

Thanks for the explanation. I filed a bug on the Intel side asking them to at least throw an exception...

intel/intel-extension-for-pytorch#711

As for OpenVINO, it comes with its own bug on my platform: it crashes by default because libopenvino_intel_npu_plugin.so uses AVX instructions, which this Pentium CPU doesn't support. I had to delete that file to get things working.

BTW, I did a simple test and it seems to at least give a reasonable result for the simple function I tested, so I'm not really sure what Intel is doing there.

import torch
import intel_extension_for_pytorch  # noqa: F401 -- registers the "xpu" device
import torch.nn as nn

dev = "xpu"
dt = torch.bfloat16

# 1x3 input filled with 5000, passed through a linear layer
# with all weights set to 2 and the bias set to 10000
t = torch.full((1, 3), 5000).to(dev, dtype=dt)
f = nn.Linear(3, 1).to(dev, dtype=dt)
nn.init.constant_(f.weight, 2)
nn.init.constant_(f.bias, 10000)
o = f(t)
print(o[0][0])
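
The exact result here is 5000·2·3 + 10000 = 40000; since bf16 rounds the inputs and output to nearby representable values (the spacing is 256 at this magnitude), a printout around 39936-40000 is the "reasonable result" I mean, while a wildly different or NaN value would indicate the broken path.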

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

Also, it seems the SdNext default setting doesn't work, I have to switch to FP32 to make it use my iGPU.

Default dtype for PyTorch (the setting in SDNext) when using OpenVINO is already FP32. OpenVINO can convert the new OpenVINO model to FP16 on its own based on your device.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

Also, it seems the SdNext default setting doesn't work, I have to switch to FP32 to make it use my iGPU.

Default dtype for PyTorch (the setting in SDNext) when using OpenVINO is already FP32. OpenVINO can convert the new OpenVINO model to FP16 on its own based on your device.

Sorry, I edited my comment. It turned out that when switching to OpenVINO I cleaned the venv but didn't delete config.json, so SDNext was still reading the old configuration left over from the IPEX run. I confirmed that if I delete config.json, SDNext sets the dtype to FP32, and FP16 also seems to work. I'm still not sure why it didn't use the GPU with the configuration left behind by the IPEX run.
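
In case anyone else hits the same thing: the stale entry in my config.json was just the device precision setting left at BF16, something like the following (the exact key name may vary between SDNext versions, so treat it as illustrative):

{
  "cuda_dtype": "BF16"
}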

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

Also noticed that your iGPU lists itself as FP64-capable according to the env info. Can you try exporting IPEX_FORCE_ATTENTION_SLICE=1?

This might be a 4GB allocation issue if the iGPU is a 32-bit iGPU: Intel Arc and other 32-bit GPUs compute nonsense when you allocate more than 4GB in a single block.
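
E.g. set it before launching (assuming the usual webui.sh startup from the first post):

export IPEX_FORCE_ATTENTION_SLICE=1
./webui.sh --use-ipex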

@LeptonWu
Author

Also noticed that your iGPU lists itself as FP64-capable according to the env info. Can you try exporting IPEX_FORCE_ATTENTION_SLICE=1?

This might be a 4GB allocation issue if the iGPU is a 32-bit iGPU. Intel Arc and other 32-bit GPUs compute nonsense when you allocate more than 4GB in a single block.

Thanks, I tried this and didn't see any difference. If this were a 4GB issue, I'd guess both fp16 and bf16 would be affected?

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

If this were a 4GB issue, I'd guess both fp16 and bf16 would be affected?

Yes.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

BTW, one thing I noticed: after I switched the data type from bf16 to fp16 in the UI, the generated images were still broken even after I clicked "restart server". I had to shut the server down and start it again to get things working.
Could this give a hint as to where the problem is?
