[Feature]: Improve bf16 test capabilities to better detect invalid gpu implementations #3442

Open
LeptonWu opened this issue Sep 19, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@LeptonWu

LeptonWu commented Sep 19, 2024

Issue Description

On Debian 12, I installed the Intel oneAPI toolkit and then launched webui.sh with --use-ipex, hoping to use the CPU's iGPU. It looked like it was working: while it generated images, intel_gpu_top showed load on the GPU. But the generated image was a mess. I attached two images to show what I mean (the bad one was generated with the default setup, the good one after I changed the device type to FP16 and restarted the server). I noticed in the log that PyTorch was being told to use bf16, but as far as I know bf16 is fairly new and my old iGPU may not support it. After I changed the system settings and set the device type to FP16, it worked.

I did see there is a test_bf16 function in modules/devices.py that runs before bf16 is enabled by default, but it seems to just call some API without checking any output. This could actually be a bug in some version of Intel oneAPI, but I am wondering: should we do a basic correctness check in test_bf16?

It's quite confusing that there is no warning at all while I get a garbled image.
Attached images: 00044-realisticVisionV60B1_v51HyperVAE-A black cat-failed (bad), 00041-realisticVisionV60B1_v51HyperVAE-A black cat (good)

BTW, it takes around 2.5 minutes to generate a 512x512 image with the iGPU of this G4620.

Version Platform Description

No response

Relevant log output

No response

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

This could actually be a bug of some version of intel oneapi, but I am wondering, should we do some basic correctness check in test_bf16?

the test_bf16 function as-is performs a basic compatibility check: it fails if the gpu is not bf16 compatible, but it does not validate the output.
anything more than that - well, contributions are welcome.
i'm changing this to a feature request.
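
for illustration, a correctness check could compare the bf16 result against an fp32 reference instead of only checking that the op runs. a rough sketch, not the actual modules/devices.py code (the device string and the threshold are assumptions):

import torch

def test_bf16_output(device: str = "xpu") -> bool:
    # run the same matmul in bf16 on the target device and in fp32 on the cpu;
    # a broken bf16 path returns values far outside normal bf16 rounding error
    a = torch.randn(64, 64)
    b = torch.randn(64, 64)
    ref = a @ b  # fp32 reference on cpu
    out = (a.to(device, dtype=torch.bfloat16) @ b.to(device, dtype=torch.bfloat16)).float().cpu()
    return bool((ref - out).abs().max() < 1.0)  # loose threshold; NaN or garbage fails it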

@vladmandic vladmandic added the enhancement New feature or request label Sep 19, 2024
@vladmandic vladmandic changed the title [Issue]: Bad image generated on intel G4620 with integrated gpu (Intel(R) HD Graphics 630) because of usage of bf16 [Feature]: Improve bf16 test capabilities to better detect invalid gpu implementations Sep 19, 2024
@Disty0
Collaborator

Disty0 commented Sep 20, 2024

iGPUs are not officially supported. Every device officially supported by IPEX also supports BF16.
There is a test done by SDNext, but IPEX seems to compute nonsense instead of raising a "BF16 not supported" error like CUDA and ROCm do.

FYI, IPEX doesn't have full support for FP16, and a lot of ops will upcast to FP32 instead.
BF16 is the default dtype set by IPEX itself and the preferred dtype for IPEX.
OpenVINO is the better option for unsupported devices like old iGPUs.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

Thanks for the explanation. I filed a bug on the Intel side asking them to at least throw an exception...

intel/intel-extension-for-pytorch#711

As for OpenVINO, it comes with its own bug on my platform: it crashes by default because libopenvino_intel_npu_plugin.so uses AVX instructions, which this Pentium CPU doesn't support. I had to delete that file to get things working.

BTW, I did a simple test and it seems to at least give a reasonable result for the simple function I tested, so I'm not really sure what Intel is doing there.

import torch
import intel_extension_for_pytorch  # noqa: F401 -- registers the "xpu" device
import torch.nn as nn

dev = "xpu"
dt = torch.bfloat16

# 1x3 input filled with 5000, passed through a linear layer
# with all weights set to 2 and the bias set to 10000
t = torch.full((1, 3), 5000).to(dev, dtype=dt)
f = nn.Linear(3, 1).to(dev, dtype=dt)
nn.init.constant_(f.weight, 2)
nn.init.constant_(f.bias, 10000)
o = f(t)
print(o[0][0])
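
The exact result here is 5000·2·3 + 10000 = 40000; since bf16 rounds the inputs and output to nearby representable values (the spacing is 256 at this magnitude), a printout around 39936-40000 is the "reasonable result" I mean, while a wildly different or NaN value would indicate the broken path.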

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

Also, it seems the SdNext default setting doesn't work, I have to switch to FP32 to make it use my iGPU.

Default dtype for PyTorch (the setting in SDNext) when using OpenVINO is already FP32. OpenVINO can convert the new OpenVINO model to FP16 on its own based on your device.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

Also, it seems the SdNext default setting doesn't work, I have to switch to FP32 to make it use my iGPU.

Default dtype for PyTorch (the setting in SDNext) when using OpenVINO is already FP32. OpenVINO can convert the new OpenVINO model to FP16 on its own based on your device.

Sorry, I edited my comment. It turned out that when switching to OpenVINO I cleaned the venv but didn't delete config.json, so SDNext was still reading the old configuration left over from the IPEX run. I confirmed that if I delete config.json, SDNext sets the dtype to FP32, and FP16 also seems to work. I'm still not sure why it didn't use the GPU with the configuration left behind by the IPEX run.
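
In case anyone else hits the same thing: the stale entry in my config.json was just the device precision setting left at BF16, something like the following (the exact key name may vary between SDNext versions, so treat it as illustrative):

{
  "cuda_dtype": "BF16"
}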

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

Also noticed that your iGPU lists itself as FP64-capable according to the env info. Can you try exporting IPEX_FORCE_ATTENTION_SLICE=1?

This might be a 4GB allocation issue if the iGPU is a 32-bit iGPU: Intel Arc and other 32-bit GPUs compute nonsense when you allocate more than 4GB in a single block.
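
E.g. set it before launching (assuming the usual webui.sh startup from the first post):

export IPEX_FORCE_ATTENTION_SLICE=1
./webui.sh --use-ipex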

@LeptonWu
Author

Also noticed that your iGPU lists itself as FP64-capable according to the env info. Can you try exporting IPEX_FORCE_ATTENTION_SLICE=1?

This might be a 4GB allocation issue if the iGPU is a 32-bit iGPU. Intel Arc and other 32-bit GPUs compute nonsense when you allocate more than 4GB in a single block.

Thanks, I tried this and didn't see any difference. If this were a 4GB issue, I'd guess both fp16 and bf16 would be affected?

@Disty0
Collaborator

Disty0 commented Sep 20, 2024

If this were a 4GB issue, I'd guess both fp16 and bf16 would be affected?

Yes.

@LeptonWu
Author

LeptonWu commented Sep 20, 2024

BTW, one thing I noticed: after I switched the data type from bf16 to fp16 in the UI, the generated images were still broken even after I clicked "restart server". I had to shut the server down and start it again to get things working.
Could this give a hint as to where the problem is?
