Fix compile with nvjpeg on Windows CUDA 12 #8641

Merged
merged 3 commits into from
Sep 11, 2024

atalman
Contributor

@atalman atalman commented Sep 11, 2024

Root cause of the issue

C:\actions-runner\_work\_temp\conda_environment_10772459803\lib\site-packages\torch\cuda\__init__.py:129: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\cuda\CUDAFunctions.cpp:108.)
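For reference, the `11040` in the warning is the CUDA driver API version in its packed integer form (`1000 * major + 10 * minor`), so the runner's driver only supports up to CUDA 11.4. A minimal sketch of decoding it follows; the helper name `decode_cuda_version` is made up for illustration and is not part of torch:

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a packed CUDA version (1000 * major + 10 * minor) into (major, minor).

    Hypothetical helper for illustration only; not a torch or CUDA API.
    """
    major, rest = divmod(encoded, 1000)
    minor = rest // 10
    return major, minor


# The value reported in the warning above.
print(decode_cuda_version(11040))  # (11, 4)
```

A driver at 11.4 cannot initialize a CUDA 12.x runtime, which matches the warning.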

Hence, for CUDA 12+ jobs, we are seeing:

torch.cuda.is_available: False

As a result, it is failing the builder checks here:
https://github.com/pytorch/builder/actions/runs/10776424717/job/29883192429

torchvision: 0.20.0.dev20240908+cu121
torch.cuda.is_available: True
torch.ops.image._jpeg_version() = 80
Is torchvision usable? True
German shepherd (cpu): 37.6%
Traceback (most recent call last):
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 113, in <module>
    main()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 101, in main
    smoke_test_torchvision_decode_jpeg("cuda")
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 37, in smoke_test_torchvision_decode_jpeg
    img_jpg = decode_jpeg(img_jpg_data, device=device)
  File "C:\Jenkins\Miniconda3\envs\conda-env-10776424717\lib\site-packages\torchvision\io\image.py", line 223, in decode_jpeg
    return torch.ops.image.decode_jpegs_cuda([input], mode.value, device)[0]
  File "C:\Jenkins\Miniconda3\envs\conda-env-10776424717\lib\site-packages\torch\_ops.py", line 1116, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: decode_jpegs_cuda: torchvision not compiled with nvJPEG support

The driver update issue should not prevent us from compiling torchvision with full CUDA support; we can do that even on a CPU-only instance. Hence, when the FORCE_CUDA flag is set, we should try to include the nvjpeg module.
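The intended build-time decision can be sketched roughly as below. This is an illustration only, not the actual diff: `should_build_nvjpeg` and its parameters are hypothetical names, and the check for `include/nvjpeg.h` under the CUDA toolkit directory is an assumption about how the header is located.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def should_build_nvjpeg(
    cuda_available: bool,
    cuda_home: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Hypothetical helper: decide whether to compile the nvJPEG sources.

    cuda_available stands in for torch.cuda.is_available() at build time.
    cuda_home stands in for the CUDA toolkit root (assumed layout).
    """
    # FORCE_CUDA=1 means "build CUDA code even if no usable GPU/driver is present".
    force_cuda = env.get("FORCE_CUDA", "0") == "1"
    if not (cuda_available or force_cuda):
        return False
    if cuda_home is None:
        return False
    # Assumption: the toolkit ships the header at <cuda_home>/include/nvjpeg.h.
    return (Path(cuda_home) / "include" / "nvjpeg.h").exists()
```

With this shape, a build machine whose driver is too old (so `cuda_available` is `False`) still compiles nvjpeg support as long as `FORCE_CUDA=1` is set and the toolkit header is present.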

As a follow-up, we should address the driver issue.

pytorch-bot bot commented Sep 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8641

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1249d0b with merge base 00e7fa1 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman merged commit db5f8a0 into pytorch:main Sep 11, 2024
81 of 82 checks passed
@atalman atalman deleted the fix_nvjpeg_include_windows branch September 11, 2024 16:27
Hey @atalman!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Sep 13, 2024
Reviewed By: vmoens

Differential Revision: D62581682

fbshipit-source-id: 40ee1636bb1608da92b1fc258634d26c88a430fd
3 participants