
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension for v2 transforms #8622

Open
lxr2 opened this issue Sep 2, 2024 · 5 comments

@lxr2

lxr2 commented Sep 2, 2024

🐛 Describe the bug

It seems that v2.Pad does not support cases where the padding size is greater than the image size, but v1.Pad does support this. I hope that v2.Pad will allow this in the future as well.

import torch
from torchvision.transforms import v2
import torchvision.transforms as T
from torchvision.transforms import functional as F

orig_img = torch.rand([3,32,32])
orig_img = F.to_pil_image(orig_img)

# Not supported
trans_img = v2.Compose([v2.ToImage(), T.Pad(padding=36, padding_mode='reflect')])(orig_img)

# Supported
trans_img = T.Compose([T.Pad(padding=36, padding_mode='reflect')])(orig_img)

Versions

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 546.80
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
...
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchmetrics              1.4.0.post0              pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
@venkatram-dev
Contributor

venkatram-dev commented Sep 2, 2024

Not sure of the reason to combine the v1 and v2 APIs in v2.Compose([v2.ToImage(), T.Pad(padding=36, ...)]).

The code below works (tested in Google Colab). Please try this.


from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3,32,32])
orig_img = F.to_pil_image(orig_img)


# Using v2 API for padding
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # Use v2.Pad directly
    #T2.ToTensor()
])

transform
# Apply transformation
trans_img = transform(orig_img)
trans_img

@lxr2
Author

lxr2 commented Sep 3, 2024

It works, but following the docs, it seems that the standard steps should include v2.ToImage() if the image is in PIL format. I am confused about this.


This is what a typical transform pipeline could look like:

from torchvision.transforms import v2
transforms = v2.Compose([
    v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # optional, most input are already uint8 at this point
    # ...
    v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
    # ...
    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@venkatram-dev
Contributor

venkatram-dev commented Sep 3, 2024

Below is my understanding; others can chime in as needed :)

Yeah, that is a good point. In my opinion, the docs should be clearer about the difference between padding a PIL image and padding a tensor.

If we look at other docs for padding, they use PIL images: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_illustrations.html#sphx-glr-auto-examples-transforms-plot-transforms-illustrations-py

Anyway, this is my understanding:

Extra padding (a padding size greater than the image size) works on a PIL image, but it does not work on a tensor.

So, if we need extra padding, it has to be applied while the image is still a PIL image. We can then do the other tensor operations after it, as shown in the sketch below.
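
A minimal sketch of that ordering, assuming the goal is the typical v2 pipeline quoted above (the ToDtype step is illustrative and not required for the padding itself):

import torch
from torchvision.transforms import v2
from torchvision.transforms import functional as F

orig_img = F.to_pil_image(torch.rand(3, 32, 32))

transforms = v2.Compose([
    v2.Pad(padding=36, padding_mode='reflect'),  # pad while the input is still a PIL image
    v2.ToImage(),                                # convert to a tensor after padding
    v2.ToDtype(torch.float32, scale=True),       # continue with the usual tensor transforms
])

print(transforms(orig_img).shape)  # torch.Size([3, 104, 104])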

Root cause analysis:

Padding a PIL image goes through Pillow and NumPy functions, which do not check the padding size against the image dimensions:

https://github.com/pytorch/vision/blob/main/torchvision/transforms/_functional_pil.py#L144-L220

Padding a tensor goes through PyTorch code, which does strict dimension checks:

https://github.com/pytorch/pytorch/blob/d14fe3ffeddff743af09ce7c8d91127940ddf7ed/aten/src/ATen/native/ReflectionPad.cpp#L241-L249

My understanding is that PyTorch does these internal checks to prevent padding operations from exceeding the dimensions of a tensor, ensuring that all computations stay within the allocated memory bounds and avoiding errors such as crashes or data corruption.
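
A minimal sketch of that check, calling torch.nn.functional.pad directly (which, as far as I understand, is what the tensor kernel ends up hitting for 'reflect'):

import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 32, 32)
pad = (36, 36, 36, 36)  # left, right, top, bottom

# 'reflect' requires each padding to be smaller than the corresponding input dimension
try:
    F.pad(x, pad, mode='reflect')
except RuntimeError as e:
    print(e)  # Padding size should be less than the corresponding input dimension ...

# 'constant' has no such restriction, so the oversized padding is accepted
print(F.pad(x, pad, mode='constant', value=0).shape)  # torch.Size([1, 3, 104, 104])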

Scenario 1: extra padding (padding size greater than image size) works on a PIL image.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3,32,32])
orig_img = F.to_pil_image(orig_img)
print ('orig type',type(orig_img))
print ('orig shape',orig_img.size)


# Using v2 API for padding
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # Use v2.Pad directly
    T2.ToImage(),
    #T2.ToTensor()
])

#transform
# Apply transformation
trans_img = transform(orig_img)
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

Above code works

Scenario 2: extra padding does not work on a tensor.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3,32,32])
orig_img = F.to_pil_image(orig_img)
print ('orig type',type(orig_img))
print ('orig shape',orig_img.size)

# Using v2 API for padding
transform = T2.Compose([
    T2.ToImage(), 
    T2.Pad(padding=36, padding_mode='reflect'),  # Use v2.Pad directly
])

#transform
# Apply transformation
trans_img = transform(orig_img)
trans_img.shape
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]

from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor
orig_img = torch.rand([3, 32, 32])  # This is a tensor
print ('orig type',type(orig_img))
print ('orig shape',orig_img.shape)


# Define a transformation pipeline with v2 API
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # Check if T2.Pad accepts tv_tensors.Image
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]


from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor
orig_img = torch.rand([3, 32, 32])  # This is a tensor
print ('orig type',type(orig_img))
print ('orig shape',orig_img.shape)


# Define a transformation pipeline with v2 API
transform = T2.Compose([
    T2.ToImage(),  # Convert tensor to tv_tensors.Image
    T2.Pad(padding=36, padding_mode='reflect'),  # Check if T2.Pad accepts tv_tensors.Image
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]

Scenario 3: padding with a size less than the input dimension works on a tensor.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3,32,32])
orig_img = F.to_pil_image(orig_img)
print ('orig type',type(orig_img))
print ('orig shape',orig_img.size)

# Using v2 API for padding
transform = T2.Compose([
    T2.ToImage(), 
    T2.Pad(padding=30, padding_mode='reflect'),  # Use v2.Pad directly
])


#transform
# Apply transformation
trans_img = transform(orig_img)
trans_img.shape
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

Above code works


from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor
orig_img = torch.rand([3, 32, 32])  # This is a tensor
print ('orig type',type(orig_img))
print ('orig shape',orig_img.shape)


# Define a transformation pipeline with v2 API
transform = T2.Compose([
    T2.Pad(padding=31, padding_mode='reflect'),  # Check if T2.Pad accepts tv_tensors.Image
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type',type(trans_img))

print('trans_img shape',trans_img.shape)
trans_img

Above code works

@lxr2
Author

lxr2 commented Sep 3, 2024

Many thanks, very clear explanations and instructions!

@NicolasHug
Member

Thanks for the report @lxr2, and @venkatram-dev for the help.

Just to summarize: this isn't a v1 vs v2 issue. This is a difference in behavior between the PIL backend and the tensor backend (and this difference can be observed on both v1 and v2).

PIL supports padding sizes larger than the image dimensions, while torchvision / pytorch doesn't.

simple reproducer:

import torch
from torchvision.transforms import functional as F
from torchvision.transforms.v2 import functional as F2

t = torch.rand(3,32,32)
pil_img = F.to_pil_image(t)

padding = 31  # fails for 32+ on tensors

trans_img = F.pad(pil_img, padding=padding, padding_mode='reflect')
print(trans_img.size)
trans_img = F2.pad(pil_img, padding=padding, padding_mode='reflect')
print(trans_img.size)
trans_img = F.pad(t, padding=padding, padding_mode='reflect')
print(trans_img.shape)
trans_img = F2.pad(t, padding=padding, padding_mode='reflect')
print(trans_img.shape)

Unfortunately, this isn't something we can directly address in torchvision, because the behavior is dictated by torch's pad. Note that there are similar discussions in pytorch/pytorch#18413 but at the time, it was suggested that the existing torch behavior is expected.
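
One possible workaround (just a sketch, not a torchvision API): the PIL code path linked above appears to fall back on np.pad for 'reflect', which reflects repeatedly and therefore accepts pad widths larger than the axis size, so the same trick can be applied to a tensor through NumPy:

import numpy as np
import torch

t = torch.rand(3, 32, 32)
padding = 36  # larger than the 32-pixel spatial dims

# np.pad's 'reflect' mode reflects repeatedly, so oversized pad widths are accepted;
# only the two spatial axes are padded here.
padded = np.pad(t.numpy(), ((0, 0), (padding, padding), (padding, padding)), mode='reflect')
padded_t = torch.from_numpy(padded)
print(padded_t.shape)  # torch.Size([3, 104, 104])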
