[Bug]: Transformations behaviour possibly not expected when predicting with fresh dataloader #2254

blaz-r opened this issue Aug 18, 2024 · 4 comments
blaz-r commented Aug 18, 2024

Describe the bug

Inside the engine code, the transforms from the datamodule and dataloader are taken before the ones from the model:

```python
# get transform
if datamodule and datamodule.transform:
    # a transform passed explicitly to the datamodule takes precedence
    transform = datamodule.transform
elif dataloaders and any(getattr(dl.dataset, "transform", None) for dl in dataloaders):
    # if dataloaders are provided, we use the transform from the first dataloader that has a transform
    transform = next(dl.dataset.transform for dl in dataloaders if getattr(dl.dataset, "transform", None))
elif ckpt_path is not None:
    # if a checkpoint path is provided, we can load the transform from the checkpoint
    checkpoint = torch.load(ckpt_path, map_location=model.device)
    transform = checkpoint["transform"]
elif model.transform is None:
    # if no transform is provided, we use the default transform from the model
    image_size = datamodule.image_size if datamodule else None
    transform = model.configure_transforms(image_size)
else:
    transform = model.transform
```

This can cause problems if the datamodule was never used inside a trainer. In that case, looking at the following code, the transform returned is just a resize (since there is no trainer to take the correct model transforms from), which means the model's potential normalization is missing:

```python
def train_transform(self) -> Transform:
    """Get the transforms that will be passed to the train dataset.

    If the train_transform is not set, the engine will request the transform from the model.
    """
    if self._train_transform:
        return self._train_transform
    if getattr(self, "trainer", None) and self.trainer.model and self.trainer.model.transform:
        return self.trainer.model.transform
    if self.image_size:
        return Resize(self.image_size, antialias=True)
    return None
```

This happens only if you call setup() and the dataloader on the datamodule outside the trainer (since the trainer is not set in that case), like this:

```python
datamodule = MVTec(..)
datamodule.setup()            # this calls self.train_transform mentioned above
datamodule.test_dataloader()  # only a resize transform here
```
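
Predicting with such a fresh dataloader then looks roughly like this (a sketch of how I call it; treat the exact Engine.predict usage as my assumption, `model` stands for any anomalib model with its own transform):

```python
# Sketch of the predict call that hits the dataloader branch above (exact
# Engine.predict usage is my assumption; `model` is a placeholder anomalib model).
from anomalib.engine import Engine

engine = Engine()
predictions = engine.predict(model=model, dataloaders=datamodule.test_dataloader())
# The dataloader's dataset carries only a Resize, so that transform wins and
# model.transform (including any normalization) is never applied.
```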

There is also code below to reproduce this.

A workaround I have at the moment is doing this:

```python
dataloader.dataset.transform = None
```

This way the dataloader transforms are ignored and the ones from the model are used instead. Another possibility I see is to manually set the transforms of the datamodule with model.configure_transforms(image_size).
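
A rough sketch of both options (`model` and `datamodule` are placeholders for an anomalib model and an already set-up datamodule; this is my reading of the behaviour, not a tested recipe):

```python
# Sketch of the two workarounds mentioned above (placeholder names, untested recipe).
dataloader = datamodule.test_dataloader()

# Option 1: clear the dataloader's transform so the engine ignores it and
# falls back to the model's own transform.
dataloader.dataset.transform = None

# Option 2: set the transform explicitly to the model's default one, which
# already includes the model's normalization.
dataloader.dataset.transform = model.configure_transforms(datamodule.image_size)
```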

Dataset

N/A

Model

N/A

Steps to reproduce the behavior

```python
from anomalib.data import MVTec

#### CONFIGURE THIS ####
mvtec_path = "../datasets/MVTec"
#####

data = MVTec(root=mvtec_path, image_size=(42, 42), category="bottle", num_workers=0)
data.setup()

print(data.test_dataloader().dataset.transform)
```
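
For reference, the printed transform here is just the resize, which a quick check like this would confirm (my own sketch, assuming the 1.2.0dev defaults):

```python
# Quick check of what the print above shows (my sketch, assuming 1.2.0dev defaults):
# the fresh test dataloader only carries the Resize from the datamodule, without any
# normalization a model would add via configure_transforms().
from torchvision.transforms.v2 import Resize

transform = data.test_dataloader().dataset.transform
assert isinstance(transform, Resize), "expected only the datamodule's Resize transform"
```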

OS information

OS information:

  • OS: Win11
  • Python version: 3.10.3
  • Anomalib version: 1.2.0dev
  • PyTorch version: 2.3.0
  • CUDA/cuDNN version: x
  • GPU models and configuration: x
  • Any other relevant information: x

Expected behavior

I would expect the model transforms to take priority over the ones in the dataloader in this case, but I can see how that could cause trouble when custom transforms are set on the datamodule.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

1.2.0dev

Configuration YAML

/

Logs

/

Code of Conduct

  • I agree to follow this project's Code of Conduct
@samet-akcay

@djdameln any thoughts?


blaz-r commented Sep 25, 2024

Are there any updates on this one?

@samet-akcay

@blaz-r, not to address this one specifically, but @djdameln is working on some changes which might address this.


blaz-r commented Sep 25, 2024

Okay. I'll use the workaround for now and we'll see when those changes are added.
