
Towards a Segment Anything Model (SAM) for lesion segmentation in oncological PET/CT images

(This repository is still under construction and the README.md file will be updated)


Introduction

In this work, we test the generalizability of a convolutional neural network (a UNet with residual units) trained on PET/CT images of one cancer type to other cancer types. We used three oncological PET/CT datasets (provided by the autoPET 2022 challenge) of different cancer types: lymphoma (n=145), lung cancer (n=168), and melanoma (n=188), collected from two institutions. The dataset also contains PET/CT images from healthy control patients (n=513), but these were not used in this work. The dataset is publicly available and can be downloaded via the TCIA website from here.

Materials and Methods

Data preprocessing and augmentation

The original CT images and annotations were resampled to the resolution of the original PET images, and CT intensities (in Hounsfield units) were clipped to the range (-1024, 1024). Both PET (in SUV) and CT intensities were then normalized to the range (0, 1). All images were resampled to a voxel spacing of 2.0 mm × 2.0 mm × 2.0 mm. During training, randomly cropped patches of size 192 × 192 × 192 voxels were extracted, centered on a foreground or a background voxel with probabilities 5/6 and 1/6, respectively. Spatial augmentations such as random affine transforms and 3D elastic deformations were applied to the cropped patches. The network input was created by concatenating the PET and CT patches along the channel dimension, as sketched below. The annotation masks contained two labels: 0 for background and 1 for the lesion class.
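
For illustration, a minimal sketch of such a preprocessing and augmentation pipeline using MONAI dictionary transforms is given below. The dictionary keys ("CT", "PT", "SEG") and the augmentation parameters are assumptions for this sketch rather than the exact settings used in this repository.

from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, Spacingd,
    ScaleIntensityRanged, ScaleIntensityd, ConcatItemsd,
    RandCropByPosNegLabeld, RandAffined, Rand3DElasticd,
)
train_transforms = Compose([
    # hypothetical keys: "CT", "PT" (PET in SUV), and "SEG" (lesion mask)
    LoadImaged(keys=["CT", "PT", "SEG"]),
    EnsureChannelFirstd(keys=["CT", "PT", "SEG"]),
    # resample all images and masks to 2.0 mm isotropic voxels
    Spacingd(keys=["CT", "PT", "SEG"], pixdim=(2.0, 2.0, 2.0),
             mode=("bilinear", "bilinear", "nearest")),
    # clip CT to (-1024, 1024) HU and rescale to (0, 1)
    ScaleIntensityRanged(keys=["CT"], a_min=-1024, a_max=1024,
                         b_min=0.0, b_max=1.0, clip=True),
    # min-max normalize PET (SUV) to (0, 1)
    ScaleIntensityd(keys=["PT"], minv=0.0, maxv=1.0),
    # concatenate PET and CT into a 2-channel input
    ConcatItemsd(keys=["CT", "PT"], name="CTPT", dim=0),
    # 192^3 patches centered on foreground/background voxels with ratio 5:1
    RandCropByPosNegLabeld(keys=["CTPT", "SEG"], label_key="SEG",
                           spatial_size=(192, 192, 192),
                           pos=5, neg=1, num_samples=1),
    # spatial augmentations (parameter values are assumptions)
    RandAffined(keys=["CTPT", "SEG"], prob=0.5,
                rotate_range=(0.1, 0.1, 0.1), scale_range=(0.1, 0.1, 0.1),
                mode=("bilinear", "nearest")),
    Rand3DElasticd(keys=["CTPT", "SEG"], prob=0.2,
                   sigma_range=(5, 8), magnitude_range=(100, 200),
                   mode=("bilinear", "nearest")),
])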

Hardware and network architecture

All our networks were trained with the nn.DataParallel(.) wrapper on Standard_NC24s_v3 Azure Virtual Machines from Microsoft, each consisting of 4 NVIDIA GPUs with 16 GiB of memory per GPU, 24 vCPUs, and 448 GiB of system RAM.

A UNet with residual units, adapted from MONAI [1], was used in this work. This network architecture is shown in Figure 1 below, and it can be created using the monai.networks.nets.UNet class of MONAI as follows:

import torch
from monai.networks.layers import Norm
from monai.networks.nets import UNet

device = torch.device("cuda:0")
# 3D UNet with residual units; 2 input channels (PET, CT) and 2 output classes
model = UNet(
    spatial_dims=3,
    in_channels=2,
    out_channels=2,
    channels=(16, 32, 64, 128, 256, 512),
    strides=(2, 2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)
# multi-GPU training via the nn.DataParallel wrapper mentioned above
model = torch.nn.DataParallel(model)

Loss function, optimizer, scheduler and metrics

The networks were trained using the Dice loss, $L_{Dice}$, adapted from monai.losses.DiceLoss(.) and given by the following equation:

$$L_{Dice} = 1 - \frac{1}{n_b} \sum_{i=1}^{n_b} \frac{2 \sum_{j=1}^{K} p_{ij}\, g_{ij}}{\sum_{j=1}^{K} p_{ij} + \sum_{j=1}^{K} g_{ij}}$$

where $p_{ij}$ and $g_{ij}$ are the $j^{\text{th}}$ voxels of the $i^{\text{th}}$ cropped patch of the predicted and ground truth segmentation masks, respectively, and $n_b$ and $K$ denote the batch size and the number of voxels in a patch, respectively. In this work, we set $n_b = 8$ and $K = 192^3$. The loss for an epoch was calculated by averaging $L_{Dice}$ over all batches. The Adam optimizer with an initial learning rate of $10^{-4}$ was used to minimize $L_{Dice}$. The learning rate was reduced to zero over 1200 epochs via a cosine annealing scheduler.
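
For reference, a minimal sketch of this training configuration using MONAI and PyTorch is given below; the variable names and the per-epoch scheduler stepping are assumptions.

import torch
from monai.losses import DiceLoss

# Dice loss over softmax probabilities, with the ground truth one-hot encoded
loss_function = DiceLoss(to_onehot_y=True, softmax=True)
# Adam optimizer with an initial learning rate of 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# cosine annealing of the learning rate to zero over 1200 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1200, eta_min=0.0)

# inside the training loop (one scheduler step per epoch):
# for epoch in range(1200):
#     ... forward/backward passes over all batches ...
#     scheduler.step()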

The Dice similarity coefficient (DSC) metric, adapted from monai.metrics.DiceMetric(.), was used to evaluate the overlap between the ground truth and the predicted mask for the lesion class. Inference on the test set images was performed using a sliding-window method with a window size of 192 × 192 × 192, as sketched below.
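
A minimal sketch of this evaluation step is given below, assuming a test DataLoader that yields dictionaries with hypothetical keys "CTPT" (the 2-channel PET/CT input) and "SEG" (the lesion mask); it is illustrative only and not taken verbatim from this repository's code.

import torch
from monai.inferers import sliding_window_inference
from monai.metrics import DiceMetric
from monai.transforms import AsDiscrete

dice_metric = DiceMetric(include_background=False, reduction="mean")
post_pred = AsDiscrete(argmax=True, to_onehot=2)
post_label = AsDiscrete(to_onehot=2)

model.eval()
with torch.no_grad():
    for batch in test_loader:  # hypothetical test DataLoader
        inputs = batch["CTPT"].to(device)
        labels = batch["SEG"].to(device)
        # sliding-window inference with a 192^3 window
        outputs = sliding_window_inference(inputs, roi_size=(192, 192, 192),
                                           sw_batch_size=1, predictor=model)
        outputs = [post_pred(o) for o in outputs]
        labels = [post_label(l) for l in labels]
        dice_metric(y_pred=outputs, y=labels)

mean_dsc = dice_metric.aggregate().item()
dice_metric.reset()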

Experiments

The PET/CT data for each of the three cancer types were randomly split into training (80%) and test (20%) sets. For each cancer type, networks were trained to segment that single cancer type under 5-fold cross-validation (CV). We evaluated each model on the internal test set of the same cancer type as the training set, and then assessed the transferability of the model's lesion segmentation ability to the other cancer types. We further explored different ensembling techniques for combining the five models trained in 5-fold CV, as a possible route towards improving model generalizability to new cancer types: Average (Avg), Weighted Average (WtAvg) (with weights equal to the mean DSC on the corresponding validation fold), Majority Voting (Vote), and STAPLE [2]; a sketch of the first three fusion rules is given below. The details of the 5-fold training/validation and test splits for the three cancer types can be found in the three .csv files containing the metadata here.
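
For illustration, a minimal sketch of the Average, Weighted Average, and Majority Voting fusion rules applied to the predictions of the five fold models is given below; the function name and the use of softmax probabilities for averaging are assumptions (STAPLE fusion follows [2] and is not reproduced here).

import torch

def ensemble_predictions(fold_logits, fold_val_dsc, method="avg"):
    """Fuse per-fold predictions for one test image.

    fold_logits: list of 5 tensors of shape (2, H, W, D), one per fold model
    fold_val_dsc: list of 5 mean validation DSCs (used for the weighted average)
    """
    probs = torch.stack([torch.softmax(logits, dim=0) for logits in fold_logits])
    if method == "avg":
        # Average: mean of the per-fold class probabilities
        fused = probs.mean(dim=0)
    elif method == "wtavg":
        # Weighted Average: weights proportional to each fold's validation DSC
        w = torch.tensor(fold_val_dsc, dtype=probs.dtype)
        w = w / w.sum()
        fused = (w.view(-1, 1, 1, 1, 1) * probs).sum(dim=0)
    elif method == "vote":
        # Majority Voting: voxel-wise majority over the per-fold hard labels
        votes = probs.argmax(dim=1)            # (5, H, W, D) binary labels
        return (votes.sum(dim=0) >= 3).long()  # lesion if >= 3 of 5 folds agree
    return fused.argmax(dim=0)                 # (H, W, D) fused label map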

Results

A summary of our network performance in terms of mean and median DSC for the different training and test set pairs is given in the figures and the table below.

(Figure panels: Ensemble models on lymphoma; Ensemble models on lung cancer; Ensemble models on melanoma. Per-fold UNet results for the lymphoma-, lung cancer-, and melanoma-trained models.)

The DSC on the three test sets (mean ± SD and median) for each training dataset and ensembling method is as follows:

| Training data | Ensemble type | Lymphoma (test) mean±SD | Lymphoma (test) median | Lung cancer (test) mean±SD | Lung cancer (test) median | Melanoma (test) mean±SD | Melanoma (test) median |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Lymphoma | Average DSC over folds [0-4] | 0.5541±0.2774 | 0.6791 | 0.4021±0.2412 | 0.4265 | 0.3686±0.286 | 0.3194 |
| Lymphoma | Average | 0.5832±0.2772 | 0.7196 | 0.4161±0.2514 | 0.4473 | 0.4330±0.3138 | 0.4255 |
| Lymphoma | Weighted Average | 0.5838±0.2761 | 0.7194 | 0.4161±0.2519 | 0.4462 | 0.4337±0.3139 | 0.4249 |
| Lymphoma | Vote | 0.5691±0.2787 | 0.707 | 0.4031±0.2491 | 0.419 | 0.4253±0.3133 | 0.4282 |
| Lymphoma | STAPLE | 0.5766±0.2839 | 0.7057 | 0.4374±0.253 | 0.4527 | 0.4063±0.2933 | 0.3914 |
| Lung cancer | Average DSC over folds [0-4] | 0.3886±0.2497 | 0.4234 | 0.6909±0.2092 | 0.7339 | 0.3729±0.2465 | 0.3783 |
| Lung cancer | Average | 0.4062±0.2775 | 0.4765 | 0.7147±0.2023 | 0.7626 | 0.4206±0.2651 | 0.4754 |
| Lung cancer | Weighted Average | 0.4063±0.2775 | 0.4753 | 0.7148±0.2023 | 0.763 | 0.4207±0.2650 | 0.4768 |
| Lung cancer | Vote | 0.3992±0.2789 | 0.4663 | 0.7134±0.2026 | 0.7704 | 0.4248±0.2667 | 0.476 |
| Lung cancer | STAPLE | 0.4132±0.2597 | 0.4583 | 0.708±0.2062 | 0.76 | 0.381±0.2512 | 0.3887 |
| Melanoma | Average DSC over folds [0-4] | 0.4026±0.2342 | 0.4516 | 0.4033±0.2283 | 0.4237 | 0.4737±0.2877 | 0.5186 |
| Melanoma | Average | 0.4136±0.2347 | 0.4419 | 0.4119±0.2337 | 0.4419 | 0.5175±0.2831 | 0.6038 |
| Melanoma | Weighted Average | 0.4118±0.2365 | 0.4411 | 0.4119±0.2336 | 0.4395 | 0.5191±0.2822 | 0.6067 |
| Melanoma | Vote | 0.3495±0.245 | 0.3591 | 0.3823±0.235 | 0.4173 | 0.4736±0.2822 | 0.5283 |
| Melanoma | STAPLE | 0.4575±0.2459 | 0.5192 | 0.4316±0.2303 | 0.4612 | 0.5154±0.2938 | 0.5887 |

References

[1] MONAI: Medical Open Network for AI, AI Toolkit for Healthcare Imaging.

[2] Simon K. Warfield, Kelly H. Zou, and William M. Wells, "Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation," IEEE Transactions on Medical Imaging, 2004.
