Thyroidiomics: An Automated Pipeline for Segmentation and Classification of Thyroid Pathologies from Scintigraphy Images
This codebase is related to our submission to EUVIP 2024:
Anonymous authors, Thyroidiomics: An Automated Pipeline for Segmentation and Classification of Thyroid Pathologies from Scintigraphy Images.
Figure 1: Thyroidiomics: the proposed two-step pipeline to classify thyroid pathologies into three classes, namely, MNG, TH and DG. Scenario 1 represents the pipeline dependent on physician's delineated ROIs as input to the classifier, while scenario 2 represents the fully automated pipeline operating on segmentation predicted by ResUNet.
The objective of this study was to develop an automated pipeline that enhances thyroid disease classification using thyroid scintigraphy images, aiming to decrease assessment time and increase diagnostic accuracy. Anterior thyroid scintigraphy images from 2,643 patients from nine centers were collected, categorized into multinodular goiter (MNG), thyroiditis (TH), and diffuse goiter (DG) based on clinical reports, and then segmented by an expert. A Residual UNet (ResUNet) model was trained to perform auto-segmentation [1]. Radiomics features were extracted from both the physician's segmentations (scenario 1) and the ResUNet segmentations (scenario 2). Highly correlated features were then removed using Spearman's correlation, and the remaining features were selected using Recursive Feature Elimination with eXtreme Gradient Boosting (XGBoost) as the core estimator [2]. All models were trained under a leave-one-center-out cross-validation (LOCOCV) scheme, in which nine instances of each algorithm were iteratively trained and validated on data from eight centers and tested on the ninth, for both scenarios separately.
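The LOCOCV scheme described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the repository: the function name `leave_one_center_out_folds` and the toy records are hypothetical, and because the way the eight remaining centers are divided into training and validation subsets is not described here, the sketch keeps them together as a single development set.

```python
from collections import defaultdict


def leave_one_center_out_folds(records):
    """Yield (test_center, dev_ids, test_ids) once per center.

    `records` is an iterable of (patient_id, center_id) pairs.
    `dev_ids` holds every patient from the other centers; splitting
    them further into training and validation is left to the caller.
    """
    by_center = defaultdict(list)
    for patient_id, center_id in records:
        by_center[center_id].append(patient_id)

    for test_center in sorted(by_center):
        test_ids = list(by_center[test_center])
        dev_ids = [
            patient_id
            for center_id, patient_ids in sorted(by_center.items())
            if center_id != test_center
            for patient_id in patient_ids
        ]
        yield test_center, dev_ids, test_ids


# Hypothetical toy records (not the study data).
toy_records = [
    ("Patient0001", "A"),
    ("Patient0002", "B"),
    ("Patient0003", "C"),
    ("Patient0004", "A"),
]

folds = list(leave_one_center_out_folds(toy_records))
for test_center, dev_ids, test_ids in folds:
    print(f"test center {test_center}: {len(dev_ids)} dev, {len(test_ids)} test")
```

With nine centers, the generator would yield nine folds, one per held-out center, and no patient appears on both sides of any fold.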
Figure 2: (Left) (a) Distribution of center-level mean DSC over 9 centers for the classes, MNG, TH and DG. (b)-(d), (e)-(g), and (h)-(j) show some representative images from each class with the ground truth (red) and ResUNet predicted (yellow) segmentation of thyroid. The DSC between ground truth and predicted masks is shown in the bottom-right of each figure. (Right) Various class-wise metrics for classification were used to evaluate model performance in two scenarios: features extracted from the physician's delineated ROIs and those from ResUNet predicted ROIs. The boxplots show the distribution of metrics over the nine centers as test sets for the three thyroid pathology classes, MNG, TH and DG.
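For readers unfamiliar with the DSC reported in Figure 2, the sketch below computes the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), between two binary masks. It is a minimal standalone illustration rather than the evaluation code of this repository; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_coefficient(ground_truth, prediction):
    """Dice similarity coefficient between two binary 2D masks.

    Both masks are nested lists of equal shape; any value > 0 is
    treated as foreground.
    """
    gt = [int(value > 0) for row in ground_truth for value in row]
    pred = [int(value > 0) for row in prediction for value in row]
    if len(gt) != len(pred):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(g & p for g, p in zip(gt, pred))
    total = sum(gt) + sum(pred)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 3 foreground pixels each, 2 of them shared.
toy_ground_truth = [[0, 1, 1],
                    [0, 1, 0]]
toy_prediction = [[0, 1, 0],
                  [0, 1, 1]]

toy_dsc = dice_coefficient(toy_ground_truth, toy_prediction)
print(f"DSC = {toy_dsc:.4f}")
```

A DSC of 1 means the predicted and ground truth masks coincide exactly, while 0 means they share no foreground pixel.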
Follow the instructions given below to set up the necessary conda environment, install packages, preprocess the dataset into the format expected by the code, train the models, and run segmentation and classification on the test set using the trained models.
-
Clone the repository, create conda environment and install necessary packages. The first step is to clone this GitHub codebase to your local machine, create a conda environment, and install all the necessary packages. This codebase was developed primarily using python=3.9.19, pandas=2.2.2, numpy=1.26.4, SimpleITK=2.3.1, PyTorch=1.11.0, monai=1.3.1, pyradiomics=3.1.0 and CUDA 11.4 on a Microsoft Azure virtual machine with Ubuntu 20.04, so it has been tested only with this configuration. The virtual machine had one GPU with 16 GiB of RAM and 6 vCPUs with 112 GB of RAM. We hope this codebase will run with other suitable combinations of versions of these libraries, but we cannot guarantee that. Proceed with caution and feel free to modify wherever necessary!
    git clone 'https://github.com/igcondapet/thyroidiomics.git'
    cd thyroidiomics
    conda env create --file environment.yml
The last step above creates a conda environment named thyroidiomics_env. Make sure you have conda installed. Next, activate the conda environment:

    conda activate thyroidiomics_env
-
Define dataset location and create datainfo.csv file. Go to config.py and set path to data folders for your (possibly multi-institutional) scintigraphy datasets.
    THYROIDIOMICS_FOLDER = ''  # path to the directory containing `data` and `results` (this will be created by the pipeline) folders.
    DATA_FOLDER = os.path.join(THYROIDIOMICS_FOLDER, 'data', 'nifti')  # place your data in this location
The directory structure within THYROIDIOMICS_FOLDER should be as shown below. The folders images and labels under THYROIDIOMICS_FOLDER/data/nifti must contain all the 2D scintigraphy images and ground truth segmentation labels from all the multi-institutional datasets in .nii.gz format, with each image-label pair given the same filename. Other folders containing the results of the segmentation (THYROIDIOMICS_FOLDER/segmentation_results) and classification (THYROIDIOMICS_FOLDER/classification_results) steps will be created in the subsequent steps below.

    └───THYROIDIOMICS_FOLDER/
        ├──data/nifti/
        │   ├── images
        │   │   ├── Patient0001.nii.gz
        │   │   ├── Patient0002.nii.gz
        │   │   ├── ...
        │   ├── labels
        │   │   ├── Patient0001.nii.gz
        │   │   ├── Patient0002.nii.gz
        │   │   ├── ...
        ├──segmentation_results
        └──classification_results
Next, create a file named datainfo.csv containing information about PatientID (corresponding to image filenames), CenterID and Class as shown below. For this work, we had 3 classes corresponding to 3 thyroid pathologies: MNG (label=0), TH (label=1) and DG (label=2).

    PatientID,CenterID,Class
    Patient0001,A,0
    Patient0002,B,1
    Patient0003,D,2
    Patient0004,C,2
    Patient0005,I,1
    ...
Place the datainfo.csv file in this location: thyroidiomics/data_analysis/datainfo.csv.
-
Run segmentation training. The file ./segmentation/train.py runs training on the 2D dataset via PyTorch's DistributedDataParallel. To run training, do the following (an example bash script is given in ./segmentation/train.sh).

    cd thyroidiomics/segmentation
    CUDA_VISIBLE_DEVICES=0 torchrun --standalone --nproc_per_node=1 train.py \
        --network-name='unet1' --leave-one-center-out='A' --epochs=300 \
        --input-patch-size=64 --inference-patch-size=128 --train-bs=32 \
        --num_workers=4 --lr=2e-4 --wd=1e-5 --val-interval=2 --sw-bs=4 --cache-rate=1

A unique experimentID will be created using the network_name and leave_one_center_out values. For example, if you use unet1 and choose center A as the leave-one-center-out (loco) center for testing, this experiment will be referenced as unet1_locoA under the results folders. Set --nproc_per_node to the number of GPUs available for parallel training. The data is cached using MONAI's CacheDataset, so if you are running out of memory, consider lowering the value of cache_rate. During training, the training loss and validation DSC are saved under THYROIDIOMICS_FOLDER/segmentation_results/logs/trainlog_gpu{rank}.csv and THYROIDIOMICS_FOLDER/segmentation_results/logs/validlog_gpu{rank}.csv, where {rank} is the GPU rank, and are updated every epoch. The checkpoints are saved every val_interval epochs under THYROIDIOMICS_FOLDER/segmentation_results/models/model_ep{epoch_number}.pth. There are many networks defined under the get_model() method in the file ./segmentation/initialize_train.py, but in this work, we used the network unet1 as that had the best performance.
-
Run segmentation evaluation on test set. After the training is finished (for a given experimentID), ./segmentation/predict.py can be used to run evaluation on the test set (which consists of all the data left out in the LOCOCV scheme, here CenterID=A from the above example) and save the predictions, test metrics and visualizations of the predicted results. To run test evaluation, do the following (an example bash script is given in ./segmentation/predict.sh).

    cd thyroidiomics/segmentation
    python predict.py --network-name='unet1' --leave-one-center-out='A' --inference-patch-size=128 --num_workers=2 --sw-bs=2 --val-interval=2

./segmentation/predict.py uses the model with the highest DSC on the validation set for test evaluation. CAUTION: set --val-interval to the same value that was used during training.
-
Run classification training and evaluation. In this step, we use the ./classification/classification.py file to train a classification model and perform testing in two ways. The training will run on all the centers except the one defined by --leave-one-center-out, while testing will be performed only on the center defined by --leave-one-center-out. In scenario 1 of testing, we extract features from the test images using the physician's thyroid annotation, while in scenario 2, we extract them from the segmentation model's predicted annotations (see Fig. 1 above). To run classification training and evaluation, do the following (an example bash script is given in ./classification/classification_train_predict.sh). Remember, these experiments are referenced using the same experimentID as for the segmentation step, and the results are saved accordingly.

    cd thyroidiomics/classification
    python classification.py --network-name='unet1' --leave-one-center-out='A'
-
Saved results from segmentation and classification. The segmentation training logs, trained models, predictions, test metrics and segmentation visualizations are stored under the folders logs, models, predictions, testmetrics and visualization, created under the location THYROIDIOMICS_FOLDER/segmentation_results and referenced by their unique experimentIDs. For the classification step, the extracted features and predicted metrics are stored under the folders feature_extraction and prediction_and_metrics, created under the location THYROIDIOMICS_FOLDER/classification_results and again referenced by the same experimentIDs as in the previous step.
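As a quick way to keep track of where the outputs of one experiment should appear, the sketch below builds the experimentID from the network name and the left-out center and lists the result folders named above. It is a hypothetical helper, not part of the codebase: the function names and the example root path are invented for illustration, and nesting each folder as `<results>/<folder>/<experimentID>` is an assumption, so check the actual layout produced by the scripts before relying on it.

```python
from pathlib import PurePosixPath

# Folder names as listed in this README.
SEGMENTATION_SUBFOLDERS = ("logs", "models", "predictions", "testmetrics", "visualization")
CLASSIFICATION_SUBFOLDERS = ("feature_extraction", "prediction_and_metrics")


def experiment_id(network_name, left_out_center):
    """Build the experimentID, e.g. 'unet1' and 'A' give 'unet1_locoA'."""
    return f"{network_name}_loco{left_out_center}"


def expected_result_dirs(thyroidiomics_folder, network_name, left_out_center):
    """Map each result folder name to its (assumed) per-experiment path."""
    root = PurePosixPath(thyroidiomics_folder)
    exp_id = experiment_id(network_name, left_out_center)
    dirs = {}
    for name in SEGMENTATION_SUBFOLDERS:
        # Assumption: one subdirectory per experimentID inside each folder.
        dirs[name] = root / "segmentation_results" / name / exp_id
    for name in CLASSIFICATION_SUBFOLDERS:
        dirs[name] = root / "classification_results" / name / exp_id
    return dirs


# Hypothetical root path, used for illustration only.
result_dirs = expected_result_dirs("/data/thyroidiomics", "unet1", "A")
for name, path in result_dirs.items():
    print(f"{name:>24}: {path}")
```

`PurePosixPath` is used so that the sketch only formats paths and never touches the file system.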
[1] Ahamed, S., et al., "Comprehensive Evaluation and Insights into the Use of Deep Neural Networks to Detect and Quantify Lymphoma Lesions in PET/CT Images", arXiv:2311.09614 (2023).
[2] Sabouri, M., et al., "Myocardial Perfusion SPECT Imaging Radiomic Features and Machine Learning Algorithms for Cardiac Contractile Pattern Recognition", Journal of Digital Imaging, v36, p497-509 (2022).