# Results folder format

In this work, results include trained models, training and validation logs, predicted masks, metrics on the test set, etc. These are all written to a folder called `results`, as defined by the variable `RESULTS_FOLDER` in the [config.py](./../config.py) file. This folder sits next to the `data` folder, as explained in dataset_format.md.
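For orientation, here is a hypothetical sketch of how `RESULTS_FOLDER` might be composed in `config.py`; the variable names follow this documentation, but the actual code in the repository may differ:

```python
# Hypothetical sketch; see the actual config.py in this repository.
import os

# Top-level folder containing both the data and results folders
LYMPHOMA_SEGMENTATION_FOLDER = '/path/to/lymphoma.segmentation'
RESULTS_FOLDER = os.path.join(LYMPHOMA_SEGMENTATION_FOLDER, 'results')
```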

## Results folder/file naming convention

### `logs` and `models` folders

While a model is training (see trainddp.md for details), two folders are created within the `results` folder: `logs` and `models`. The directory structure may look like this:

```
└───lymphoma.segmentation/
    ├── data
    └── results
        ├── logs
        │    ├── fold0
        │    │   └── unet
        │    │       └── unet_fold0_randcrop192
        │    │           ├── trainlog_gpu0.csv
        │    │           ├── trainlog_gpu1.csv
        │    │           ├── validlog_gpu0.csv
        │    │           └── validlog_gpu1.csv
        │    └── fold1
        │        └── unet
        │            └── unet_fold1_randcrop192
        │                ├── trainlog_gpu0.csv
        │                ├── trainlog_gpu1.csv
        │                ├── validlog_gpu0.csv
        │                └── validlog_gpu1.csv
        ├── models
        │    ├── fold0
        │    │   └── unet
        │    │       └── unet_fold0_randcrop192
        │    │           ├── model_ep=0002.pth
        │    │           ├── model_ep=0004.pth
        │    │           ├── model_ep=0006.pth
        │    │           ├── model_ep=0008.pth
        │    │           ├── ...
        │    └── fold1
        │        └── unet
        │            └── unet_fold1_randcrop192
        │                ├── model_ep=0002.pth
        │                ├── model_ep=0004.pth
        │                ├── model_ep=0006.pth
        │                ├── model_ep=0008.pth
        │                ├── ...
        ├── ...
```

This directory structure shows that, so far, the model `unet` has been (or is being) trained on two folds: `fold0` and `fold1`. Within the `logs` or `models` folder, the directory structure is `{logs_or_models}/fold{fold}/{network_name}/{experiment_code}`, where the `experiment_code` is defined as `{network_name}_fold{fold}_randcrop{input_patch_size}`. In the tree above, the `experiment_code` for the two folds is `unet_fold0_randcrop192` and `unet_fold1_randcrop192`, meaning we trained/are training `unet` on fold 0 or 1 with `input_patch_size = 192`. If you train other networks (like `segresnet`, `dynunet`, or `swinunetr`, as was the case in this work), they will appear accordingly within the same directory structure.
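As a sketch of this convention (not code from this repository), the log and model directories for one experiment can be composed like so; the layout and `experiment_code` format follow the description above:

```python
import os

def get_experiment_dirs(results_folder, network_name, fold, input_patch_size):
    """Compose the logs/models directories for one experiment (illustrative sketch)."""
    # experiment_code = {network_name}_fold{fold}_randcrop{input_patch_size}
    experiment_code = f"{network_name}_fold{fold}_randcrop{input_patch_size}"
    logs_dir = os.path.join(results_folder, 'logs', f'fold{fold}', network_name, experiment_code)
    models_dir = os.path.join(results_folder, 'models', f'fold{fold}', network_name, experiment_code)
    return logs_dir, models_dir

logs_dir, models_dir = get_experiment_dirs('results', 'unet', 0, 192)
# logs_dir   -> results/logs/fold0/unet/unet_fold0_randcrop192
# models_dir -> results/models/fold0/unet/unet_fold0_randcrop192
```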

Since training in this work was carried out with PyTorch's `torch.nn.parallel.DistributedDataParallel`, the files `trainlog_gpu0.csv`, `trainlog_gpu1.csv`, `validlog_gpu0.csv`, and `validlog_gpu1.csv` store the training and validation logs accumulated on the GPUs with device IDs 0 and 1. All the `validlog_gpu[i].csv` files are identical and hence redundant, so you can use any one of them for analysis (a later version will save only one file). The `trainlog_gpu[i].csv` files are NOT identical: each one stores the loss accumulated on its own GPU's shard of the distributed data. In our work, we used 4 GPUs, but the directory structure above shows training on 2 GPUs for the purpose of illustration. A typical `trainlog_gpu[i].csv` file looks like this:

```
Loss
0.6536665889951918
0.6449973914358351
0.6385666595564948
0.6357755064964294
...
```

where each line shows the mean DiceLoss on the training inputs (averaged over all batches) at epoch j+1, with j in the range `np.arange(0, epochs)`; `epochs` is the total number of epochs for which the training is run. Similarly, a typical `validlog_gpu[i].csv` file looks like this:

```
Metric
0.0011193332029506564
0.001015653251670301
...
```

where each line shows the mean DiceMetric on the validation inputs at epoch j, with j in the range `np.arange(2, epochs+1, val_interval)`; `epochs` is the total number of epochs for which the training is run, and `val_interval` (default: 2) is the epoch interval at which we run validation, compute the Dice metric, and save the trained model. The variables `val_interval`, `epochs`, etc. can be set in the train.sh script, which is used for running the training.
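Putting the two epoch ranges together, here is a short sketch for plotting the logged curves (assumes pandas and matplotlib; set `epochs` and `val_interval` to the values used in your train.sh run):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

epochs, val_interval = 500, 2  # example values; use your run's settings

train_loss = pd.read_csv('trainlog_gpu0.csv')['Loss']
valid_metric = pd.read_csv('validlog_gpu0.csv')['Metric']

train_epochs = np.arange(0, epochs) + 1                # one DiceLoss value per epoch
valid_epochs = np.arange(2, epochs + 1, val_interval)  # one DiceMetric value per validation run

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
ax[0].plot(train_epochs[:len(train_loss)], train_loss)
ax[0].set(xlabel='Epoch', ylabel='Mean DiceLoss (train)')
ax[1].plot(valid_epochs[:len(valid_metric)], valid_metric)
ax[1].set(xlabel='Epoch', ylabel='Mean DiceMetric (valid)')
plt.tight_layout()
plt.show()
```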

The trained models are saved in a similar way under the corresponding `fold{fold}/{network_name}/{experiment_code}` folder, with filenames `model_ep=0002.pth`, `model_ep=0004.pth`, etc. In this example, `val_interval = 2`, so the models are saved at an interval of 2 epochs, starting from the second epoch.
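A minimal sketch for locating the most recent checkpoint of an experiment (this assumes the `.pth` files can be loaded with `torch.load`; what exactly they contain, e.g. a bare `state_dict`, depends on the training script):

```python
import glob
import os
import torch

models_dir = 'results/models/fold0/unet/unet_fold0_randcrop192'
checkpoints = sorted(glob.glob(os.path.join(models_dir, 'model_ep=*.pth')))
latest = checkpoints[-1]  # zero-padded epoch numbers sort lexicographically
checkpoint = torch.load(latest, map_location='cpu')  # contents depend on the training script
```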

### `predictions` and `test_metrics` folders

After the trained models are used to predict the segmentation masks on the test images (see inference.md for details), the predicted masks are written, based on the `fold`, `network_name`, and `experiment_code`, to `LYMPHOMA_SEGMENTATION_FOLDER/results/predictions/fold{fold}/{network_name}/{experiment_code}`. Once the predicted masks have been generated and saved, the metrics computed on the test set from the test ground truth and predicted masks are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_metrics/fold{fold}/{network_name}/{experiment_code}/testmetrics.csv`. We compute three segmentation metrics: Dice similarity coefficient (DSC), false positive volume (FPV) in ml, and false negative volume (FNV) in ml. We also compute detection metrics, namely true positive (TP), false positive (FP), and false negative (FN) lesion detections, under three different criteria labeled Criterion1, Criterion2, and Criterion3. These metrics are defined in metrics/metrics.py. After running inference and computing the test metrics, the (relevant) directory structure may look like:

```
└───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            ├── models
            ├── predictions
            │   ├── fold0
            │   │   └── unet
            │   │       └── unet_fold0_randcrop192
            │   │           ├── Patient0003_20190402.nii.gz
            │   │           ├── Patient0004_20160204.nii.gz
            │   │           ├── ...
            │   └── fold1
            │       └── unet
            │           └── unet_fold1_randcrop192
            │               ├── Patient0003_20190402.nii.gz
            │               ├── Patient0004_20160204.nii.gz
            │               ├── ...
            │
            └── test_metrics
                ├── fold0
                │   └── unet
                │       └── unet_fold0_randcrop192
                │           └── testmetrics.csv
                └── fold1
                    └── unet
                        └── unet_fold1_randcrop192
                            └── testmetrics.csv
```

The predicted masks are in the same geometry (same size, spacing, origin, and direction) as their corresponding ground truth masks. A typical `testmetrics.csv` file looks like:

| PatientID | DSC | FPV | FNV | TP_C1 | FP_C1 | FN_C1 | TP_C2 | FP_C2 | FN_C2 | TP_C3 | FP_C3 | FN_C3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patient0003_20190402 | 0.7221043699618158 | 17.5164623503173 | 1.173559512304143 | 3 | 6 | 2 | 2 | 7 | 3 | 3 | 6 | 2 |
| Patient0004_20160204 | 0.0807955251709131 | 53.4186903933997 | 5.563541391664086 | 2 | 8 | 1 | 0 | 10 | 3 | 2 | 8 | 1 |

Here, all the metrics are at the patient level; FPV and FNV are expressed in ml.
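A short sketch for summarizing a `testmetrics.csv` with pandas (the comma delimiter is an assumption; adjust `sep` if the file is separated differently):

```python
import pandas as pd

df = pd.read_csv('results/test_metrics/fold0/unet/unet_fold0_randcrop192/testmetrics.csv')
print('Median DSC:', df['DSC'].median())
print('Mean FPV (ml):', df['FPV'].mean())
print('Mean FNV (ml):', df['FNV'].mean())
# Detection counts under Criterion1, summed over all test patients
print(df[['TP_C1', 'FP_C1', 'FN_C1']].sum())
```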

### `test_lesion_measures` folder

In this work, we performed further analyses on the predicted segmentation masks of the test set and compared them to the ground truth masks. These include comparing the patient-level lesion SUVmean, lesion SUVmax, number of lesions, total metabolic tumor volume (TMTV) in ml, total lesion glycolysis (TLG) in ml, and lesion dissemination (Dmax) in cm. These measures are defined in metrics/metrics.py. The test set predicted lesion measures are written to `LYMPHOMA_SEGMENTATION_FOLDER/results/test_lesion_measures/fold{fold}/{network_name}/{experiment_code}/testlesionmeasures.csv`. After generating the `testlesionmeasures.csv` files, the relevant directory structure may look like:

```
└───lymphoma.segmentation/
        ├── data
        └── results
            ├── logs
            ├── models
            ├── predictions
            ├── test_metrics
            └── test_lesion_measures
                ├── fold0
                │   └── unet
                │       └── unet_fold0_randcrop192
                │           └── testlesionmeasures.csv
                └── fold1
                    └── unet
                        └── unet_fold1_randcrop192
                            └── testlesionmeasures.csv
```

A typical `testlesionmeasures.csv` file looks like:

| PatientID | DSC | SUVmean_orig | SUVmean_pred | SUVmax_orig | SUVmax_pred | LesionCount_orig | LesionCount_pred | TMTV_orig | TMTV_pred | TLG_orig | TLG_pred | Dmax_orig | Dmax_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patient0003_20190402 | 0.7221043699618158 | 2.935304139385291 | 4.362726242681123 | 6.1822732035904515 | 7.827266273892102 | 3 | 4 | 13.691527643548337 | 18.6272625128359097 | 40.18879776661558 | 50.2728492927217289 | 15.837606584884108 | 25.82763813918739 |
| Patient0004_20160204 | 0.0807955251709131 | 8.72882540822585 | 12.71524350987 | 40.294842200490244 | 45.9483628492382 | 9 | 6 | 20.732884717373196 | 16.756373846353748 | 180.9737309068245 | 120.2387139879348 | 14.737477375372881 | 7.652628627281008 |

Here, all the lesion measures are at the patient level. TMTV and TLG are expressed in ml and Dmax in cm.
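For intuition, here is a simplified sketch of how lesion measures of this kind can be computed from an SUV image and a binary mask. The authoritative definitions are in metrics/metrics.py and may differ in detail; the SimpleITK usage and file names below are illustrative assumptions.

```python
import numpy as np
import SimpleITK as sitk
from scipy import ndimage
from scipy.spatial.distance import pdist

mask_img = sitk.ReadImage('Patient0003_20190402.nii.gz')     # predicted (or ground truth) mask
suv_img  = sitk.ReadImage('Patient0003_20190402_pt.nii.gz')  # matching SUV image (hypothetical name)

mask = sitk.GetArrayFromImage(mask_img).astype(bool)  # array axes are (z, y, x)
suv  = sitk.GetArrayFromImage(suv_img)
voxel_ml = np.prod(mask_img.GetSpacing()) / 1000.0    # mm^3 per voxel -> ml

suvmean = suv[mask].mean()
suvmax  = suv[mask].max()
lesion_count = ndimage.label(mask)[1]        # number of connected components
tmtv_ml = mask.sum() * voxel_ml              # total metabolic tumor volume
tlg_ml  = suv[mask].sum() * voxel_ml         # total lesion glycolysis

# Dmax: largest pairwise distance between foreground voxels, in cm.
# (O(N^2) in the number of voxels; real implementations may subsample
# or use lesion centroids instead.)
coords_mm = np.argwhere(mask)[:, ::-1] * np.array(mask_img.GetSpacing())
dmax_cm = pdist(coords_mm).max() / 10.0
```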