Performance profiling of MONAI Core training by NVIDIA Nsight Systems on HiperGator (work in progress)
The MONAI Core tutorial repo hosts several training pipeline profiling tutorials:
- In the Fast Model Training Guide, see section 2. NVIDIA Nsight Systems and section 3. NVIDIA Tools Extension (NVTX).
- Profile a radiology pipeline for spleen segmentation.
- Profile a pathology pipeline for metastasis detection.
- ...
To learn more about NVIDIA Nsight Systems and NVTX, refer to:
- NVIDIA Nsight Systems documentation. The previous Training Seminars are highly recommended.
- NVTX documentation.
This README.md shows how to profile the tutorial radiology pipeline mentioned above inside a MONAI Core Singularity container on HiperGator with the Nsight Systems CLI, and then how to visualize the generated report in the Nsight Systems GUI installed on your local system. The CLI is already installed in the MONAI Core container, so you don't need to install it manually. To install the Nsight Systems GUI on your local system, please refer to the Installation Guide.
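For orientation, the profiling step amounts to wrapping the training command in `nsys profile` and running it inside the container. The following is a minimal sketch under stated assumptions, not the contents of the job scripts in this repository: the container path, the training script name, the `run_or_print` helper, and the choice of trace sources are all placeholders. It only prints the assembled command unless `DRY_RUN=0` is set.

```shell
# Sketch only: CONTAINER and TRAIN_SCRIPT are hypothetical placeholders.
CONTAINER="${CONTAINER:-/path/to/monaicore.sif}"
TRAIN_SCRIPT="${TRAIN_SCRIPT:-train.py}"
# Report name without extension; nsys appends .nsys-rep to it.
REPORT_NAME="${REPORT_NAME:-output_base}"

# Print the command for review by default; execute it only when DRY_RUN=0.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# singularity exec --nv   : run inside the container with host NVIDIA GPUs exposed
# --output                : name of the generated report
# --force-overwrite=true  : replace an existing report with the same name
# --trace=cuda,nvtx,osrt  : record CUDA API calls, NVTX ranges, and OS runtime calls
run_or_print singularity exec --nv "$CONTAINER" \
  nsys profile \
    --output="$REPORT_NAME" \
    --force-overwrite=true \
    --trace=cuda,nvtx,osrt \
  python "$TRAIN_SCRIPT"
```

Printing first makes it easy to check paths and options before spending GPU time on a profiling run.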
- This tutorial assumes you have downloaded the repository `monai_uf_tutorials` following this section.
- If you have never run MONAI Core on HiperGator with Singularity as the container runtime, I strongly recommend going through the tutorial `monaicore_singlegpu` and making sure it works before moving on to this tutorial.
- If you have never run distributed training with MONAI Core on HiperGator, I strongly recommend going through the unet_ddp example in the tutorial `monaicore_multigpu` and making sure it works before moving on to this tutorial.
- In all following commands, replace `hju` with your HiperGator username, and change file paths according to your own settings on HiperGator.
- In all following SLURM job scripts, alter the `#SBATCH` settings to suit your needs.
- Please read the comments in each script to better understand how to tune the scripts to your needs.
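As a rough guide to which `#SBATCH` settings usually need adjusting, here is a hypothetical job script header. It is a sketch, not a copy of the scripts in this repository: every value, including the partition name, is an assumption to replace with what your allocation allows.

```shell
#!/bin/bash
# Hypothetical resource requests; every value below is a placeholder.
# Log file name: %x expands to the job name (by default the script file name), %j to the job ID.
#SBATCH --output=%x.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
# CPU cores, e.g. for data loading workers.
#SBATCH --cpus-per-task=4
# Host memory.
#SBATCH --mem=32gb
# Assumed name of the GPU partition.
#SBATCH --partition=gpu
# Number of GPUs.
#SBATCH --gpus=1
# Wall-time limit (HH:MM:SS).
#SBATCH --time=01:00:00

# SLURM sets SLURM_JOB_ID inside a job; fall back to a label when run by hand.
job_id="${SLURM_JOB_ID:-interactive}"
echo "Running as job ${job_id} on $(hostname)"
```

Profiling adds overhead, so a short run with a generous time limit is usually enough to get a useful report.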
- Go to the directory `/monai_uf_tutorials/profile/`: `cd ~/monai_uf_tutorials/profile`
- To profile the tutorial radiology pipeline, submit the job script: `sbatch nsys_radiology.sh`. You should see a `.nsys-rep` report file generated; see the sample report file `output_base.nsys-rep`. Also see the sample SLURM output `nsys_radiology.sh.job_id.out`.
- Transfer the `.nsys-rep` report file back to your local system; see the UFRC doc on Transfer Data for all available methods. For example, with `scp`: `scp hju@hpg.rc.ufl.edu:/home/hju/monai_uf_tutorials/profile/output_base.nsys-rep .`
- On your local system, launch the Nsight Systems GUI and open the `.nsys-rep` report file.
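The transfer step can be sketched as a small script run on your local system. This is a minimal sketch: the `HPG_USER` and `REPORT` variables and the `check_report` helper are hypothetical names introduced here for illustration, and the remote path simply mirrors the `scp` example above. It prints the copy command rather than running it, and provides a quick check that the copied file looks like a usable report before you open it in the GUI.

```shell
# Hypothetical variables; replace hju with your HiperGator username.
HPG_USER="${HPG_USER:-hju}"
REMOTE_DIR="/home/${HPG_USER}/monai_uf_tutorials/profile"
REPORT="${REPORT:-output_base.nsys-rep}"

# Succeed only if the file has the .nsys-rep extension, exists, and is non-empty.
check_report() {
  case "$1" in
    *.nsys-rep) ;;
    *) echo "not a .nsys-rep file: $1" >&2; return 1 ;;
  esac
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  echo "ok: $1"
}

# Print the copy command so it can be reviewed before running it.
echo "scp ${HPG_USER}@hpg.rc.ufl.edu:${REMOTE_DIR}/${REPORT} ."

# After the copy completes, verify the local file:
#   check_report "$REPORT"
```

An empty or truncated report usually means the job was killed before the profiler finished writing, so checking the SLURM output file is a reasonable next step in that case.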