-
Notifications
You must be signed in to change notification settings - Fork 0
Profiling
Using the PyTorch
and PyTorch Lightning
tools are a good way to find and understand possible bottlenecks in your MART's experiment training and test processes. MART has available the following profilers:
- Simple profiler.
- Advanced profiler.
- PyTorch profiler.
In general, to use one of those profilers, you need to set the profiler as the debug options in your MART command:
python -m mart experiment=COCO_TorchvisionFasterRCNN_Adv debug=profiler
To specify which profiler you want to use, just edit the <MART_REPO_DIR>/mart/configs/debug/profiler.yaml
setting the desired profiler's target:
defaults:
- default.yaml
trainer:
max_epochs: 1
profiler:
_target_: pytorch_lightning.profiler.SimpleProfiler
# _target_: pytorch_lightning.profiler.AdvancedProfiler
# _target_: pytorch_lightning.profiler.PyTorchProfiler
dirpath: ${paths.output_dir}
filename: profiler_log
It is used to just profile standard actions. The output that this profiler generated looks like:
--------------------------------------------------------------------------------------------------------------------------------------------------
| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
--------------------------------------------------------------------------------------------------------------------------------------------------
| Total | - | 225 | 80.033 | 100 % |
--------------------------------------------------------------------------------------------------------------------------------------------------
| run_training_epoch | 79.879 | 1 | 79.879 | 99.808 |
| run_training_batch | 29.716 | 2 | 59.432 | 74.26 |
| [LightningModule]LitModular.optimizer_step | 29.714 | 2 | 59.427 | 74.254 |
| [Strategy]SingleDeviceStrategy.backward | 17.531 | 2 | 35.063 | 43.811 |
| [Strategy]SingleDeviceStrategy.training_step | 12.141 | 2 | 24.282 | 30.34 |
| [Strategy]SingleDeviceStrategy.validation_step | 9.9916 | 2 | 19.983 | 24.969 |
| [LightningModule]LitModular.validation_epoch_end | 0.13221 | 1 | 0.13221 | 0.1652 |
| [LightningModule]LitModular.training_epoch_end | 0.10188 | 1 | 0.10188 | 0.1273 |
| [LightningModule]LitModular.configure_optimizers | 0.014653 | 1 | 0.014653 | 0.018309 |
| [LightningModule]LitModular.optimizer_zero_grad | 0.0041084 | 2 | 0.0082167 | 0.010267 |
| [Callback]ModelSummary.on_fit_start | 0.0042709 | 1 | 0.0042709 | 0.0053365 |
This profilers uses the Python’s cProfile that provides more detailed information about the time of each function call in each of the actions. The following is an extraction of the output for the [Strategy]SingleDeviceStrategy.training_step
action:
Profile stats for: [Strategy]SingleDeviceStrategy.training_step
1218993 function calls (1212169 primitive calls) in 20.457 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 20.457 10.229 strategy.py:327(training_step)
2 0.000 0.000 20.457 10.229 modular.py:87(training_step)
594/2 0.045 0.000 20.426 10.213 module.py:1124(_call_impl)
2 0.000 0.000 20.426 10.213 modular.py:81(forward)
2 0.000 0.000 20.426 10.213 nn.py:96(forward)
10 0.000 0.000 20.426 2.043 nn.py:127(forward)
2 0.000 0.000 20.395 10.198 dual_mode.py:59(forward)
2 0.000 0.000 20.395 10.198 dual_mode.py:66(forward_dual_mode)
72/44 0.039 0.001 16.309 0.371 container.py:137(forward)
182 0.001 0.000 14.673 0.081 conv.py:456(forward)
182 0.001 0.000 14.671 0.081 conv.py:448(_conv_forward)
182 13.952 0.077 14.670 0.081 {built-in method torch.conv2d}
2 0.000 0.000 11.651 5.826 backbone_utils.py:56(forward)
2 0.026 0.013 8.160 4.080 _utils.py:66(forward)
32 0.198 0.006 7.742 0.242 resnet.py:143(forward)
4 0.000 0.000 5.899 1.475 rpn.py:339(forward)
4 0.012 0.003 5.371 1.343 rpn.py:72(forward)
2 0.051 0.026 3.470 1.735 feature_pyramid_network.py:172(forward)
4 0.008 0.002 2.705 0.676 roi_heads.py:727(forward)
8 0.000 0.000 2.667 0.333 feature_pyramid_network.py:158(get_result_from_layer_blocks)
746 0.008 0.000 2.025 0.003 traceback.py:193(format_stack)
4 0.000 0.000 1.443 0.361 poolers.py:306(forward)
4 0.023 0.006 1.443 0.361 poolers.py:147(_multiscale_roi_align)
60 0.000 0.000 1.427 0.024 _ops.py:138(__call__)
16 0.001 0.000 1.370 0.086 roi_align.py:13(roi_align)
106 0.885 0.008 1.368 0.013 misc.py:53(forward)
16 1.324 0.083 1.368 0.085 {built-in method torch._ops.torchvision.roi_align}
This profiler will inspect the cost of the different model's operations in the CPU and GPU.
FIT Profiler Report
Profile stats for: records
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
ProfilerStep* 13.55% 14.901s 66.11% 72.705s 18.176s 4
[pl][profile]run_training_batch 0.00% 2.232ms 33.87% 37.249s 37.249s 1
[pl][profile][LightningModule]LitModular.optimizer_s... 0.00% 82.000us 33.87% 37.247s 37.247s 1
Optimizer.step#SGD.step 20.36% 22.389s 33.87% 37.247s 37.247s 1
[pl][module]mart.nn.nn.SequentialDict: model 0.00% 3.154ms 33.11% 36.412s 12.137s 3
[pl][module]mart.models.dual_mode.DualModeGeneralize... 0.00% 254.000us 33.10% 36.402s 12.134s 3
[pl][module]torchvision.models.detection.faster_rcnn... 0.03% 28.232ms 33.10% 36.402s 12.134s 3
[pl][profile][Strategy]SingleDeviceStrategy.backward... 0.01% 16.120ms 32.50% 35.743s 17.872s 2
autograd::engine::evaluate_function: ConvolutionBack... 0.29% 321.425ms 26.89% 29.574s 227.495ms 130
ConvolutionBackward0 0.00% 1.326ms 25.96% 28.550s 219.617ms 130
aten::convolution_backward 25.95% 28.537s 25.96% 28.549s 219.606ms 130
aten::conv2d 0.00% 1.550ms 25.50% 28.047s 102.735ms 273
aten::convolution 0.08% 85.462ms 25.50% 28.045s 102.730ms 273
aten::_convolution 0.00% 4.503ms 25.42% 27.960s 102.417ms 273
aten::mkldnn_convolution 25.41% 27.945s 25.42% 27.955s 102.400ms 273
[pl][profile][Strategy]SingleDeviceStrategy.validati... 0.00% 203.000us 19.60% 21.556s 10.778s 2
[pl][module]torchvision.models.detection.backbone_ut... 0.02% 21.056ms 19.08% 20.983s 6.994s 3
[pl][profile][Strategy]SingleDeviceStrategy.training... 0.00% 1.005ms 13.51% 14.858s 14.858s 1
[pl][module]torchvision.models._utils.IntermediateLa... 0.04% 49.051ms 13.11% 14.412s 4.804s 3
[pl][module]torchvision.models.detection.rpn.RegionP... 0.05% 52.628ms 10.12% 11.124s 1.854s 6
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 109.970s