Skip to content

Profiling

Luis Alonso Murillo Rojas edited this page Dec 12, 2022 · 5 revisions

Using the PyTorch and PyTorch Lightning tools are a good way to find and understand possible bottlenecks in your MART's experiment training and test processes. MART has available the following profilers:

  • Simple profiler.
  • Advanced profiler.
  • PyTorch profiler.

In general, to use one of those profilers, you need to set the profiler as the debug options in your MART command:

python -m mart experiment=COCO_TorchvisionFasterRCNN_Adv debug=profiler

To specify which profiler you want to use, just edit the <MART_REPO_DIR>/mart/configs/debug/profiler.yaml setting the desired profiler's target:

defaults:
  - default.yaml

trainer:
  max_epochs: 1
  profiler:
    _target_: pytorch_lightning.profiler.SimpleProfiler
    # _target_: pytorch_lightning.profiler.AdvancedProfiler
    # _target_: pytorch_lightning.profiler.PyTorchProfiler
    dirpath: ${paths.output_dir}
    filename: profiler_log

Simple profiler

It is used to just profile standard actions. The output that this profiler generated looks like:

--------------------------------------------------------------------------------------------------------------------------------------------------
|  Action                                            |  Mean duration (s)	|  Num calls      	|  Total time (s) 	|  Percentage %  |
--------------------------------------------------------------------------------------------------------------------------------------------------
|  Total                                             |  -              	        |  225            	|  80.033         	|  100 %         |
--------------------------------------------------------------------------------------------------------------------------------------------------
|  run_training_epoch                                |  79.879         	        |  1              	|  79.879         	|  99.808        |
|  run_training_batch                                |  29.716         	        |  2              	|  59.432         	|  74.26         |
|  [LightningModule]LitModular.optimizer_step        |  29.714         	        |  2              	|  59.427         	|  74.254        |
|  [Strategy]SingleDeviceStrategy.backward           |  17.531         	        |  2              	|  35.063         	|  43.811        |
|  [Strategy]SingleDeviceStrategy.training_step      |  12.141         	        |  2              	|  24.282         	|  30.34         |
|  [Strategy]SingleDeviceStrategy.validation_step    |  9.9916         	        |  2              	|  19.983         	|  24.969        |
|  [LightningModule]LitModular.validation_epoch_end  |  0.13221        	        |  1              	|  0.13221        	|  0.1652        |
|  [LightningModule]LitModular.training_epoch_end    |  0.10188        	        |  1              	|  0.10188        	|  0.1273        |
|  [LightningModule]LitModular.configure_optimizers  |  0.014653       	        |  1              	|  0.014653       	|  0.018309      |
|  [LightningModule]LitModular.optimizer_zero_grad   |  0.0041084      	        |  2              	|  0.0082167      	|  0.010267      |
|  [Callback]ModelSummary.on_fit_start               |  0.0042709      	        |  1              	|  0.0042709      	|  0.0053365     |

Advanced profiler

This profilers uses the Python’s cProfile that provides more detailed information about the time of each function call in each of the actions. The following is an extraction of the output for the [Strategy]SingleDeviceStrategy.training_step action:

Profile stats for: [Strategy]SingleDeviceStrategy.training_step
         1218993 function calls (1212169 primitive calls) in 20.457 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000   20.457   10.229 strategy.py:327(training_step)
        2    0.000    0.000   20.457   10.229 modular.py:87(training_step)
    594/2    0.045    0.000   20.426   10.213 module.py:1124(_call_impl)
        2    0.000    0.000   20.426   10.213 modular.py:81(forward)
        2    0.000    0.000   20.426   10.213 nn.py:96(forward)
       10    0.000    0.000   20.426    2.043 nn.py:127(forward)
        2    0.000    0.000   20.395   10.198 dual_mode.py:59(forward)
        2    0.000    0.000   20.395   10.198 dual_mode.py:66(forward_dual_mode)
    72/44    0.039    0.001   16.309    0.371 container.py:137(forward)
      182    0.001    0.000   14.673    0.081 conv.py:456(forward)
      182    0.001    0.000   14.671    0.081 conv.py:448(_conv_forward)
      182   13.952    0.077   14.670    0.081 {built-in method torch.conv2d}
        2    0.000    0.000   11.651    5.826 backbone_utils.py:56(forward)
        2    0.026    0.013    8.160    4.080 _utils.py:66(forward)
       32    0.198    0.006    7.742    0.242 resnet.py:143(forward)
        4    0.000    0.000    5.899    1.475 rpn.py:339(forward)
        4    0.012    0.003    5.371    1.343 rpn.py:72(forward)
        2    0.051    0.026    3.470    1.735 feature_pyramid_network.py:172(forward)
        4    0.008    0.002    2.705    0.676 roi_heads.py:727(forward)
        8    0.000    0.000    2.667    0.333 feature_pyramid_network.py:158(get_result_from_layer_blocks)
      746    0.008    0.000    2.025    0.003 traceback.py:193(format_stack)
        4    0.000    0.000    1.443    0.361 poolers.py:306(forward)
        4    0.023    0.006    1.443    0.361 poolers.py:147(_multiscale_roi_align)
       60    0.000    0.000    1.427    0.024 _ops.py:138(__call__)
       16    0.001    0.000    1.370    0.086 roi_align.py:13(roi_align)
      106    0.885    0.008    1.368    0.013 misc.py:53(forward)
       16    1.324    0.083    1.368    0.085 {built-in method torch._ops.torchvision.roi_align}

PyTorch profiler

This profiler will inspect the cost of the different model's operations in the CPU and GPU.

FIT Profiler Report
Profile stats for: records
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                          ProfilerStep*        13.55%       14.901s        66.11%       72.705s       18.176s             4  
                        [pl][profile]run_training_batch         0.00%       2.232ms        33.87%       37.249s       37.249s             1  
[pl][profile][LightningModule]LitModular.optimizer_s...         0.00%      82.000us        33.87%       37.247s       37.247s             1  
                                Optimizer.step#SGD.step        20.36%       22.389s        33.87%       37.247s       37.247s             1  
           [pl][module]mart.nn.nn.SequentialDict: model         0.00%       3.154ms        33.11%       36.412s       12.137s             3  
[pl][module]mart.models.dual_mode.DualModeGeneralize...         0.00%     254.000us        33.10%       36.402s       12.134s             3  
[pl][module]torchvision.models.detection.faster_rcnn...         0.03%      28.232ms        33.10%       36.402s       12.134s             3  
[pl][profile][Strategy]SingleDeviceStrategy.backward...         0.01%      16.120ms        32.50%       35.743s       17.872s             2  
autograd::engine::evaluate_function: ConvolutionBack...         0.29%     321.425ms        26.89%       29.574s     227.495ms           130  
                                   ConvolutionBackward0         0.00%       1.326ms        25.96%       28.550s     219.617ms           130  
                             aten::convolution_backward        25.95%       28.537s        25.96%       28.549s     219.606ms           130  
                                           aten::conv2d         0.00%       1.550ms        25.50%       28.047s     102.735ms           273  
                                      aten::convolution         0.08%      85.462ms        25.50%       28.045s     102.730ms           273  
                                     aten::_convolution         0.00%       4.503ms        25.42%       27.960s     102.417ms           273  
                               aten::mkldnn_convolution        25.41%       27.945s        25.42%       27.955s     102.400ms           273  
[pl][profile][Strategy]SingleDeviceStrategy.validati...         0.00%     203.000us        19.60%       21.556s       10.778s             2  
[pl][module]torchvision.models.detection.backbone_ut...         0.02%      21.056ms        19.08%       20.983s        6.994s             3  
[pl][profile][Strategy]SingleDeviceStrategy.training...         0.00%       1.005ms        13.51%       14.858s       14.858s             1  
[pl][module]torchvision.models._utils.IntermediateLa...         0.04%      49.051ms        13.11%       14.412s        4.804s             3  
[pl][module]torchvision.models.detection.rpn.RegionP...         0.05%      52.628ms        10.12%       11.124s        1.854s             6  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 109.970s
Clone this wiki locally