NaN values with FP16 TensorRT Inference #116
I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and getting NaN or garbage values. Has anyone encountered a similar issue?

Comments
Facing a similar issue in #113. You may follow the TensorRT issue linked in my post.
I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of matrix multiplications. The max values are around 2e5, which exceeds the largest finite FP16 value (~65504). Interestingly, this does not happen with the provided pretrained models.
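One way to isolate this kind of overflow is to run an FP32 forward pass with forward hooks and record each module's peak activation magnitude. Below is a minimal PyTorch sketch of that technique; the model loading and input shape are assumptions for illustration, not taken from the thread:

```python
import torch

FP16_MAX = 65504.0  # largest finite float16 value

def find_fp16_overflows(model: torch.nn.Module, sample: torch.Tensor) -> dict:
    """Run one FP32 forward pass and report modules whose output
    magnitudes would overflow when cast to FP16."""
    peaks = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):
                peaks[name] = output.abs().max().item()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    with torch.no_grad():
        model(sample)
    for h in handles:
        h.remove()

    # Keep only the modules that actually exceed the FP16 range.
    return {n: p for n, p in peaks.items() if p > FP16_MAX}

# Hypothetical usage (model and input shape are assumptions):
# model = ...  # the exported model in FP32 eval mode
# print(find_fp16_overflows(model, torch.randn(1, 3, 512, 512)))
```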
I was able to resolve the problem by setting the following layer precisions to FP32 using the Python TensorRT API (repeat for the other stage/op_list combinations).
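As a rough illustration of that approach, here is a minimal sketch of pinning selected layers to FP32 while building an FP16 engine with the Python TensorRT API. The ONNX path and the keyword list matching the LiteMLA matmul layers are assumptions, since the actual layer names depend on how the model was exported:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Hypothetical substrings identifying the overflow-prone layers; the
# exact names depend on the ONNX export.
FP32_LAYER_KEYWORDS = ["MatMul", "LiteMLA"]

def build_engine(onnx_path: str) -> trt.ICudaEngine:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Make the builder honor per-layer precision requests.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    # Pin the overflow-prone layers (and their outputs) to FP32.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(k in layer.name for k in FP32_LAYER_KEYWORDS):
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(serialized)
```

OBEY_PRECISION_CONSTRAINTS makes the builder fail rather than silently ignore the FP32 requests; PREFER_PRECISION_CONSTRAINTS is the softer alternative if building fails.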
Where is the inference script for what you tried? I'm referring to the SEG variant here.
I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set properly. https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py
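For reference, running a built engine generally reduces to copying inputs to the GPU, executing, and copying outputs back. A minimal sketch with pycuda, assuming a single input binding, a single output binding, and an FP32 output (all assumptions, not specific to this model):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

def infer(engine: trt.ICudaEngine, input_array: np.ndarray) -> np.ndarray:
    context = engine.create_execution_context()

    # Assumes binding 0 is the input and binding 1 is the FP32 output.
    h_input = np.ascontiguousarray(input_array)
    h_output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)

    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    # Host -> device, execute, device -> host, all on one stream.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output
```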
Thanks a lot @ovunctuzel-bc for your timely reply. Is there a proper semantic segmentation TensorRT inference script you referred to?