NaN values with FP16 TensorRT Inference #116

Open
ovunctuzel-bc opened this issue May 28, 2024 · 6 comments

@ovunctuzel-bc

I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and I'm getting NaN or garbage values. Has anyone encountered a similar issue?

  • I'm using B0 and B1 segmentation models (custom trained).
  • The ONNX model works great; I even tried FP16 ONNX inference, and it works great.
  • TensorRT with FP32 precision works great.
  • I have tried exporting with the Python API and with trtexec; the results are the same (a build sketch is below).
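
For context, this is roughly how I'm building the FP16 engine with the Python API (a minimal sketch; the file paths and workspace size are placeholders, and the trtexec equivalent is just `trtexec --onnx=model.onnx --fp16 --saveEngine=model_fp16.engine`):

```python
import tensorrt as trt  # TensorRT 8.x Python API

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        errors = [parser.get_error(i) for i in range(parser.num_errors)]
        raise RuntimeError(f"ONNX parse failed: {errors}")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # the flag that triggers the NaNs
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

with open("model_fp16.engine", "wb") as f:  # placeholder path
    f.write(builder.build_serialized_network(network, config))
```
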
@bernardrb

I'm facing a similar issue in #113. You can follow the TensorRT issue linked in my post.

@ovunctuzel-bc
Author

I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of its matrix multiplications. The max values are around 2e5, which is larger than the maximum FP16 value (~65504).

Interestingly this does not happen with the provided pretrained models.
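
A quick NumPy sanity check of the overflow (not part of the model code, just to illustrate the FP16 range):

```python
import numpy as np

print(np.finfo(np.float16).max)         # 65504.0, the largest finite FP16 value

x = np.float32(2e5).astype(np.float16)  # a value of the magnitude the LiteMLA matmul produces
print(x)                                # inf -- it overflows in FP16
print(x - x)                            # nan -- inf - inf, which then propagates through the network
```
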

@ovunctuzel-bc
Author

I was able to resolve the problem by setting the following layer precisions to FP32 using the Python TensorRT API:

/backbone/stages.2/op_list.1/context_module/main/MatMul
/backbone/stages.2/op_list.1/context_module/main/MatMul_1
/backbone/stages.2/op_list.1/context_module/main/Slice_5
/backbone/stages.2/op_list.1/context_module/main/Slice_4
/backbone/stages.2/op_list.1/context_module/main/Add
/backbone/stages.2/op_list.1/context_module/main/Div

(Repeat for other stages/op_list combinations)
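
In case it helps anyone, this is roughly the code I used (a sketch; the helper name is mine, `network` is the parsed INetworkDefinition from the build script above, and the substring matching is a bit broader than the exact layer list, since it catches every Slice in the context modules):

```python
import tensorrt as trt

# Substrings of the LiteMLA layer names that overflow in FP16.
FP32_LAYER_KEYWORDS = (
    "context_module/main/MatMul",
    "context_module/main/Slice",
    "context_module/main/Add",
    "context_module/main/Div",
)

def pin_litemla_layers_to_fp32(network: trt.INetworkDefinition) -> None:
    """Force the overflowing LiteMLA layers to run in FP32."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(key in layer.name for key in FP32_LAYER_KEYWORDS):
            layer.precision = trt.DataType.FLOAT
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.DataType.FLOAT)

# The builder config also needs OBEY_PRECISION_CONSTRAINTS (or the older
# STRICT_TYPES flag) so TensorRT respects the per-layer settings:
# config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```
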

@Sanath1998

> I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and I'm getting NaN or garbage values. Has anyone encountered a similar issue?
>
>   • I'm using B0 and B1 segmentation models (custom trained).
>   • The ONNX model works great; I even tried FP16 ONNX inference, and it works great.
>   • TensorRT with FP32 precision works great.
>   • I have tried exporting with the Python API and with trtexec; the results are the same.

Where is the inference script for what you tried? I'm referring to the SEG variant here.

@ovunctuzel-bc
Author

I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set properly.

https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py
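
The general pattern looks something like this (a bare-bones sketch for TensorRT 8.x with pycuda, assuming a single input at binding 0 and a single output at binding 1; the engine path and the random input are placeholders):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model_fp16.engine", "rb") as f:  # placeholder engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer pair per binding (inputs and outputs).
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Copy the preprocessed image in, run, copy the segmentation map out.
host_bufs[0][:] = np.random.rand(*engine.get_binding_shape(0)).ravel()  # stand-in input
stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings, stream.handle)
cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
stream.synchronize()
output = host_bufs[1]  # flat array; reshape to the model's output shape
```
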

@Sanath1998

Sanath1998 commented Aug 14, 2024

> I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set properly.
>
> https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py

Thanks a lot @ovunctuzel-bc for your timely reply. Is there a proper semantic segmentation TensorRT inference script you can point me to?
The script above covers an object detection use case; if you could refer me to one for segmentation, it would be very helpful. I have currently tried the semantic segmentation B2 model and have converted it to ONNX and TRT. I just haven't found a good resource for trying out TensorRT inference.
