NaN values with FP16 TensorRT Inference #116
I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and getting NaN or garbage values. Has anyone encountered a similar issue?

Comments
Facing a similar issue in #113. You may follow the TensorRT issue linked in my post.
I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of matrix multiplications. The max values are around 2e5, which exceeds the largest finite FP16 value (~65504). Interestingly, this does not happen with the provided pretrained models.
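One way to isolate this kind of overflow is to run an FP32 forward pass with forward hooks and record each module's peak activation magnitude. Below is a minimal PyTorch sketch of that technique; the model loading and input shape are assumptions for illustration, not taken from the thread:

```python
import torch

FP16_MAX = 65504.0  # largest finite float16 value

def find_fp16_overflows(model: torch.nn.Module, sample: torch.Tensor) -> dict:
    """Run one FP32 forward pass and report modules whose output
    magnitudes would overflow when cast to FP16."""
    peaks = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):
                peaks[name] = output.abs().max().item()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    with torch.no_grad():
        model(sample)
    for h in handles:
        h.remove()

    # Keep only the modules that actually exceed the FP16 range.
    return {n: p for n, p in peaks.items() if p > FP16_MAX}

# Hypothetical usage (model and input shape are assumptions):
# model = ...  # the exported model in FP32 eval mode
# print(find_fp16_overflows(model, torch.randn(1, 3, 512, 512)))
```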
I was able to resolve the problem by setting the following layer precisions to FP32 using the Python TensorRT API (repeat for the other stage/op_list combinations).
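As a rough illustration of that approach, here is a minimal sketch of pinning selected layers to FP32 while building an FP16 engine with the Python TensorRT API. The ONNX path and the keyword list matching the LiteMLA matmul layers are assumptions, since the actual layer names depend on how the model was exported:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Hypothetical substrings identifying the overflow-prone layers; the
# exact names depend on the ONNX export.
FP32_LAYER_KEYWORDS = ["MatMul", "LiteMLA"]

def build_engine(onnx_path: str) -> trt.ICudaEngine:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Make the builder honor per-layer precision requests.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    # Pin the overflow-prone layers (and their outputs) to FP32.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(k in layer.name for k in FP32_LAYER_KEYWORDS):
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(serialized)
```

OBEY_PRECISION_CONSTRAINTS makes the builder fail rather than silently ignore the FP32 requests; PREFER_PRECISION_CONSTRAINTS is the softer alternative if building fails.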
Where is the inference script for what you tried? I'm referring to the SEG variant here.
I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set properly. https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py
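For reference, running a built engine generally reduces to copying inputs to the GPU, executing, and copying outputs back. A minimal sketch with pycuda, assuming a single input binding, a single output binding, and an FP32 output (all assumptions, not specific to this model):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

def infer(engine: trt.ICudaEngine, input_array: np.ndarray) -> np.ndarray:
    context = engine.create_execution_context()

    # Assumes binding 0 is the input and binding 1 is the FP32 output.
    h_input = np.ascontiguousarray(input_array)
    h_output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)

    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    # Host -> device, execute, device -> host, all on one stream.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output
```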
Thanks a lot @ovunctuzel-bc for your timely reply. Is there a proper semantic segmentation TensorRT inference script you referred to?