Fix an uninitialized variable in GeographicLib. #1173

bcoconni · 2024-10-13T10:48:25Z

Fixes the issue #475.

Unlike the option that has been chosen by the GeographicLib project (see geographiclib/geographiclib#33 (comment)), the variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.

A new floating point exception flag FE_OVERFLOW has also been added to JSBSim.cpp because it is the exception that is triggered by this bug and JSBSim.cpp was not catching it on Linux. This setting helped debugging the issue while running the script c3101.xml on an Ubuntu distribution.

UPDATE: The variable m12x is initialized to Math::NaN as in the GeographicLib project. See the discussion below for details.

This avoids the tests to fail because an FPE can be randomly triggered.

codecov · 2024-10-13T10:56:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 24.96%. Comparing base (0f00b80) to head (f19f079).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1173      +/-   ##
==========================================
- Coverage   25.12%   24.96%   -0.17%     
==========================================
  Files         170      170              
  Lines       18337    19283     +946     
==========================================
+ Hits         4608     4814     +206     
- Misses      13729    14469     +740

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

seanmcleod · 2024-10-13T19:04:24Z

@bcoconni I'm refreshing my memory regarding the original issue and trying to understand the issue with setting m12x = Math::NaN.

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

jsbsim/python/fpectl/fpectlmodule.cpp

Lines 142 to 148 in 0f00b80

    
           static PyObject *turnon_sigfpe(PyObject *self, PyObject *args) 
        
           { 
        
           #ifdef _MSC_VER 
        
             _clearfp(); 
        
             fp_flags = _controlfp(_controlfp(0, 0) & ~(_EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW), 
        
                                   _MCW_EM); 
        
             handler = PyOS_setsig(SIGFPE, sigfpe_handler);

jsbsim/src/JSBSim.cpp

Lines 288 to 294 in 0f00b80

    
           int main(int argc, char* argv[]) 
        
           { 
        
           #if defined(_MSC_VER) || defined(__MINGW32__) 
        
             _clearfp(); 
        
             _controlfp(_controlfp(0, 0) & ~(_EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW), 
        
                      _MCW_EM); 
        
           #elif defined(__GNUC__) && !defined(sgi) && !defined(__APPLE__)

Testing a number of scripts I hit a breakpoint here:

jsbsim/src/GeographicLib/Geodesic.cpp

Line 406 in 0f00b80

m12x *= _b;

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

I then initialized m12x to Math:NaN() and checked what happened when I hit the breakpoint. And no floating point exception, the result is just NaN after the multiplication with b.

If any input has a quiet NaN value, it will cause the result of nearly any floating-point operation that consumes it to be NaN without raising the invalid exception.

And checking what Math::NaN() returns:

jsbsim/src/GeographicLib/Math.cpp

Lines 260 to 270 in 0f00b80

    
               template<typename T> T Math::NaN() { 
        
           #if defined(_MSC_VER) 
        
                 return numeric_limits<T>::has_quiet_NaN ? 
        
                   numeric_limits<T>::quiet_NaN() : 
        
                   (numeric_limits<T>::max)(); 
        
           #else 
        
                 return numeric_limits<T>::has_quiet_NaN ? 
        
                   numeric_limits<T>::quiet_NaN() : 
        
                   numeric_limits<T>::max(); 
        
           #endif 
        
               }

So when you say:

variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

bcoconni · 2024-10-19T11:23:57Z

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

Yes, correct.

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

In the case of the Ubuntu distribution (which is the one used by the GitHub runners), m12x can sometimes be set to a value which is high enough to trigger an overflow FPE. Of course this condition is met by accident and is dependent on the stack history since m12x is most likely allocated in the stack by C++ compilers. Hence the randomness of the bug.

By chance, this condition was systematically fulfilled at some point of the development of my logging4 branch. This was the moment when the bug became reproducible and when I have finally been able to track the bug down to the lack of initialization of m12x.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

No, I have assumed that multiplying a NaN by a number would raise an invalid FPE. And you are entirely right: my assumption is wrong and setting m12x to Math::Nan() does not trigger any floating point exception. All the tests that I have run following your feedback have passed and proven your point. I now know what "quiet" means in quiet_NaN 😄

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

Agreed, I guess that compilers are allocating m12x in the stack so its value depends on the previous content of the stack which is more or less random.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

Yes, that's a good point. Added to the fact that I was incorrectly assuming that using quiet_Nan in m12x *= b would raise an FPE, I see no longer a reason to initialize m12x to zero.

Fix an uninitialized variable in GeographicLib (issue JSBSim-Team#475).

ceb0a5c

This avoids the tests to fail because an FPE can be randomly triggered.

bcoconni mentioned this pull request Oct 13, 2024

Floating Point Exception in Actions build process #475

Open

3 tasks

bcoconni linked an issue Oct 13, 2024 that may be closed by this pull request

Floating Point Exception in Actions build process #475

Open

3 tasks

Initialize to Math::NaN instead of zero.

f19f079

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix an uninitialized variable in GeographicLib. #1173

Fix an uninitialized variable in GeographicLib. #1173

bcoconni commented Oct 13, 2024 •

edited

Loading

codecov bot commented Oct 13, 2024 •

edited

Loading

seanmcleod commented Oct 13, 2024 •

edited

Loading

bcoconni commented Oct 19, 2024

Fix an uninitialized variable in GeographicLib. #1173

Are you sure you want to change the base?

Fix an uninitialized variable in GeographicLib. #1173

Conversation

bcoconni commented Oct 13, 2024 • edited Loading

codecov bot commented Oct 13, 2024 • edited Loading

Codecov Report

seanmcleod commented Oct 13, 2024 • edited Loading

bcoconni commented Oct 19, 2024

bcoconni commented Oct 13, 2024 •

edited

Loading

codecov bot commented Oct 13, 2024 •

edited

Loading

seanmcleod commented Oct 13, 2024 •

edited

Loading