Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an uninitialized variable in GeographicLib. #1173

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bcoconni
Copy link
Member

@bcoconni bcoconni commented Oct 13, 2024

Fixes the issue #475.

Unlike the option that has been chosen by the GeographicLib project (see geographiclib/geographiclib#33 (comment)), the variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.

A new floating point exception flag FE_OVERFLOW has also been added to JSBSim.cpp because it is the exception that is triggered by this bug and JSBSim.cpp was not catching it on Linux. This setting helped debugging the issue while running the script c3101.xml on an Ubuntu distribution.

UPDATE: The variable m12x is initialized to Math::NaN as in the GeographicLib project. See the discussion below for details.

This avoids the tests to fail because an FPE can be randomly triggered.
Copy link

codecov bot commented Oct 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 24.96%. Comparing base (0f00b80) to head (f19f079).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1173      +/-   ##
==========================================
- Coverage   25.12%   24.96%   -0.17%     
==========================================
  Files         170      170              
  Lines       18337    19283     +946     
==========================================
+ Hits         4608     4814     +206     
- Misses      13729    14469     +740     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bcoconni bcoconni linked an issue Oct 13, 2024 that may be closed by this pull request
3 tasks
@seanmcleod
Copy link
Member

seanmcleod commented Oct 13, 2024

@bcoconni I'm refreshing my memory regarding the original issue and trying to understand the issue with setting m12x = Math::NaN.

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

static PyObject *turnon_sigfpe(PyObject *self, PyObject *args)
{
#ifdef _MSC_VER
_clearfp();
fp_flags = _controlfp(_controlfp(0, 0) & ~(_EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW),
_MCW_EM);
handler = PyOS_setsig(SIGFPE, sigfpe_handler);

jsbsim/src/JSBSim.cpp

Lines 288 to 294 in 0f00b80

int main(int argc, char* argv[])
{
#if defined(_MSC_VER) || defined(__MINGW32__)
_clearfp();
_controlfp(_controlfp(0, 0) & ~(_EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW),
_MCW_EM);
#elif defined(__GNUC__) && !defined(sgi) && !defined(__APPLE__)

Testing a number of scripts I hit a breakpoint here:

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

I then initialized m12x to Math:NaN() and checked what happened when I hit the breakpoint. And no floating point exception, the result is just NaN after the multiplication with b.

If any input has a quiet NaN value, it will cause the result of nearly any floating-point operation that consumes it to be NaN without raising the invalid exception.

And checking what Math::NaN() returns:

template<typename T> T Math::NaN() {
#if defined(_MSC_VER)
return numeric_limits<T>::has_quiet_NaN ?
numeric_limits<T>::quiet_NaN() :
(numeric_limits<T>::max)();
#else
return numeric_limits<T>::has_quiet_NaN ?
numeric_limits<T>::quiet_NaN() :
numeric_limits<T>::max();
#endif
}

So when you say:

variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

@bcoconni
Copy link
Member Author

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

Yes, correct.

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

In the case of the Ubuntu distribution (which is the one used by the GitHub runners), m12x can sometimes be set to a value which is high enough to trigger an overflow FPE. Of course this condition is met by accident and is dependent on the stack history since m12x is most likely allocated in the stack by C++ compilers. Hence the randomness of the bug.

By chance, this condition was systematically fulfilled at some point of the development of my logging4 branch. This was the moment when the bug became reproducible and when I have finally been able to track the bug down to the lack of initialization of m12x.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

No, I have assumed that multiplying a NaN by a number would raise an invalid FPE. And you are entirely right: my assumption is wrong and setting m12x to Math::Nan() does not trigger any floating point exception. All the tests that I have run following your feedback have passed and proven your point. I now know what "quiet" means in quiet_NaN 😄

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

Agreed, I guess that compilers are allocating m12x in the stack so its value depends on the previous content of the stack which is more or less random.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

Yes, that's a good point. Added to the fact that I was incorrectly assuming that using quiet_Nan in m12x *= b would raise an FPE, I see no longer a reason to initialize m12x to zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Floating Point Exception in Actions build process
2 participants