-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initialize Array::m_executeOnGPU #1434
Conversation
src/axom/core/Array.hpp
Outdated
int m_allocator_id; | ||
bool m_executeOnGPU; | ||
int m_allocator_id = INVALID_ALLOCATOR_ID; | ||
bool m_executeOnGPU {false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use different variable initialization syntax? Do we have a convention that we follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of a convention. Point taken though. This file is using assignment-style initialization for the other members so I'll change to that for the bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@publixsubfan -- I think in the past, you had a reason to prefer the curly brace initialization syntax. Do you recall what it was?
Once this is resolved, it might be good to update our coding conventions w/ a preference. At the very least, we should be consistent w/in a single file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BradWhitlock -- if possible, could you please add a regression test that triggers the behavior you observed?
Please also update the RELEASE_NOTES
w/ the bugfix (and add a new "Unreleased" section if it's not already there).
src/axom/core/Array.hpp
Outdated
@@ -1164,6 +1170,8 @@ Array<T, DIM, SPACE>::Array(Array&& other) noexcept | |||
m_capacity = other.m_capacity; | |||
m_resize_ratio = other.m_resize_ratio; | |||
m_allocator_id = other.m_allocator_id; | |||
m_executeOnGPU = axom::detail::getAllocatorSpace(m_allocator_id) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: This is a one-liner, but might is somewhat tricky to get right, and perhaps subject to change. Should it be a function here or in axom/core/memory_management?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Kenny, good idea. I'm doing that now. I was running into trouble on CI anyway with MemorySpace::Device only existing when UMPIRE is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for a reproducer, I could not get it to happen with the test I wrote (so I junked it). The fixes do clear up my problems on CUDA though on my MIR branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From CI:
2024-10-01T19:41:44.0228174Z /home/axom/axom/src/axom/core/../../axom/core/Array.hpp:293:28: error: no member named 'Device' in 'axom::MemorySpace'
2024-10-01T19:41:44.0244447Z axom::MemorySpace::Device;
2024-10-01T19:41:44.0245077Z ~~~~~~~~~~~~~~~~~~~^
https://dev.azure.com/healy22/d4445b8f-6e95-49cd-9e5b-8c8ca1ccad31/_apis/build/builds/10807/logs/58
More reason to move this to a function. Device
is only defined when Umpire is available.
See:
axom/src/axom/core/memory_management.hpp
Lines 37 to 47 in 70b3608
enum class MemorySpace | |
{ | |
Dynamic, | |
#ifdef AXOM_USE_UMPIRE | |
Host, | |
Device, | |
Unified, | |
Pinned, | |
Constant | |
#endif | |
}; |
+1 for the team for having CI configs w/o Umpire!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for a reproducer, I could not get it to happen with the test I wrote (so I junked it). The fixes do clear up my problems on CUDA though on my MIR branch.
Probably still good to have test cases that exercise the true
and false
branches
src/axom/core/Array.hpp
Outdated
int m_allocator_id; | ||
bool m_executeOnGPU; | ||
int m_allocator_id = INVALID_ALLOCATOR_ID; | ||
bool m_executeOnGPU {false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@publixsubfan -- I think in the past, you had a reason to prefer the curly brace initialization syntax. Do you recall what it was?
Once this is resolved, it might be good to update our coding conventions w/ a preference. At the very least, we should be consistent w/in a single file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BradWhitlock looks reasonable. It would be good to think about adding tests to catch cases that may show breakages -- maybe make an issue that we can discuss?
@rhornung67: I think we'd need to test with asan/ubsan enabled in order to reliably capture a problem like this. Alternatively, |
I added a test that approximates what I was doing in my MIR branch. There was some weirdness with RelWithDebInfo but not a crash. The test calls a new function axom::isDeviceAllocator() to exercise it. |
Summary
Lack of Array::m_executeOnGPU initialization played a role in causing host-initialization of device-allocated memory in certain situations. I was not able to create a reproducer on the develop branch but these changes eliminate warnings about uninitialized memory in axom::Array under Valgrind and fix the crashes on my MIR branch when using CUDA-allocated memory.