
New Reducer ValOp Interface #1732

Merged
merged 62 commits into from
Oct 8, 2024
Commits (62)
3c58b7c
Add ValOp and update new reduction objects.
rchen20 Sep 6, 2024
04acb5a
Update CUDA grid_reduce for ValOp.
rchen20 Sep 6, 2024
36c3987
Update ValOp CUDA backend.
rchen20 Sep 6, 2024
c68119a
Update HIP grid_reduce for ValOp.
rchen20 Sep 6, 2024
424aedd
Update ValOp HIP backend.
rchen20 Sep 6, 2024
41ce7fa
Update internal device allocations for ValOp.
rchen20 Sep 6, 2024
28e2611
Update ValOp OpenMP backend.
rchen20 Sep 6, 2024
d8e32bf
Update ValOp Sequential backend.
rchen20 Sep 6, 2024
00324e3
Update new reduction example.
rchen20 Sep 6, 2024
d56371c
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 6, 2024
bd5ec5f
Update SYCL backend for ValOp.
rchen20 Sep 6, 2024
048cb15
Update example for SYCL ValOp.
rchen20 Sep 6, 2024
79fcd53
Minor corrections for SYCL build script.
rchen20 Sep 6, 2024
aec4372
Update OpenMP Target backend for ValOp.
rchen20 Sep 6, 2024
2a66423
Update example for OpenMP Target.
rchen20 Sep 6, 2024
12d5932
Add a set function to ValOp.
rchen20 Sep 6, 2024
b4797fc
Update ReduceSum test.
rchen20 Sep 6, 2024
d27c5a1
Some setter and getter functions. May change later.
rchen20 Sep 9, 2024
0b74740
Update new reducer forall tests.
rchen20 Sep 9, 2024
298c70f
Update new reducer launch tests.
rchen20 Sep 9, 2024
38439bf
Update launch example with ValOp.
rchen20 Sep 9, 2024
9302951
Simplify ValOp interface for user. ValOp only needed within lambdas.
rchen20 Sep 13, 2024
4a69f46
Update backends with simplified ValOp.
rchen20 Sep 13, 2024
69413c6
Update ValOp in tests.
rchen20 Sep 13, 2024
54f2f88
Update ValOp in examples.
rchen20 Sep 13, 2024
a7a740e
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 13, 2024
b6e0e81
Remove need for T() in ValLoc.
rchen20 Sep 14, 2024
ecd4036
Clean up interface and internals of ValOp.
rchen20 Sep 17, 2024
41b531a
Cleanup.
rchen20 Sep 17, 2024
dd45e66
Min and max functions taking a ValLoc. Only allowed within a lambda v…
rchen20 Sep 18, 2024
1b81cb9
Add alternative reduction capability.
rchen20 Sep 19, 2024
81ee963
Tests for alternative reduction capability.
rchen20 Sep 19, 2024
96cb30b
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 19, 2024
1ebe526
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 19, 2024
1e13b59
Fix SYCL typo.
rchen20 Sep 19, 2024
a414e12
Update various constructors in Val.
rchen20 Sep 19, 2024
6cd997a
Update examples with ValOp.
rchen20 Sep 19, 2024
8062809
Attempt to make OpenMP ValOp reduction tests faster in CI.
rchen20 Sep 19, 2024
aa26f71
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 19, 2024
ab40dc5
Comments on how Reducer object works with ValOp.
rchen20 Sep 20, 2024
3cca66f
Documentation on ValOp.
rchen20 Sep 20, 2024
59482fa
Fix GPU compiler warnings for decorated constructors.
rchen20 Sep 20, 2024
9adc6f7
Documentation fix.
rchen20 Sep 20, 2024
09ab606
Minor updates for ValLoc.
rchen20 Sep 27, 2024
2be4989
Simplify Reducer internals.
rchen20 Sep 27, 2024
6616abe
Remove old grid_reduce for expt Reducer.
rchen20 Sep 27, 2024
94b4fbf
Update backends with new Reducer internals.
rchen20 Sep 27, 2024
d0b370f
Split loc tests to hopefully reduce CI build times.
rchen20 Sep 27, 2024
29b9332
User documentation.
rchen20 Sep 27, 2024
4b7034d
Formatting in example.
rchen20 Sep 27, 2024
01fab6a
Reword.
rchen20 Sep 27, 2024
8bc0c3e
Merge branch 'develop' into task/chen59/valopinterface
rchen20 Sep 27, 2024
540dbb2
Merge branch 'develop' of github.com:LLNL/RAJA into task/chen59/valop…
rchen20 Sep 30, 2024
0a17938
Various template simplifications.
rchen20 Oct 2, 2024
28a96dc
Documentation updates.
rchen20 Oct 3, 2024
21c4414
Update error message for ordering of ValOp and Reduce args.
rchen20 Oct 3, 2024
c68db6c
More template simplifications.
rchen20 Oct 4, 2024
8a61199
Change member variable naming convention.
rchen20 Oct 4, 2024
cd64e23
Simplify ValLoc Reducer case. Perform final combine in Reducer.
rchen20 Oct 5, 2024
e48637c
Reword.
rchen20 Oct 7, 2024
bc33040
Change VType to VOp.
rchen20 Oct 7, 2024
b5d5845
Initialize ValOp with identity for the ValLoc case.
rchen20 Oct 8, 2024
128 changes: 95 additions & 33 deletions docs/sphinx/user_guide/feature/reduction.rst
@@ -190,6 +190,9 @@ RAJA::expt::Reduce
..................
::

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;

double* a = ...;

double rs = 0.0;
@@ -198,9 +201,9 @@ RAJA::expt::Reduce
RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
[=] (int i, double& _rs, double& _rm) {
[=] (int i, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {
_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm.min(a[i]);
}
);

@@ -213,13 +216,14 @@ RAJA::expt::Reduce
above. The reduction operation will include the existing value of
the given target variable.
* The kernel body lambda expression passed to ``RAJA::forall`` must have a
parameter corresponding to each ``RAJA::expt::Reduce`` argument, ``_rs`` and
``_rm`` in the example code. These parameters refer to a local target for each
reduction operation. It is important to note that the parameters follow the
kernel iteration variable, ``i`` in this case, and appear in the same order
as the corresponding ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The
parameter types must be references to the types used in the
``RAJA::expt::Reduce`` arguments.
``RAJA::expt::ValOp`` parameter corresponding to each ``RAJA::expt::Reduce``
argument, ``_rs`` and ``_rm`` in the example code. These parameters refer to a
local target for each reduction operation. Each ``ValOp`` needs to be templated
on the underlying data type (``double`` for ``_rs`` and ``_rm``), and the operator
being used. It is important to note that the parameters follow the kernel iteration
variable, ``i`` in this case, and appear in the same order as the corresponding
``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The parameter types must be
references to the types used in the ``RAJA::expt::Reduce`` arguments.
* The local variables referred to by ``_rs`` and ``_rm`` are initialized with
the *identity* of the reduction operation to be performed.
* The local variables are updated in the user supplied lambda.
@@ -236,38 +240,85 @@ RAJA::expt::Reduce
compatible with the ``EXEC_POL``. ``Seg`` is the iteration space
object for ``RAJA::forall``.

.. important:: The order and types of the local reduction variables in the
kernel body lambda expression must match exactly with the
corresponding ``RAJA::expt::Reduce`` arguments to the
``RAJA::forall`` to ensure that the correct result is obtained.
.. important:: The local reduction arguments to the lambda expression must be
``RAJA::expt::ValOp`` references. Each ``ValOp`` reference
corresponds to a ``RAJA::expt::Reduce`` call in the ``RAJA::forall``
arguments. The reduction data type and RAJA operator of each ``ValOp``
must match the data type referenced and the operator template argument
in the corresponding ``RAJA::expt::Reduce`` call. Finally, the ordering
of the ``ValOp`` references must correspond to the ordering of the
``RAJA::expt::Reduce`` calls to ensure that the correct result is
obtained.

RAJA::expt::ValLoc
..................

As with the current RAJA reduction interface, the new interface supports *loc*
reductions, which provide the ability to get a kernel/loop index at which the
final reduction value was found. With this new interface, *loc* reductions
are performed using ``ValLoc<T>`` types. Since they are strongly typed, they
provide ``min()`` and ``max()`` operations that are equivalent to using
``RAJA_MIN()`` or ``RAJA_MAX`` macros as demonstrated in the code example below.
Users must use the ``getVal()`` and ``getLoc()`` methods to access the reduction
results::
are performed using ``ValLoc<T,I>`` types, where ``T`` is the underlying data type
and ``I`` is the index type. Users must use the ``getVal()`` and ``getLoc()``
methods to access the reduction results.

In the kernel body lambda expression, a ``ValLoc<T,I>`` must be wrapped in a
``ValOp`` and passed to the lambda in the same order as the corresponding
``RAJA::expt::Reduce`` arguments, e.g., ``ValOp<ValLoc<T,I>, Op>``.
For convenience, the alias ``RAJA::expt::ValLocOp<T,I,Op>`` is provided.
Within the lambda, this ``ValLocOp`` object provides ``minloc`` and ``maxloc``
functions::

double* a = ...;

using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValOp<ValLoc<double, RAJA::Index_type>,
RAJA::operators::minimum>;
using VALOPLOC_DOUBLE_MAX = RAJA::expt::ValLocOp<double, RAJA::Index_type,
RAJA::operators::maximum>;

using VL_DOUBLE = RAJA::expt::ValLoc<double>;
VL_DOUBLE rm_loc;
VL_DOUBLE rmin_loc;
VL_DOUBLE rmax_loc;

RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc),
[=] (int i, VL_DOUBLE& _rm_loc) {
_rm_loc = RAJA_MIN(VL_DOUBLE(a[i], i), _rm_loc);
//_rm_loc.min(VL_DOUBLE(a[i], i)); // Alternative to RAJA_MIN
RAJA::expt::Reduce<RAJA::operators::minimum>(&rmin_loc),
RAJA::expt::Reduce<RAJA::operators::maximum>(&rmax_loc),
[=] (int i, VALOPLOC_DOUBLE_MIN& _rmin_loc, VALOPLOC_DOUBLE_MAX& _rmax_loc) {
_rmin_loc.minloc(a[i], i);
_rmax_loc.maxloc(a[i], i);
}
);

std::cout << rm_loc.getVal() ...
std::cout << rm_loc.getLoc() ...
std::cout << rmin_loc.getVal() ...
std::cout << rmin_loc.getLoc() ...
std::cout << rmax_loc.getVal() ...
std::cout << rmax_loc.getLoc() ...

Alternatively, *loc* reductions can be performed on separate reduction data and
location variables, without a ``ValLoc`` object. To use this capability, a
``RAJA::expt::ReduceLoc`` call, templated on the reduction operation, must be
passed to the ``RAJA::forall``, with references to the data and location passed
as ``ReduceLoc`` function arguments. The data and location can then be accessed
outside of the forall directly, without the ``getVal()`` or ``getLoc()`` functions.
::

double* a = ...;

using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type,
RAJA::operators::minimum>;

// No ValLoc needed from the user here.
double rm;
RAJA::Index_type loc;

RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::ReduceLoc<RAJA::operators::minimum>(&rm, &loc),
[=] (int i, VALOPLOC_DOUBLE_MIN& _rm_loc) {
_rm_loc.minloc(a[i], i);
}
);

std::cout << rm ...
std::cout << loc ...


Lambda Arguments
................
@@ -277,6 +328,10 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::

double* a = ...;

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type, RAJA::operators::minimum>;

using VL_DOUBLE = RAJA::expt::ValLoc<double>;
VL_DOUBLE rm_loc;
double rs;
@@ -287,10 +342,13 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm), // --> 1 double added
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc), // --> 1 VL_DOUBLE added
RAJA::expt::KernelName("MyFirstRAJAKernel"), // --> NO args added
[=] (int i, double& _rs, double& _rm, VL_DOUBLE& _rm_loc) {
[=] (int i,
VALOP_DOUBLE_SUM& _rs,
VALOP_DOUBLE_MIN& _rm,
VALOPLOC_DOUBLE_MIN& _rm_loc) {
_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm_loc.min(VL_DOUBLE(a[i], i));
_rm.min(a[i]);
_rm_loc.minloc(a[i], i);
}
);

@@ -300,9 +358,10 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
std::cout << rm_loc.getLoc() ...

Again, the lambda expression parameters are in the same order as
the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. Both the types and
order of the parameters must match to get correct results and to compile
successfully. Otherwise, a static assertion will be triggered::
the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The underlying data
types and operators of the ``ValOp`` parameters, as well as their order, must
match the corresponding ``RAJA::expt::Reduce`` arguments to get correct results
and to compile successfully. Otherwise, a static assertion will be triggered::

LAMBDA Not invocable w/ EXPECTED_ARGS.

@@ -329,19 +388,22 @@ The usage of the experimental reductions is similar to the forall example as il

double* a = ...;

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;

double rs = 0.0;
double rm = 1e100;

RAJA::launch<EXEC_POL> ( Res,
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
"LaunchReductionKernel",
[=] RAJA_HOST_DEVICE (int i, double& _rs, double& _rm) {
[=] RAJA_HOST_DEVICE (RAJA::LaunchContext ctx, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {

RAJA::loop<loop_pol>(ctx, Seg, [&] (int i) {

_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm.min(a[i]);

}
);