Merge branch 'develop' into woptim/spack-update
adrienbernede authored Oct 8, 2024
2 parents 6f9d251 + 5443545 commit 346f070
Showing 29 changed files with 1,265 additions and 352 deletions.
143 changes: 108 additions & 35 deletions docs/sphinx/user_guide/feature/reduction.rst
@@ -190,6 +190,9 @@ RAJA::expt::Reduce
..................
::

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;

double* a = ...;

double rs = 0.0;
@@ -198,9 +201,9 @@ RAJA::expt::Reduce
RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
[=] (int i, double& _rs, double& _rm) {
[=] (int i, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {
_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm.min(a[i]);
}
);

@@ -213,13 +216,14 @@ RAJA::expt::Reduce
above. The reduction operation will include the existing value of
the given target variable.
* The kernel body lambda expression passed to ``RAJA::forall`` must have a
parameter corresponding to each ``RAJA::expt::Reduce`` argument, ``_rs`` and
``_rm`` in the example code. These parameters refer to a local target for each
reduction operation. It is important to note that the parameters follow the
kernel iteration variable, ``i`` in this case, and appear in the same order
as the corresponding ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The
parameter types must be references to the types used in the
``RAJA::expt::Reduce`` arguments.
``RAJA::expt::ValOp`` parameter corresponding to each ``RAJA::expt::Reduce``
argument, ``_rs`` and ``_rm`` in the example code. These parameters refer to a
local target for each reduction operation. Each ``ValOp`` needs to be templated
on the underlying data type (``double`` for ``_rs`` and ``_rm``), and the operator
being used. It is important to note that the parameters follow the kernel iteration
variable, ``i`` in this case, and appear in the same order as the corresponding
``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The ``ValOp`` parameters must
be references to the objects instantiated by the ``RAJA::expt::Reduce`` arguments.
* The local variables referred to by ``_rs`` and ``_rm`` are initialized with
the *identity* of the reduction operation to be performed.
* The local variables are updated in the user supplied lambda.
@@ -236,47 +240,109 @@ RAJA::expt::Reduce
compatible with the ``EXEC_POL``. ``Seg`` is the iteration space
object for ``RAJA::forall``.

.. important:: The order and types of the local reduction variables in the
kernel body lambda expression must match exactly with the
corresponding ``RAJA::expt::Reduce`` arguments to the
``RAJA::forall`` to ensure that the correct result is obtained.
.. important:: * ``RAJA::expt::Reduce`` arguments must be passed to the forall.
These arguments are templated on the reduction operator, and take
a pointer to the target reduction variable that was declared outside
of the forall.
* The local reduction arguments to the lambda expression must be
``RAJA::expt::ValOp`` references. Each ``ValOp`` reference
corresponds to a ``RAJA::expt::Reduce`` argument within the forall.
* The ordering of the ``ValOp`` references must correspond to the
ordering of the ``RAJA::expt::Reduce`` arguments to ensure that the
correct result is obtained.
* Each ``ValOp`` data type and RAJA operator must match the data type of
the target variable and the operator template argument of the
corresponding ``RAJA::expt::Reduce`` argument.
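
The semantics listed above (local targets start at the operator identity, are updated in the lambda, and are finally combined with the existing value of the target variable) can be sketched without RAJA. The code below is only an illustration under stated assumptions: ``SketchValOp``, ``PlusOp``, ``MinOp``, ``sketch_forall``, and ``run_example`` are hypothetical names invented here, the loop is sequential, and none of it reflects how RAJA implements ``ValOp`` or ``forall`` internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical operator tags for illustration only; these are NOT RAJA types.
struct PlusOp {
  static double identity() { return 0.0; }
  static double combine(double a, double b) { return a + b; }
};

struct MinOp {
  static double identity() { return std::numeric_limits<double>::max(); }
  static double combine(double a, double b) { return std::min(a, b); }
};

// A value bundled with its operator, loosely analogous to ValOp<T, Op>.
template <typename T, typename Op>
struct SketchValOp {
  T val = Op::identity();  // the local target starts at the identity

  SketchValOp& operator+=(const T& rhs) { val = val + rhs; return *this; }
  SketchValOp& min(const T& rhs) { val = std::min(val, rhs); return *this; }
};

// Sequential stand-in for a forall with one sum reducer and one min reducer.
template <typename Body>
void sketch_forall(std::size_t n, double* sum_target, double* min_target,
                   Body body) {
  assert(sum_target != nullptr && min_target != nullptr);
  SketchValOp<double, PlusOp> local_sum;
  SketchValOp<double, MinOp> local_min;
  for (std::size_t i = 0; i < n; ++i) {
    body(static_cast<int>(i), local_sum, local_min);
  }
  // The final result includes the value already stored in each target.
  *sum_target = PlusOp::combine(*sum_target, local_sum.val);
  *min_target = MinOp::combine(*min_target, local_min.val);
}

struct Result { double sum; double min; };

Result run_example() {
  std::vector<double> a{4.0, -1.5, 2.5};
  double rs = 10.0;   // existing value takes part in the sum
  double rm = 1e100;  // large starting value for the minimum
  sketch_forall(a.size(), &rs, &rm,
                [&](int i, SketchValOp<double, PlusOp>& _rs,
                    SketchValOp<double, MinOp>& _rm) {
                  _rs += a[i];
                  _rm.min(a[i]);
                });
  return {rs, rm};
}
```

In this sketch the parameter order of the lambda mirrors the order of the two target pointers, which is the same pairing rule the bullets above describe for ``RAJA::expt::Reduce`` arguments and ``ValOp`` references.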

RAJA::expt::ValLoc
..................

As with the current RAJA reduction interface, the new interface supports *loc*
reductions, which provide the ability to get a kernel/loop index at which the
final reduction value was found. With this new interface, *loc* reductions
are performed using ``ValLoc<T>`` types. Since they are strongly typed, they
provide ``min()`` and ``max()`` operations that are equivalent to using
``RAJA_MIN()`` or ``RAJA_MAX`` macros as demonstrated in the code example below.
Users must use the ``getVal()`` and ``getLoc()`` methods to access the reduction
results::
are performed using ``ValLoc<T,I>`` types, where ``T`` is the underlying data type,
and ``I`` is the index type. Users must use the ``getVal()`` and ``getLoc()``
methods to access the reduction results after the kernel completes.

In the lambda expression, a ``ValLoc<T,I>`` must be wrapped in a
``ValOp`` type, and passed to the lambda in the same order as the corresponding
``RAJA::expt::Reduce`` arguments, e.g. ``ValOp<ValLoc<T,I>, Op>``. In the example
below, ``VALOPLOC_DOUBLE_MIN`` represents a wrapped ``ValLoc`` usable within the
lambda.

For convenience, an alias of ``RAJA::expt::ValLocOp<T,I,Op>`` is provided.
Within the lambda, this ``ValLocOp`` object provides ``minloc`` and ``maxloc``
functions. In the example below, ``VALOPLOC_DOUBLE_MAX`` represents a wrapped
``ValLoc`` using the ``ValLocOp`` alias::

double* a = ...;

using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValOp<RAJA::expt::ValLoc<double, RAJA::Index_type>,
RAJA::operators::minimum>;
using VALOPLOC_DOUBLE_MAX = RAJA::expt::ValLocOp<double, RAJA::Index_type,
RAJA::operators::maximum>;

using VL_DOUBLE = RAJA::expt::ValLoc<double>;
VL_DOUBLE rm_loc;
VL_DOUBLE rmin_loc;
VL_DOUBLE rmax_loc;

RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc),
[=] (int i, VL_DOUBLE& _rm_loc) {
_rm_loc = RAJA_MIN(VL_DOUBLE(a[i], i), _rm_loc);
//_rm_loc.min(VL_DOUBLE(a[i], i)); // Alternative to RAJA_MIN
RAJA::expt::Reduce<RAJA::operators::minimum>(&rmin_loc),
RAJA::expt::Reduce<RAJA::operators::maximum>(&rmax_loc),
[=] (int i, VALOPLOC_DOUBLE_MIN& _rmin_loc, VALOPLOC_DOUBLE_MAX& _rmax_loc) {
_rmin_loc.minloc(a[i], i);
_rmax_loc.maxloc(a[i], i);
}
);

std::cout << rm_loc.getVal() ...
std::cout << rm_loc.getLoc() ...
std::cout << rmin_loc.getVal() ...
std::cout << rmin_loc.getLoc() ...
std::cout << rmax_loc.getVal() ...
std::cout << rmax_loc.getLoc() ...
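
What ``minloc`` and ``maxloc`` compute can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: ``SketchValLoc``, ``SketchMinLoc``, ``SketchMaxLoc``, and ``find_min_max`` are hypothetical names, the loop is sequential, and ties are resolved in favor of the first index seen, which may differ from what RAJA does.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical value/index pair; NOT the RAJA::expt::ValLoc type.
template <typename T, typename I>
struct SketchValLoc {
  T val;
  I loc;
  T getVal() const { return val; }
  I getLoc() const { return loc; }
};

// Tracks the smallest value seen so far and the index where it occurred.
template <typename T, typename I>
struct SketchMinLoc {
  SketchValLoc<T, I> best{std::numeric_limits<T>::max(), I(-1)};
  void minloc(const T& v, I i) {
    if (v < best.val) best = {v, i};  // strict '<' keeps the first index on ties
  }
};

// Tracks the largest value seen so far and the index where it occurred.
template <typename T, typename I>
struct SketchMaxLoc {
  SketchValLoc<T, I> best{std::numeric_limits<T>::lowest(), I(-1)};
  void maxloc(const T& v, I i) {
    if (v > best.val) best = {v, i};  // strict '>' keeps the first index on ties
  }
};

struct MinMaxLoc {
  SketchValLoc<double, long> min;
  SketchValLoc<double, long> max;
};

MinMaxLoc find_min_max(const std::vector<double>& a) {
  assert(!a.empty());
  SketchMinLoc<double, long> lo;
  SketchMaxLoc<double, long> hi;
  for (std::size_t i = 0; i < a.size(); ++i) {
    lo.minloc(a[i], static_cast<long>(i));
    hi.maxloc(a[i], static_cast<long>(i));
  }
  return {lo.best, hi.best};
}
```

Note that the minimum tracker is only ever updated with ``minloc`` and the maximum tracker with ``maxloc``; in the sketch, as in the interface described above, the update call has to agree with the operator the reduction was declared with.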

Alternatively, *loc* reductions can be performed on separate reduction data and
location variables, without a ``ValLoc`` object. To use this capability, pass a
``RAJA::expt::ReduceLoc`` argument to ``RAJA::forall``, templated on the reduction
operation and given pointers to the data and location variables. In the example
below, the addresses of ``rm`` and ``loc`` are passed to the ``ReduceLoc`` argument.
After the forall, the data and location can be read directly, without the
``getVal()`` or ``getLoc()`` functions.
::

double* a = ...;

using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type,
RAJA::operators::minimum>;

// No ValLoc needed from the user here.
double rm;
RAJA::Index_type loc;

RAJA::forall<EXEC_POL> ( Res, Seg,
RAJA::expt::ReduceLoc<RAJA::operators::minimum>(&rm, &loc), // --> 1 double & 1 index added
[=] (int i, VALOPLOC_DOUBLE_MIN& _rm_loc) {
_rm_loc.minloc(a[i], i);
}
);

// No getVal() or getLoc() required. Access results in their original form.
std::cout << rm ...
std::cout << loc ...
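
The difference from the ``ValLoc`` form is only in where the result lands: two plain variables instead of one pair object. A rough standalone sketch of that idea follows. ``sketch_reduce_minloc`` is a hypothetical helper, not a RAJA function, and the final step, which folds the caller's starting value into the result, is an assumption carried over from the behavior described for ``RAJA::expt::Reduce``; the example initializes its target accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of RAJA: a min-with-location reduction that
// reports through two separate variables rather than a value/index object.
void sketch_reduce_minloc(const std::vector<double>& a, double* val, long* loc) {
  assert(val != nullptr && loc != nullptr);

  // The local target starts at the identity of the min operation.
  double local_val = std::numeric_limits<double>::max();
  long local_loc = -1;

  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] < local_val) {
      local_val = a[i];
      local_loc = static_cast<long>(i);
    }
  }

  // Assumption: the caller's existing value takes part in the reduction, so
  // the outputs are overwritten only when the loop found something smaller.
  if (local_val < *val) {
    *val = local_val;
    *loc = local_loc;
  }
}
```

A caller would declare ``double rm = 1e100;`` and ``long loc = -1;``, pass ``&rm`` and ``&loc``, and then read both variables directly afterwards.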


Lambda Arguments
................

This interface takes advantage of C++ parameter packs to allow users to pass
any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
any number of ``RAJA::expt::Reduce`` arguments to the ``RAJA::forall`` method::

double* a = ...;

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type, RAJA::operators::minimum>;

using VL_DOUBLE = RAJA::expt::ValLoc<double>;
VL_DOUBLE rm_loc;
double rs;
@@ -287,10 +353,13 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm), // --> 1 double added
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc), // --> 1 VL_DOUBLE added
RAJA::expt::KernelName("MyFirstRAJAKernel"), // --> NO args added
[=] (int i, double& _rs, double& _rm, VL_DOUBLE& _rm_loc) {
[=] (int i,
VALOP_DOUBLE_SUM& _rs,
VALOP_DOUBLE_MIN& _rm,
VALOPLOC_DOUBLE_MIN& _rm_loc) {
_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm_loc.min(VL_DOUBLE(a[i], i));
_rm.min(a[i]);
_rm_loc.minloc(a[i], i);
}
);

@@ -300,11 +369,12 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
std::cout << rm_loc.getLoc() ...

Again, the lambda expression parameters are in the same order as
the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. Both the types and
order of the parameters must match to get correct results and to compile
successfully. Otherwise, a static assertion will be triggered::
the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The underlying data
types, operators, and order of the ``ValOp`` parameters must match
the corresponding ``RAJA::expt::Reduce`` arguments to get correct results and to
compile successfully. Otherwise, a static assertion will be triggered::

LAMBDA Not invocable w/ EXPECTED_ARGS.
LAMBDA Not invocable w/ EXPECTED_ARGS. Ordering and types must match between RAJA::expt::Reduce() and ValOp arguments.
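
A compile-time check of this kind can be expressed with the standard ``std::is_invocable`` trait (C++17). The sketch below shows the general technique only; it is not RAJA's actual check, and ``SketchSum``, ``SketchMin``, ``lambda_matches``, and the two example functions are hypothetical names introduced here.

```cpp
#include <type_traits>

// Hypothetical stand-ins for two distinct reduction parameter types.
struct SketchSum { double val = 0.0; };
struct SketchMin { double val = 0.0; };

// True when Lambda can be called with an int index followed by lvalue
// references to each expected argument type, in exactly this order.
template <typename Lambda, typename... Expected>
constexpr bool lambda_matches() {
  return std::is_invocable<Lambda, int, Expected&...>::value;
}

// Parameters appear in the same order as the expected argument list.
bool example_matching_order() {
  auto body = [](int, SketchSum&, SketchMin&) {};
  return lambda_matches<decltype(body), SketchSum, SketchMin>();
}

// Parameters are swapped, so the lambda is not invocable with the
// expected argument list and the trait reports false.
bool example_swapped_order() {
  auto body = [](int, SketchMin&, SketchSum&) {};
  return lambda_matches<decltype(body), SketchSum, SketchMin>();
}

// Turning the trait into a hard error is a one-line static_assert.
static_assert(lambda_matches<void (*)(int, SketchSum&, SketchMin&),
                             SketchSum, SketchMin>(),
              "LAMBDA Not invocable w/ EXPECTED_ARGS.");
```

Because the two stand-in types are unrelated, a swapped parameter list fails the trait instead of silently binding to the wrong reducer, which is the kind of mistake the assertion message above points at.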

.. note:: This static assert is only enabled when passing an undecorated C++
lambda. Meaning, this check will not happen when passing
@@ -329,19 +399,22 @@ The usage of the experimental reductions is similar to the forall example as il

double* a = ...;

using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;

double rs = 0.0;
double rm = 1e100;

RAJA::launch<EXEC_POL> ( Res,
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
"LaunchReductionKernel",
[=] RAJA_HOST_DEVICE (int i, double& _rs, double& _rm) {
[=] RAJA_HOST_DEVICE (int i, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {

RAJA::loop<loop_pol>(ctx, Seg, [&] (int i) {

_rs += a[i];
_rm = RAJA_MIN(a[i], _rm);
_rm.min(a[i]);

}
);