Ruipeng Li edited this page May 25, 2021 · 12 revisions

Basic information on how to compile and run on GPUs can be found in the User's Manual. This page provides additional details.

Memory locations and execution policies

Hypre provides two user-level memory locations, HYPRE_MEMORY_HOST and HYPRE_MEMORY_DEVICE. HYPRE_MEMORY_HOST is always the CPU memory, while HYPRE_MEMORY_DEVICE maps to different memory spaces depending on how hypre was configured. When built with --with-cuda or --with-device-openmp, HYPRE_MEMORY_DEVICE is the GPU device memory; when --enable-unified-memory is also given, it is the GPU unified memory (UM). In a non-GPU build, HYPRE_MEMORY_DEVICE is also mapped to the CPU memory. The default memory location of hypre's matrix and vector objects is HYPRE_MEMORY_DEVICE, which can be changed with HYPRE_SetMemoryLocation(...). Note that this runtime switch of memory locations currently takes effect only for the unstructured interface and has not been extensively tested.
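The mapping above can be summarized as a small decision function. The following is a hypothetical model written only to illustrate these rules: the names resolve_location, build_config and the enums are made up and are not part of hypre, and the sketch assumes --enable-unified-memory is only used together with a GPU build.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT hypre's types or functions. */
typedef enum { LOC_HOST, LOC_DEVICE } user_location; /* like HYPRE_MEMORY_HOST / HYPRE_MEMORY_DEVICE */
typedef enum { SPACE_CPU, SPACE_GPU_DEVICE, SPACE_GPU_UNIFIED } mem_space;

typedef struct {
    bool gpu_build;      /* configured with --with-cuda or --with-device-openmp */
    bool unified_memory; /* additionally configured with --enable-unified-memory */
} build_config;

/* Physical memory space that a user-level location refers to in a given build. */
static mem_space resolve_location(build_config cfg, user_location loc)
{
    /* Assumption of this sketch: unified memory is only requested in GPU builds. */
    assert(!cfg.unified_memory || cfg.gpu_build);

    if (loc == LOC_HOST || !cfg.gpu_build)
        return SPACE_CPU; /* host is always CPU memory; non-GPU builds map DEVICE to CPU too */
    return cfg.unified_memory ? SPACE_GPU_UNIFIED : SPACE_GPU_DEVICE;
}
```

For example, in a build configured with --with-cuda --enable-unified-memory, resolve_location returns SPACE_GPU_UNIFIED for LOC_DEVICE and SPACE_CPU for LOC_HOST.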

Execution policies determine where computations run, based on the memory locations of the participating objects. The default policy is HYPRE_EXEC_HOST, i.e., computations execute on the host whenever the objects are accessible from the host. The policy can be changed with HYPRE_SetExecutionPolicy(...). It only affects objects in UM, since UM is accessible from both CPUs and GPUs. Note that this runtime switch of execution policies currently takes effect only for IJ matrix setup and the BoomerAMG setup phase.
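The rule above can be modeled as follows for an operation with two operands (for example a matrix and a vector). This is a hypothetical sketch, not hypre's internal dispatch code: the enum and function names are made up, and it assumes that data in device-only memory forces GPU execution while data in host-only memory forces CPU execution, so the policy is consulted only when every operand lives in UM.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are NOT hypre's types or functions. */
typedef enum { MEM_HOST, MEM_DEVICE, MEM_UNIFIED } mem_space;
typedef enum { EXEC_HOST, EXEC_DEVICE } exec_policy; /* like HYPRE_EXEC_HOST / HYPRE_EXEC_DEVICE */

/* Where an operation on two objects would run under the rule described above. */
static exec_policy choose_exec(mem_space a, mem_space b, exec_policy policy)
{
    /* Host-only and device-only data have no common execution space;
       this sketch treats that combination as a caller error. */
    assert(!((a == MEM_HOST && b == MEM_DEVICE) || (a == MEM_DEVICE && b == MEM_HOST)));

    if (a == MEM_HOST || b == MEM_HOST)
        return EXEC_HOST;   /* some data is only reachable from the CPU */
    if (a == MEM_DEVICE || b == MEM_DEVICE)
        return EXEC_DEVICE; /* some data is only reachable from the GPU */
    return policy;          /* everything lives in UM: the policy decides */
}
```

Under this model, with the default EXEC_HOST an operation on UM-resident objects runs on the CPU; switching the policy to EXEC_DEVICE moves it to the GPU without moving any data.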

Current best-practice configuration settings for SMG/PFMG on GPUs

No changes to the solver interfaces are needed, other than passing GPU memory addresses for all input pointers.

Current best-practice configuration settings for BoomerAMG on GPUs

The AMG setup and solve options that currently have GPU support are listed below:

  • AMG setup

    • Coarsening algorithm: PMIS (8) and aggressive coarsening
    • Interpolation algorithms: direct (3), BAMG-direct (15), extended+i (6), extended (14) and extended+e (18). Second-stage interpolation with aggressive coarsening: extended (5) and extended+e (7)
    • RAP: 2-multiplication R(AP), 1-multiplication RAP
  • AMG solve

    • Smoothers: Jacobi (7), l1-Jacobi (18), two-stage Gauss-Seidel (11, 12). Relaxation order must be 0, i.e., lexicographic order
    • Matrix-vector products: store the local transposes of P so that P^T is applied by an explicit multiplication
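Jacobi-type smoothers suit GPUs because every row is updated from the previous iterate only, so all rows can be processed in parallel. The serial C sketch below shows one l1-Jacobi sweep (relaxation type 18 above) on a CSR matrix. It assumes the l1 weight of row i is the absolute row sum of |a_ij|; the struct and function names are hypothetical, and this is not hypre's implementation, which works on distributed ParCSR matrices on the device.

```c
#include <assert.h>

/* Serial CSR matrix; row_ptr has n + 1 entries. Hypothetical type, not hypre's. */
typedef struct {
    int n;
    const int *row_ptr;
    const int *col_idx;
    const double *val;
} csr_matrix;

/* One l1-Jacobi sweep: x_new[i] = x[i] + (b[i] - (A x)[i]) / l1[i],
   with l1[i] = sum_j |a_ij| (assumed weighting). Each row reads only the old x,
   so the outer loop is what a GPU would run with one thread per row. */
static void l1_jacobi_sweep(const csr_matrix *A, const double *b, double *x, double *work)
{
    for (int i = 0; i < A->n; i++) {
        double r = b[i], l1 = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++) {
            double a = A->val[k];
            r  -= a * x[A->col_idx[k]];   /* residual of row i */
            l1 += a < 0.0 ? -a : a;       /* absolute row sum */
        }
        assert(l1 > 0.0);                 /* all-zero rows are not handled */
        work[i] = x[i] + r / l1;
    }
    for (int i = 0; i < A->n; i++)
        x[i] = work[i];                   /* commit only after every row is done */
}
```

With this row-sum scaling the sweep converges for symmetric positive definite A without a damping factor, which plain Jacobi does not guarantee.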

The sample code below sets up an IJ matrix A and solves Ax = b with AMG-preconditioned CG.

 HYPRE_Init(); /* must be the first HYPRE function call */
 ...
 /* AMG in GPU memory (default) */
 HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
 /* setup AMG on GPUs */
 HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);
 /* use hypre's SpGEMM instead of cuSPARSE */
 HYPRE_SetSpGemmUseCusparse(FALSE);
 /* use GPU RNG */
 HYPRE_SetUseGpuRand(TRUE);
 if (useHypreGpuMemPool)
 {
    /* use hypre's GPU memory pool */
    HYPRE_SetGPUMemoryPoolSize(bin_growth, min_bin, max_bin, max_bytes);
 }
 else if (useUmpireGpuMemPool)
 {
    /* or use Umpire GPU memory pool */
    HYPRE_SetUmpireUMPoolName("HYPRE_UM_POOL_TEST");
    HYPRE_SetUmpireDevicePoolName("HYPRE_DEVICE_POOL_TEST");
 }
 ...
 /* setup IJ matrix A */
 HYPRE_IJMatrixCreate(comm, first_row, last_row, first_col, last_col, &ij_A);
 HYPRE_IJMatrixSetObjectType(ij_A, HYPRE_PARCSR);
 HYPRE_IJMatrixInitialize(ij_A); /* required before adding values */
 /* GPU pointers; efficient in large chunks */
 HYPRE_IJMatrixAddToValues(ij_A, num_rows, num_cols, rows, cols, data);
 HYPRE_IJMatrixAssemble(ij_A);
 HYPRE_IJMatrixGetObject(ij_A, (void **) &parcsr_A);
 ...
 /* setup AMG */
 HYPRE_ParCSRPCGCreate(comm, &solver);
 HYPRE_BoomerAMGCreate(&precon);
 HYPRE_BoomerAMGSetRelaxType(precon, rlx_type); /* 7, 18, 11, 12, (3, 4, 6) */
 HYPRE_BoomerAMGSetRelaxOrder(precon, FALSE); /* must be false */
 HYPRE_BoomerAMGSetCoarsenType(precon, coarsen_type); /* 8 */
 HYPRE_BoomerAMGSetInterpType(precon, interp_type); /* 3, 15, 6, 14, 18 */
 HYPRE_BoomerAMGSetAggNumLevels(precon, agg_num_levels);
 HYPRE_BoomerAMGSetAggInterpType(precon, agg_interp_type); /* 5 or 7 */
 HYPRE_BoomerAMGSetKeepTranspose(precon, TRUE); /* keep transpose to avoid SpMTV */
 HYPRE_BoomerAMGSetRAP2(precon, 0); /* RAP in two multiplications (default: 0) */
 HYPRE_ParCSRPCGSetPrecond(solver, HYPRE_BoomerAMGSolve, HYPRE_BoomerAMGSetup, precon);
 HYPRE_ParCSRPCGSetup(solver, parcsr_A, b, x); /* b, x: HYPRE_ParVector objects */
 ...
 /* solve */
 HYPRE_ParCSRPCGSolve(solver, parcsr_A, b, x);
 ...
 HYPRE_Finalize(); /* must be the last HYPRE function call */
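HYPRE_BoomerAMGSetKeepTranspose(precon, TRUE) in the sample above keeps P^T so that restriction avoids a transpose SpMV (SpMTV). The serial sketch below illustrates why this helps: applying P^T straight from the CSR form of P means scattering each row's contributions into shared output entries (which needs atomics on a GPU), whereas forming P^T once with a counting-sort transpose turns every restriction into an ordinary row-parallel SpMV. The csr type and the functions csr_transpose, csr_spmv and csr_free are hypothetical names for this illustration only; hypre's own code works on distributed ParCSR matrices on the device.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial CSR matrix (hypothetical type, not hypre's). */
typedef struct {
    int nrows, ncols;
    int *row_ptr;   /* nrows + 1 entries */
    int *col_idx;
    double *val;
} csr;

/* Build T = A^T in CSR form by counting entries per column of A (bucket sort). */
static csr csr_transpose(const csr *A)
{
    int nnz = A->row_ptr[A->nrows];
    csr T;
    T.nrows = A->ncols;
    T.ncols = A->nrows;
    T.row_ptr = calloc((size_t)T.nrows + 1, sizeof *T.row_ptr);
    T.col_idx = malloc(((size_t)nnz + 1) * sizeof *T.col_idx); /* +1 avoids malloc(0) */
    T.val     = malloc(((size_t)nnz + 1) * sizeof *T.val);
    assert(T.row_ptr && T.col_idx && T.val);

    for (int k = 0; k < nnz; k++)          /* count entries in each column of A */
        T.row_ptr[A->col_idx[k] + 1]++;
    for (int j = 0; j < T.nrows; j++)      /* prefix sum gives row starts of A^T */
        T.row_ptr[j + 1] += T.row_ptr[j];

    int *next = malloc(((size_t)T.nrows + 1) * sizeof *next);
    assert(next);
    for (int j = 0; j < T.nrows; j++)
        next[j] = T.row_ptr[j];
    for (int i = 0; i < A->nrows; i++)     /* place a_ij at row j, column i of A^T */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++) {
            int dst = next[A->col_idx[k]]++;
            T.col_idx[dst] = i;
            T.val[dst] = A->val[k];
        }
    free(next);
    return T;
}

/* y = M x with one independent dot product per row: no write conflicts. */
static void csr_spmv(const csr *M, const double *x, double *y)
{
    for (int i = 0; i < M->nrows; i++) {
        double s = 0.0;
        for (int k = M->row_ptr[i]; k < M->row_ptr[i + 1]; k++)
            s += M->val[k] * x[M->col_idx[k]];
        y[i] = s;
    }
}

static void csr_free(csr *M)
{
    free(M->row_ptr);
    free(M->col_idx);
    free(M->val);
}
```

The transpose costs one pass over P per setup, while restriction P^T r is applied at every level of every V-cycle, so the one-time cost is amortized over the solve.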

Build hypre with Umpire

Add the following configure options. By default, hypre then uses Umpire's pooling allocator for GPU device and unified memory.

 --with-umpire --with-umpire-include=/path-of-umpire-install/include
 --with-umpire-lib-dirs=/path-of-umpire-install/lib
 --with-umpire-libs=umpire