This is a library to intercept calls to CPU BLAS kernels and run their equivalent on the GPU in a CUDA environment.
$ export CUDA=...
$ meson -DCUDA=$CUDA build && ninja -C build
TODO: expand this section
- done once
- track ALL object allocations
- remove object-specific information, giving us only calls to malloc()
- saved to file
- in code (blas2cuda):
  - define custom object manager (struct objmngr) - any time an allocation file is loaded, we use this memory manager
  - any time malloc() is called:
    - object tracker compares call info (from requested size and using libunwind to get instruction pointer) with allocation list
    - if call info matches, allocate the object using the custom memory manager defined for the call, and track the object
    - if call info doesn't match, act normally
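The matching step above can be sketched in C roughly as follows. This is a minimal illustration and not the project's actual code: struct alloc_record, struct demo_objmngr, demo_alloc(), demo_release() and tracked_malloc() are hypothetical names, the real layout of struct objmngr is not shown in this README, and the instruction pointer is passed in as a plain argument rather than obtained through libunwind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry of the saved allocation list. */
struct alloc_record {
    size_t    size; /* requested size seen during the tracking run */
    uintptr_t ip;   /* instruction pointer of the malloc() call site */
};

/* Hypothetical stand-in for the project's object manager. */
struct demo_objmngr {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
};

/* Counts how often the custom manager was used (for illustration only). */
size_t demo_alloc_calls = 0;

/* Stand-in custom allocator; a real one would hand out GPU-visible memory. */
void *demo_alloc(size_t size)
{
    demo_alloc_calls++;
    return malloc(size);
}

void demo_release(void *ptr)
{
    free(ptr);
}

/* Returns 1 if (size, ip) matches an entry of the allocation list, else 0. */
int alloc_list_matches(const struct alloc_record *list, size_t n,
                       size_t size, uintptr_t ip)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].size == size && list[i].ip == ip)
            return 1;
    }
    return 0;
}

/*
 * Decision taken for every intercepted allocation: use the custom manager
 * on a match, otherwise fall through to the normal malloc(). *was_tracked
 * tells the caller whether the returned pointer must be remembered as a
 * tracked object.
 */
void *tracked_malloc(const struct alloc_record *list, size_t n,
                     const struct demo_objmngr *mngr,
                     size_t size, uintptr_t ip, int *was_tracked)
{
    assert(mngr != NULL && was_tracked != NULL);
    *was_tracked = alloc_list_matches(list, n, size, ip);
    return *was_tracked ? mngr->alloc(size) : malloc(size);
}
```

Matching on both the size and the call site keeps unrelated allocations that happen to have the same size on the normal path.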
- Each time a kernel is called, we would have to copy data to the GPU, invoke the
kernel, and copy the result back to the CPU. For a series of calls to kernels that
aren't computation-intensive (Level 1 and Level 2 BLAS calls are vector-vector
and matrix-vector operations), throughput is significantly degraded because the
time to transfer data dominates the time spent computing.
- This is why NVBLAS, a similar project, only intercepts computation-intensive Level 3 matrix-matrix operations, where the computation dominates data transfer.
- However, there's still this issue of copying back and forth.
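To make the transfer-versus-computation argument concrete, the following C sketch estimates how many floating-point operations each BLAS level performs per element moved between CPU and GPU. It is a rough model under stated assumptions, not a measurement: matrices are square (n x n), every operand is copied to the GPU and the result copied back on each call, and lower-order terms such as scaling by alpha and beta are ignored. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1, axpy (y = alpha*x + y): n multiplies and n adds. */
double axpy_flops_per_element(size_t n)
{
    assert(n > 0);
    double flops = 2.0 * n;
    double moved = 3.0 * n; /* x and y to the GPU, y back */
    return flops / moved;
}

/* Level 2, gemv (y = alpha*A*x + beta*y): about 2*n^2 operations. */
double gemv_flops_per_element(size_t n)
{
    assert(n > 0);
    double flops = 2.0 * n * n;
    double moved = 1.0 * n * n + 3.0 * n; /* A, x, y to the GPU, y back */
    return flops / moved;
}

/* Level 3, gemm (C = alpha*A*B + beta*C): about 2*n^3 operations. */
double gemm_flops_per_element(size_t n)
{
    assert(n > 0);
    double flops = 2.0 * n * n * n;
    double moved = 4.0 * n * n; /* A, B, C to the GPU, C back */
    return flops / moved;
}
```

Under this model the Level 1 and Level 2 ratios stay below 1 and 2 respectively no matter how large n gets, while the Level 3 ratio is n/2 and grows with the problem size. That is why only Level 3 calls can amortize the cost of explicit copies.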
- blas2cuda uses object tracking to distinguish memory objects that are used in BLAS kernels from other memory objects we don't care about.
- When a call is made to malloc() that we should care about, we use cudaMallocManaged() instead and return a memory address that is shared between the CPU and GPU. This memory is a managed object, and a later call to free() will use cudaFree() instead.
- By intercepting the right calls, we can tell when these memory objects are later used in kernels, and avoid copying.
- Instead of explicit copying, a page faulting mechanism is used to move data between the CPU and GPU.
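The managed allocation path described above might look like the following C sketch. It is an assumption-laden illustration, not blas2cuda's implementation: intercepted_malloc(), intercepted_free(), the fixed-size pointer table and the DEMO_USE_CUDA switch are all hypothetical, and the should_manage flag stands in for the object tracker's decision. Only cudaMallocManaged(), cudaFree(), cudaMemAttachGlobal and cudaSuccess are real CUDA runtime names. Without DEMO_USE_CUDA the sketch falls back to plain malloc()/free() so it builds on a machine without CUDA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef DEMO_USE_CUDA
#include <cuda_runtime.h>
#endif

#define DEMO_MAX_TRACKED 64

/* Pointers currently owned by the managed allocator (NULL = free slot). */
static void *demo_tracked[DEMO_MAX_TRACKED];

/* Remember a managed pointer; returns 1 on success, 0 if the table is full. */
static int track_add(void *ptr)
{
    assert(ptr != NULL);
    for (size_t i = 0; i < DEMO_MAX_TRACKED; i++) {
        if (demo_tracked[i] == NULL) {
            demo_tracked[i] = ptr;
            return 1;
        }
    }
    return 0;
}

/* Forget a managed pointer; returns 1 if it was tracked, 0 otherwise. */
static int track_remove(void *ptr)
{
    for (size_t i = 0; i < DEMO_MAX_TRACKED; i++) {
        if (demo_tracked[i] == ptr) {
            demo_tracked[i] = NULL;
            return 1;
        }
    }
    return 0;
}

static void *managed_alloc(size_t size)
{
#ifdef DEMO_USE_CUDA
    void *ptr = NULL;
    /* Unified memory: one address valid on CPU and GPU, migrated on page fault. */
    if (cudaMallocManaged(&ptr, size, cudaMemAttachGlobal) != cudaSuccess)
        return NULL;
    return ptr;
#else
    return malloc(size); /* host-only fallback for building without CUDA */
#endif
}

static void managed_free(void *ptr)
{
#ifdef DEMO_USE_CUDA
    cudaFree(ptr);
#else
    free(ptr);
#endif
}

/* should_manage stands in for the object tracker's verdict on this call. */
void *intercepted_malloc(size_t size, int should_manage)
{
    if (!should_manage)
        return malloc(size);
    void *ptr = managed_alloc(size);
    if (ptr != NULL && !track_add(ptr)) {
        managed_free(ptr);
        return NULL;
    }
    return ptr;
}

/* Returns 1 if the pointer was released through the managed path, else 0. */
int intercepted_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    if (track_remove(ptr)) {
        managed_free(ptr);
        return 1;
    }
    free(ptr);
    return 0;
}
```

Keeping a record of managed pointers is what lets the free() path pick the matching deallocator, since a pointer obtained from cudaMallocManaged() must not be handed to the normal free().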
./blas2cuda.sh <objtrackfile> <program>