Comparing performance of sequential vs OpenMP-based element-by-element vector multiplication.

In each of the experiments given below, we multiply two floating-point vectors
x and y, with the number of elements ranging from 10⁶ to 10⁹, using OpenMP.
Each element count is attempted with various approaches, running each approach
5 times to get a good time measure. Multiplication here represents any
memory-aligned independent operation, or a map() operation.
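
As a rough illustration of the operation being measured, the following is a minimal sketch of a sequential and an OpenMP element-by-element multiply. The function names multiplySeq and multiplyOmp are hypothetical and chosen only for this example; they are not necessarily the names used in the experiments.

```cpp
#include <cstddef>
#include <vector>

// Sequential element-by-element multiply: a[i] = x[i] * y[i].
// (Hypothetical helper; assumes a, x and y all have the same size.)
void multiplySeq(std::vector<float>& a, const std::vector<float>& x,
                 const std::vector<float>& y) {
  const std::size_t N = x.size();
  for (std::size_t i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}

// OpenMP version of the same loop. Each iteration is independent, so the
// iterations can be divided among threads without any synchronization.
// A signed loop index is used for compatibility with older OpenMP versions.
void multiplyOmp(std::vector<float>& a, const std::vector<float>& x,
                 const std::vector<float>& y) {
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for
  for (long long i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}
```

When compiled without OpenMP support (for example, without -fopenmp on GCC or Clang), the pragma is ignored and multiplyOmp simply runs sequentially.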

In this experiment (adjust-schedule), we multiply two floating-point vectors
x and y using OpenMP. Each element count is attempted with various OpenMP
schedule configurations. Results indicate that a schedule kind of auto is
suitable.
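
To make the schedule configuration concrete, here is a hedged sketch of how a schedule kind can be set on the loop. The function names and the chunk parameter are assumptions made for illustration; the actual experiment may select the schedule differently (for instance, at runtime).

```cpp
#include <vector>

// Schedule kind "auto": the choice of how iterations are assigned to
// threads is left to the compiler and/or the OpenMP runtime.
// (Hypothetical helper; assumes a, x and y all have the same size.)
void multiplyOmpAuto(std::vector<float>& a, const std::vector<float>& x,
                     const std::vector<float>& y) {
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for schedule(auto)
  for (long long i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}

// Schedule kind "dynamic" with an explicit chunk size: threads request
// `chunk` iterations at a time as they become free. `chunk` must be > 0.
void multiplyOmpDynamic(std::vector<float>& a, const std::vector<float>& x,
                        const std::vector<float>& y, int chunk) {
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for schedule(dynamic, chunk)
  for (long long i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}
```

Other schedule kinds such as static and guided follow the same pattern, with the chunk size being optional.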

In this experiment (compare-sequential, main), we compare the performance of
finding x*y using a single thread (sequential) against using OpenMP. Here x
and y are both floating-point vectors, and the comparison is performed on a
number of vector sizes. Note that neither approach makes use of SIMD
instructions, which are available on all modern hardware. While it might seem
that the OpenMP method would be a clear winner, the results indicate that this
is not the case. This is possibly because of high communication costs and not
enough computational workload, as indicated by this answer. However, from 10⁸
elements onward, the OpenMP approach performs better than sequential. All
outputs are saved in a gist. Some charts are also included below, generated
from sheets.

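
The timing of each approach (averaged over 5 runs, as described above) could be sketched as follows. The helper measureDurationMs is a hypothetical name introduced for this example and is not taken from the experiment's code.

```cpp
#include <chrono>

// Runs `fn` `repeat` times and returns the average wall-clock time per run,
// in milliseconds. (Hypothetical helper; `repeat` defaults to 5 to match the
// number of runs mentioned above.)
template <class F>
double measureDurationMs(F fn, int repeat = 5) {
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (int r = 0; r < repeat; ++r)
    fn();
  const auto stop = Clock::now();
  const std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

Any callable can be passed in, such as a lambda wrapping the sequential or the OpenMP multiply, so that both are timed in exactly the same way.
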
- open MP - dot product
- What's the difference between “static” and “dynamic” schedule in OpenMP?
- Git pulling a branch from another repository?