Intel® MPI Benchmarks 4.0
The semantics of nonblocking collective operations enable you to run inter-process communication in the background while performing computations. However, the actual overlap depends on the particular MPI library implementation. You can measure the potential overlap of communication and computation using the IMB-NBC benchmarks. The general benchmark flow is as follows:
The timing values to interpret the overlap potential are as follows:
t_pure is the time of a pure communication operation, non-overlapping with CPU activity.
t_CPU is the time the IMB_cpu_exploit function takes to complete when run concurrently with the nonblocking communication operation.
t_ovrl is the time the nonblocking communication operation takes to complete when run concurrently with CPU activity.
If t_ovrl = max(t_pure,t_CPU), the processes are running with a perfect overlap.
If t_ovrl = t_pure+t_CPU, the processes are running with no overlap.
Since different processes in a collective operation may have different execution times, the timing values are taken from the process with the largest t_ovrl. The IMB-NBC result tables report the timings t_ovrl, t_pure, and t_CPU, and the estimated overlap in percent, calculated by the following formula:
overlap = 100.*max(0, min(1, (t_pure+t_CPU-t_ovrl) / min(t_pure, t_CPU)))
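As an illustration only, the formula can be sketched in a few lines of Python. This is not the IMB source code: the function names overlap_percent and slowest_process, the sample timings, and the choice to return 0 when one of the timings is zero are all assumptions made for this example.

```python
def overlap_percent(t_pure, t_cpu, t_ovrl):
    """Estimated overlap in percent, clamped to the range [0, 100]."""
    denom = min(t_pure, t_cpu)
    if denom <= 0.0:
        # Assumption: the text does not define this case; the sketch reports 0.
        return 0.0
    ratio = (t_pure + t_cpu - t_ovrl) / denom
    return 100.0 * max(0.0, min(1.0, ratio))


def slowest_process(timings):
    """Return the (t_ovrl, t_pure, t_cpu) triple with the largest t_ovrl."""
    return max(timings, key=lambda t: t[0])


# Hypothetical per-process timings in microseconds: (t_ovrl, t_pure, t_CPU).
per_process = [
    (120.0, 100.0, 100.0),
    (150.0, 100.0, 100.0),
    (110.0, 100.0, 100.0),
]

# The reported values come from the process with the largest t_ovrl.
t_ovrl, t_pure, t_cpu = slowest_process(per_process)
reported = overlap_percent(t_pure, t_cpu, t_ovrl)

# Limiting cases from the text:
perfect = overlap_percent(80.0, 100.0, 100.0)  # t_ovrl = max(t_pure, t_CPU)
none = overlap_percent(80.0, 100.0, 180.0)     # t_ovrl = t_pure + t_CPU
```

With these sample values, the perfect-overlap case evaluates to 100 percent and the no-overlap case to 0 percent. The clamping with max and min keeps the result in range when measurement noise pushes t_ovrl slightly outside the interval between max(t_pure, t_CPU) and t_pure + t_CPU.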