Intel® MPI Benchmarks 4.0
The following example shows the results of the PingPing and Allreduce benchmarks, run on six processes with user-defined message lengths (-msglen), a process-to-group mapping (-map), and the -multi 0 output mode:
<..> -np 6 IMB-MPI1 pingping allreduce -map 3x2 -msglen Lengths -multi 0

Lengths file:
0
100
1000
10000
100000
1000000

#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.2, MPI1 part
#---------------------------------------------------
# Date                  : Thu Sep 4 13:26:03 2008
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.9-42.ELsmp
# Version               : #1 SMP Wed Jul 12 23:32:02 EDT 2006
# MPI Version           : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# SECS_PER_SAMPLE (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# IMB-MPI1 pingping allreduce -map 3x2 -msglen Lengths
# -multi 0
# Message lengths were user-defined
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#
# List of Benchmarks to run:
# (Multi-)PingPing
# (Multi-)Allreduce

#--------------------------------------------------------------
# Benchmarking Multi-PingPing
# ( 3 groups of 2 processes each running simultaneously )
# Group  0:     0    3
#
# Group  1:     1    4
#
# Group  2:     2    5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]  Mbytes/sec
            0         1000           ..           ..           ..          ..
          100         1000           ..           ..           ..          ..
         1000         1000           ..           ..           ..          ..
        10000         1000           ..           ..           ..          ..
       100000          419           ..           ..           ..          ..
      1000000           41           ..           ..           ..          ..

#--------------------------------------------------------------
# Benchmarking Multi-Allreduce
# ( 3 groups of 2 processes each running simultaneously )
# Group  0:     0    3
#
# Group  1:     1    4
#
# Group  2:     2    5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000           ..           ..           ..
         1000         1000           ..           ..           ..
        10000         1000           ..           ..           ..
       100000          419           ..           ..           ..
      1000000           41           ..           ..           ..

#--------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 4; rank order (rowwise):
# 0 3
#
# 1 4
#
# ( 2 additional processes waiting in MPI_Barrier)
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000           ..           ..           ..
         1000         1000           ..           ..           ..
        10000         1000           ..           ..           ..
       100000          419           ..           ..           ..
      1000000           41           ..           ..           ..

#--------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 6; rank order (rowwise):
# 0 3
#
# 1 4
#
# 2 5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000           ..           ..           ..
         1000         1000           ..           ..           ..
        10000         1000           ..           ..           ..
       100000          419           ..           ..           ..
      1000000           41           ..           ..           ..

# All processes entering MPI_Finalize
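To make the grouping above concrete, the following is a minimal C sketch, not the IMB source: it rebuilds the 3 groups of 2 processes that -map 3x2 produces for 6 ranks ({0,3}, {1,4}, {2,5}) with MPI_Comm_split, then runs a PingPing-style exchange (both partners send simultaneously via MPI_Isend and receive via MPI_Recv) inside each group. The message size, repetition count, and tag value are arbitrary example choices.

/* Illustrative sketch only: -map 3x2 grouping plus a PingPing-style
 * exchange within each group. Not the IMB implementation.           */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define P   3       /* rows of the -map PxQ grid (here 3x2)  */
#define TAG 1000    /* arbitrary message tag                 */

int main(int argc, char **argv)
{
    int rank, grank, gsize;
    MPI_Comm group_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* -map PxQ places rank r at grid position (r % P, r / P);
     * the ranks sharing a row form one benchmark group, which
     * reproduces Group 0: {0,3}, Group 1: {1,4}, Group 2: {2,5}. */
    MPI_Comm_split(MPI_COMM_WORLD, rank % P, rank / P, &group_comm);
    MPI_Comm_rank(group_comm, &grank);
    MPI_Comm_size(group_comm, &gsize);

    if (gsize == 2) {
        int bytes = 100000;                  /* one sample size      */
        int i, reps = 419;                   /* as in the table row  */
        int peer = 1 - grank;
        char *sbuf = malloc(bytes), *rbuf = malloc(bytes);
        MPI_Request req;
        double t0, t1;

        MPI_Barrier(group_comm);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            /* PingPing pattern: both partners send at the same time,
             * so every message meets oncoming traffic.               */
            MPI_Isend(sbuf, bytes, MPI_BYTE, peer, TAG, group_comm, &req);
            MPI_Recv(rbuf, bytes, MPI_BYTE, peer, TAG, group_comm,
                     MPI_STATUS_IGNORE);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        t1 = MPI_Wtime();
        if (grank == 0)
            printf("group of rank %d: %.2f usec per message\n",
                   rank, (t1 - t0) / reps * 1e6);
        free(sbuf); free(rbuf);
    }

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}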
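The (Multi-)Allreduce part of the run can be sketched the same way. As the output header states, reductions use MPI_FLOAT elements combined with MPI_SUM; the sketch below, again an illustration rather than the IMB source, times MPI_Allreduce on a per-group communicator built as above. The 100000-byte size (25000 floats) and the 419 repetitions mirror one row of the table; the buffer contents are arbitrary.

/* Illustrative sketch only: a Multi-Allreduce-style measurement.    */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i;
    int count = 25000;                   /* 100000 bytes of MPI_FLOAT */
    int reps  = 419;                     /* repetition count from the table */
    float *sbuf, *rbuf;
    double t0, t1;
    MPI_Comm group_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* same 3x2 grouping as in the previous sketch */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 3, rank / 3, &group_comm);

    sbuf = calloc(count, sizeof(float));
    rbuf = calloc(count, sizeof(float));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++)
        MPI_Allreduce(sbuf, rbuf, count, MPI_FLOAT, MPI_SUM, group_comm);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Multi-Allreduce, %d bytes: %.2f usec per call\n",
               count * (int)sizeof(float), (t1 - t0) / reps * 1e6);

    free(sbuf); free(rbuf);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}

Both sketches compile with an MPI C compiler wrapper such as mpicc and, like the benchmark run above, are meant to be launched on 6 processes.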