Command-line Control

You can control all aspects of the Intel® MPI Benchmarks through the command line. The general command-line syntax is as follows:

IMB-MPI1    [-h{elp}]
            [-npmin        <NPmin>]
            [-multi        <MultiMode>]
            [-off_cache    <cache_size[,cache_line_size]>]
            [-iter         <msgspersample[,overall_vol[,msgs_nonaggr[,iter_policy]]]>]
            [-iter_policy  <iter_policy>]
            [-time         <max_runtime per sample>]
            [-mem          <max. mem usage per process>]
            [-msglen       <Lengths_file>]
            [-map          <PxQ>]
            [-input        <filename>]
            [-include      benchmark1 [benchmark2 [...]]]
            [-exclude      benchmark1 [benchmark2 [...]]]
            [-msglog       [<minlog>:]<maxlog>]
            [-thread_level <level>]
            [benchmark1 [benchmark2 [...]]]

The command line is repeated in the output. The options may appear in any order.

Examples:

Use the following command line to get out-of-cache data for PingPong:

mpirun -np 2  IMB-MPI1 pingpong -off_cache -1

Use the following command line to run a very large configuration: limit the number of iterations to 20, the run time per message size to 1.5 seconds, and the memory for message buffers to 2 GB:

mpirun -np 512 IMB-MPI1 -npmin 512 \
       alltoallv -iter 20 -time 1.5 -mem 2

Other examples:

mpirun -np 8  IMB-IO
mpirun -np 10 IMB-MPI1 PingPing Reduce
mpirun -np 11 IMB-EXT  -npmin 5
mpirun -np 14 IMB-IO P_Read_shared -npmin 7

mpirun -np 3  IMB-EXT  -input IMB_SELECT_EXT
mpirun -np 14 IMB-MPI1 -multi 0 PingPong Barrier -map 2x7
mpirun -np 16 IMB-MPI1 -msglog 2:7 -include PingPongSpecificSource PingPingSpecificSource -exclude Alltoall Alltoallv
mpirun -np 4  IMB-MPI1 -msglog 16 PingPong PingPing PingPongSpecificSource PingPingSpecificSource
mpirun -np 16 IMB-NBC -include Ialltoall_pure Ibcast_pure
mpirun -np 8  IMB-RMA -multi 1 Put_local

Benchmark Selection Arguments

Benchmark selection arguments are a sequence of blank-separated strings. Each argument is the exact name of a benchmark; names are case-insensitive.

For example, the command IMB-MPI1 PingPong Allreduce runs only the PingPong and Allreduce benchmarks.

Default: no benchmark selection. All benchmarks of the selected component are run.

-npmin Option

Specifies the minimum number of processes P_min to run all selected benchmarks on. The P_min value after -npmin must be an integer.

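Given P_min, the selected benchmarks run on the following numbers of processes, where P is the total number of processes:

$$P_{min},\ 2P_{min},\ 4P_{min},\ \ldots,\ 2^{x}P_{min},\ P$$

Here $2^{x}P_{min}$ is the largest value of this form that is smaller than P.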

NOTE:

You may set P_min to 1. If you set P_min > P, Intel MPI Benchmarks interprets this value as P_min = P.
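The selection rule above can be sketched in a few lines of Python. The function process_counts is a hypothetical helper for illustration only, not part of Intel® MPI Benchmarks; it assumes P and P_min are positive integers and applies the P_min > P rule from the note.

```python
def process_counts(p, npmin):
    """Process counts used with -npmin npmin and P = p processes (illustrative model)."""
    if npmin < 1 or p < 1:
        raise ValueError("P and P_min must be positive integers")
    npmin = min(npmin, p)  # P_min > P is interpreted as P_min = P
    counts = []
    n = npmin
    while n < p:           # P_min, 2*P_min, 4*P_min, ... while still below P
        counts.append(n)
        n *= 2
    counts.append(p)       # the last run always uses all P processes
    return counts


print(process_counts(11, 5))   # [5, 10, 11]
print(process_counts(14, 7))   # [7, 14]
print(process_counts(8, 1))    # [1, 2, 4, 8]
```

Under this rule, mpirun -np 11 IMB-EXT -npmin 5 from the examples would run on 5, 10, and 11 processes.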

Default: no -npmin selection. Active processes are selected as described in the Running Intel® MPI Benchmarks section.

-multi outflag Option

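Defines whether the benchmark runs in multiple mode. The argument after -multi is a meta-symbol <outflag> that takes the integer value 0 or 1. This flag controls how the results are displayed:

  • outflag = 0: only the maximum timings (minimum throughputs) over all active groups are displayed
  • outflag = 1: the results of all groups are reported separately; the report gets long in this case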

When the number of processes running the benchmark is more than half of the overall number of processes in MPI_COMM_WORLD, the multiple benchmark coincides with the non-multiple one, because no more than one process group can be created.
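The "more than half" condition follows from simple integer division. The sketch below is a simplified model that assumes multiple mode splits MPI_COMM_WORLD into as many disjoint groups of n processes as fit, with any remaining processes left idle; num_groups is a hypothetical helper, not an IMB function.

```python
def num_groups(world_size, n):
    """Concurrent groups of n processes in multiple mode (simplified model)."""
    if not 1 <= n <= world_size:
        raise ValueError("group size must be between 1 and the world size")
    return world_size // n  # leftover processes (world_size % n) are assumed idle


# 14 processes in groups of 2: 7 groups run concurrently
print(num_groups(14, 2))   # 7
# A group size above 14 / 2 leaves room for a single group only,
# so the multiple benchmark coincides with the non-multiple one.
print(num_groups(14, 8))   # 1
```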

Default: no -multi selection. Intel® MPI Benchmarks runs the non-multiple benchmark flavors.

-off_cache cache_size[,cache_line_size] Option

Use the -off_cache flag to avoid cache reuse. If you do not use this flag (default), the same communication buffer is used for all repetitions of one message size sample. In this case, Intel® MPI Benchmarks reuses the cache, so the throughput results may be unrealistically high.

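The argument after -off_cache can be a single number (cache_size), two comma-separated numbers (cache_size,cache_line_size), or -1:

  • cache_size is a floating-point upper bound of the last level cache size, in MB
  • cache_line_size is the size of a last level cache line (an upper estimate is acceptable); if it is omitted, the default line size is used
  • -1 uses the default values defined in IMB_mem_info.h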

The sent and received data is stored in buffers of about 2 × max(cache_size, message_size). When messages of a particular size are used repeatedly, their addresses are advanced within those buffers so that each message starts at least 2 cache lines after the end of the previous one. When the buffers are filled up, they are reused from the beginning.
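The address-advancing scheme above can be modeled as follows. This is a simplified sketch, not the IMB implementation: message_offsets is a hypothetical helper, all sizes are in bytes, the gap between messages is exactly 2 cache lines, and alignment is ignored.

```python
def message_offsets(msg_size, cache_size, line_size, count):
    """Start offsets of `count` consecutive messages in one -off_cache buffer (simplified model)."""
    buf_size = 2 * max(cache_size, msg_size)  # buffer of about 2 x max(cache_size, message_size)
    offsets = []
    offset = 0
    for _ in range(count):
        if offset + msg_size > buf_size:
            offset = 0                       # buffer filled up: reuse it from the beginning
        offsets.append(offset)
        offset += msg_size + 2 * line_size   # next message starts 2 cache lines after this one ends
    return offsets


# 1000-byte messages, a 4096-byte "cache", 64-byte cache lines
print(message_offsets(1000, 4096, 64, 8))
# [0, 1128, 2256, 3384, 4512, 5640, 6768, 0]
```

No two consecutive messages share a cache line until the buffer wraps around, which is what keeps the data out of cache.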

-off_cache is effective for IMB-MPI1 and IMB-EXT. Using this option with IMB-IO is not recommended.

Examples

Use the default values defined in IMB_mem_info.h:

-off_cache -1

2.5 MB last level cache, default line size:

-off_cache 2.5

16 MB last level cache, line size 128:

-off_cache 16,128

The off_cache mode might also be influenced by internal caching in the Intel® MPI Library, which can complicate the interpretation of the results.

Default: no cache control. Data may come out of cache.

-iter Option

Use this option to control the number of iterations.

By default, the number of iterations is controlled through parameters MSGSPERSAMPLE, OVERALL_VOL, MSGS_NONAGGR, and ITER_POLICY defined in IMB_settings.h.

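You can optionally add one or more arguments after the -iter flag to override the default values defined in IMB_settings.h. Use the following guidelines for the optional arguments:

  • To override MSGSPERSAMPLE, use a single integer.
  • To override OVERALL_VOL, use two comma-separated integers. The first integer defines MSGSPERSAMPLE; the second overrides OVERALL_VOL.
  • To override MSGS_NONAGGR, use three comma-separated integers. The first two define MSGSPERSAMPLE and OVERALL_VOL; the third overrides MSGS_NONAGGR.
  • To override ITER_POLICY, enter the policy name (see -iter_policy) after the comma-separated integers or as a single argument. The policy is always the last argument.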

Examples

To define MSGSPERSAMPLE as 2000, and OVERALL_VOL as 100, use the following command line:

-iter 2000,100

To define MSGS_NONAGGR as 150, you need to define values for MSGSPERSAMPLE and OVERALL_VOL as shown in the following command line:

-iter 1000,40,150

To define MSGSPERSAMPLE as 2000 and set the multiple_np policy, use the following command line (see -iter_policy):

-iter 2000,multiple_np
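The positional rules for the -iter argument can be illustrated with a small parser. parse_iter, FIELDS, and POLICIES are hypothetical names; this is not the argument parser used by Intel® MPI Benchmarks, and it does not check value ranges.

```python
POLICIES = {"dynamic", "multiple_np", "auto", "off"}
FIELDS = ("msgspersample", "overall_vol", "msgs_nonaggr")


def parse_iter(arg):
    """Split an -iter argument into the values it overrides (illustrative parser)."""
    parts = arg.split(",")
    overrides = {}
    if parts[-1] in POLICIES:               # the policy, if present, is always last
        overrides["iter_policy"] = parts.pop()
    if len(parts) > len(FIELDS):
        raise ValueError("too many numeric fields in -iter argument: " + arg)
    for name, value in zip(FIELDS, parts):  # positional: a later field requires all earlier ones
        overrides[name] = int(value)
    return overrides


print(parse_iter("2000,100"))
print(parse_iter("1000,40,150"))
print(parse_iter("2000,multiple_np"))
```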

-iter_policy Option

Use this option to set a policy for automatic calculation of the number of iterations. Use one of the following arguments to override the default ITER_POLICY value defined in IMB_settings.h:

  • dynamic: Reduces the number of iterations when the maximum run time per sample (see -time) is expected to be reached. This policy ensures faster execution but may reduce the accuracy of the results.
  • multiple_np: Reduces the number of iterations as the message size grows. This policy ensures the accuracy of the results but may lead to longer execution times. You can control the execution time through the -time option.
  • auto: Automatically chooses which policy to use:
      ◦ applies multiple_np to collective operations where one of the ranks acts as the root of the operation (for example, MPI_Bcast)
      ◦ applies dynamic to all other types of operations
  • off: The number of iterations does not change during the execution.

You can also set the policy through the -iter option. See -iter.

Default: ITER_POLICY value defined in IMB_settings.h. The default policy is dynamic.
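The auto rule can be sketched as a lookup. The set of rooted benchmark names (Bcast, Reduce, Gather, Gatherv, Scatter, Scatterv) is an assumption based on which MPI collectives take a root rank, not a list taken from the IMB sources; auto_policy is a hypothetical helper.

```python
# Assumed set of IMB-MPI1 benchmarks built on rooted collectives
# (MPI_Bcast, MPI_Reduce, MPI_Gather(v), MPI_Scatter(v)); not taken from IMB.
ROOTED = {"bcast", "reduce", "gather", "gatherv", "scatter", "scatterv"}


def auto_policy(benchmark):
    """Policy that 'auto' would select for a benchmark, per the rule above."""
    return "multiple_np" if benchmark.lower() in ROOTED else "dynamic"


print(auto_policy("Bcast"))      # multiple_np
print(auto_policy("Allreduce"))  # dynamic
print(auto_policy("PingPong"))   # dynamic
```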

-time Option

Specifies the number of seconds for the benchmark to run per message size. The argument after -time is a floating-point number.

Combined with the -iter flag (or its default values), this flag ensures that Intel® MPI Benchmarks always chooses the maximum number of repetitions that satisfies all restrictions.

The number of repetitions per sample needed to meet the -time limit is roughly estimated in preparatory runs, which add about 1 second of overhead.
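The "maximum number of repetitions that satisfies all restrictions" can be modeled as the minimum of the individual limits. This sketch is a simplification under stated assumptions: it covers only MSGSPERSAMPLE, OVERALL_VOL (taken here in bytes), and the -time limit, it ignores MSGS_NONAGGR and the iteration policies, and max_repetitions is a hypothetical helper.

```python
import math


def max_repetitions(msgspersample, overall_vol, msg_size, max_time, time_per_rep):
    """Largest repetition count satisfying every limit (simplified model).

    overall_vol and msg_size are in bytes; max_time and time_per_rep are in
    seconds, where time_per_rep stands for the preparatory-run estimate.
    """
    limits = [msgspersample]
    if msg_size > 0:
        limits.append(overall_vol // msg_size)              # total data volume limit
    if time_per_rep > 0:
        limits.append(math.floor(max_time / time_per_rep))  # -time limit per sample
    return max(1, min(limits))                               # assumed: run at least once


print(max_repetitions(1000, 40 * 2**20, 4 * 2**20, 10.0, 1e-3))  # 10
print(max_repetitions(1000, 2**30, 1024, 2.0, 0.5))              # 4
```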

Default: -time is activated. The floating-point value specifying the run-time seconds per sample is set in the SECS_PER_SAMPLE variable defined in IMB_settings.h/IMB_settings_io.h. The current value is 10.

-mem Option

Specifies the maximum amount of memory, in GB, to allocate per process for message buffers. If a benchmark or message size needs more than this limit, a warning is printed that states how much memory the overall run requires to proceed without interruption.

The argument after -mem is a floating-point number.

Default: the memory is restricted by MAX_MEM_USAGE defined in IMB_mem_info.h.

-input <File> Option

Selects the benchmarks to run from an ASCII input file. For example, the IMB_SELECT_EXT file looks as follows:

#
# IMB benchmark selection file
#
# Every line must be a comment (beginning with #), or it
# must contain exactly one IMB benchmark name
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate

With the help of this file, the following command runs only Unidir_Get and Accumulate benchmarks of the IMB-EXT component:

mpirun .... IMB-EXT -input IMB_SELECT_EXT
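The file format above is simple enough to read with a few lines of Python. read_selection is a hypothetical helper, not part of Intel® MPI Benchmarks; unlike the strict format description, it also tolerates blank lines.

```python
def read_selection(text):
    """Benchmark names listed in an IMB selection file (illustrative reader)."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments (and, leniently, blank lines)
            continue
        names.append(line)                    # one benchmark name per line
    return names


IMB_SELECT_EXT = """\
#
# IMB benchmark selection file
#
# Every line must be a comment (beginning with #), or it
# must contain exactly one IMB benchmark name
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate
"""

print(read_selection(IMB_SELECT_EXT))  # ['Unidir_Get', 'Accumulate']
```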

-msglen <File> Option

Write any set of non-negative message lengths into an ASCII file, one per line, and call Intel® MPI Benchmarks with the following arguments:

-msglen Lengths

The lengths listed in the Lengths file override the default message lengths. For IMB-IO, the file defines the I/O portion lengths.
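A lengths file can be produced with any tool that writes one number per line. The sketch below generates one in Python; lengths_file_text is a hypothetical helper, and the file name Lengths and the chosen byte counts are only examples.

```python
from pathlib import Path


def lengths_file_text(lengths):
    """Contents of a -msglen file: one non-negative length per line."""
    if any(n < 0 for n in lengths):
        raise ValueError("message lengths must be non-negative")
    return "".join(f"{n}\n" for n in lengths)


# Write the file that is later passed as: IMB-MPI1 -msglen Lengths
Path("Lengths").write_text(lengths_file_text([0, 1024, 65536, 1048576]))
```

The file can then be used, for example, as mpirun -np 2 IMB-MPI1 -msglen Lengths PingPong.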

-map PxQ Option

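Arranges the processes in a P × Q matrix with the ranks numbered as shown below, so that row i contains the ranks i, P+i, ..., (Q-1)P+i:

$$
\begin{pmatrix}
0 & P & \cdots & (Q-2)P & (Q-1)P \\
1 & P+1 & \cdots & (Q-2)P+1 & (Q-1)P+1 \\
\vdots & \vdots & & \vdots & \vdots \\
P-1 & 2P-1 & \cdots & (Q-1)P-1 & QP-1
\end{pmatrix}
$$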

For example, to run Multi-PingPong between two nodes with P processes each, where each process on one node communicates with its counterpart on the other node, call:

mpirun -np <2P> IMB-MPI1 -map <P>x2 PingPong
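The rank layout of the matrix can be generated directly from the formula rank = j·P + i for row i and column j. map_rows is a hypothetical helper; the pairing interpretation below assumes the launcher places ranks 0 to P-1 on the first node and P to 2P-1 on the second.

```python
def map_rows(p, q):
    """Rows of the P x Q rank matrix for -map PxQ (rank j*P + i at row i, column j)."""
    return [[j * p + i for j in range(q)] for i in range(p)]


# -map 4x2 with 8 processes, e.g. 4 processes on each of two nodes
for row in map_rows(4, 2):
    print(row)
# [0, 4]
# [1, 5]
# [2, 6]
# [3, 7]
```

With that placement, each row pairs a process on the first node with its counterpart on the second node.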

-include [[benchmark1] benchmark2 ...]

Specifies the list of additional benchmarks to run. For example, to add PingPongSpecificSource and PingPingSpecificSource benchmarks, call:

mpirun -np 2 IMB-MPI1 -include PingPongSpecificSource PingPingSpecificSource

-exclude [[benchmark1] benchmark2 ...]

Specifies the list of benchmarks to exclude from the run. For example, to exclude Alltoall and Allgather, call:

mpirun -np 2 IMB-MPI1 -exclude Alltoall Allgather
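The combined effect of -include and -exclude on a default benchmark list can be modeled as set operations with case-insensitive names. DEFAULT below is a hypothetical subset of IMB-MPI1 benchmark names, not the component's real default list, and selected is an illustrative helper; how IMB orders the resulting benchmarks is not modeled.

```python
# Hypothetical subset of default benchmarks; the real default set is defined by IMB.
DEFAULT = ["PingPong", "PingPing", "Sendrecv", "Allgather", "Alltoall", "Alltoallv", "Barrier"]


def selected(default, include=(), exclude=()):
    """Benchmarks run after -include/-exclude, matching names case-insensitively."""
    dropped = {name.lower() for name in exclude}
    result = [name for name in default if name.lower() not in dropped]
    present = {name.lower() for name in result}
    for name in include:                 # additional benchmarks are appended once
        if name.lower() not in present:
            result.append(name)
            present.add(name.lower())
    return result


print(selected(DEFAULT, include=["PingPongSpecificSource"], exclude=["Alltoall", "Allgather"]))
```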

-msglog [<minlog>:]<maxlog>

Controls the lengths of the transferred messages. This setting overrides the MINMSGLOG and MAXMSGLOG values. The new message sizes are 0, 2^minlog, ..., 2^maxlog.
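The resulting list of sizes is easy to compute. msglog_lengths is a hypothetical helper; its default minlog of 0 is an assumption consistent with the second sample output in this section, which starts at 2^0 = 1 byte when only maxlog is given.

```python
def msglog_lengths(maxlog, minlog=0):
    """Message lengths selected by -msglog [minlog:]maxlog: 0, 2**minlog, ..., 2**maxlog."""
    if not 0 <= minlog <= maxlog:
        raise ValueError("expected 0 <= minlog <= maxlog")
    return [0] + [2**k for k in range(minlog, maxlog + 1)]


print(msglog_lengths(7, 3))  # [0, 8, 16, 32, 64, 128]
print(msglog_lengths(3))     # [0, 1, 2, 4, 8]
```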

For example, try running the following command line:

mpirun -np 2 IMB-MPI1 -msglog 3:7 PingPong

Intel® MPI Benchmarks selects the lengths 0, 8, 16, 32, 64, and 128, as shown below:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.70         0.00
            8         1000         0.73        10.46
           16         1000         0.74        20.65
           32         1000         0.94        32.61
           64         1000         0.94        65.14
          128         1000         1.06       115.16

Alternatively, you can specify only the maxlog value. For example, the following command line selects the lengths 0, 1, 2, 4, and 8:

mpirun -np 2 IMB-MPI1 -msglog 3 PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.69         0.00
            1         1000         0.72         1.33
            2         1000         0.71         2.69
            4         1000         0.72         5.28
            8         1000         0.73        10.47

-thread_level Option

Specifies the desired thread level for MPI_Init_thread(). See the description of MPI_Init_thread() for details. This option is available only if Intel® MPI Benchmarks is built with the USE_MPI_INIT_THREAD macro defined. Possible values for <level> are single, funneled, serialized, and multiple.
