Intel® Math Kernel Library 11.0.2 User Guide
To run MP LINPACK on multiple nodes, you need to use MPI and modify the HPL.dat file. As an alternative to modifying HPL.dat, you can use new command-line parameters.
To expand MP LINPACK runs to more nodes, perform these steps:
Load the necessary environment variables for Intel MKL, Intel MPI, and the Intel® compiler:
. /opt/intel/composer_xe_2013/bin/compilervars.sh intel64
. /opt/intel/mpi/4.1.0.024/bin64/mpivars.sh
. /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64
Build the hybrid MP LINPACK binaries:
$ make arch=intel64 version=mkl_hybrid
Change directory to bin/intel64.
$ cd /opt/intel/composer_xe_2013/mkl/benchmarks/mp_linpack/bin/intel64
$ ls
HPL.dat xhpl xhpl_mic
This directory contains the following files:
xhpl - the Intel® 64 architecture binary. You can rename this file.
xhpl_mic - the Intel® Many Integrated Core (Intel® MIC) Architecture binary, to be offloaded. Do not rename this file; otherwise, the Intel 64 architecture binary will be unable to find xhpl_mic.
HPL.dat - the HPL input data set.
In the HPL.dat file, set the problem size N to 10000. Because this is a test run, keep the problem size small.
In the HPL.dat file, set the parameters Ps and Qs so that Ps × Qs equals the number of nodes. For example, for 2 nodes, set Ps to 1 and Qs to 2. It is easier to achieve an optimal result if Ps = Qs. If the number of nodes has no equal factors, choose Ps and Qs as close to each other as possible, with Ps < Qs.
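The factor-selection rule above can be sketched as a small helper script. This script is illustrative only and is not part of the MP LINPACK distribution; it picks, for a given node count, the factor pair closest to square with Ps <= Qs:

```shell
#!/bin/sh
# Illustrative helper (not part of MP LINPACK): for a node count given as
# the first argument, find Ps and Qs with Ps * Qs == nodes, Ps <= Qs, and
# the two factors as close to each other as possible.
nodes=${1:-2}
ps=1
p=1
while [ $((p * p)) -le "$nodes" ]; do
    if [ $((nodes % p)) -eq 0 ]; then
        ps=$p   # largest divisor found so far that is <= sqrt(nodes)
    fi
    p=$((p + 1))
done
qs=$((nodes / ps))
echo "Ps=$ps Qs=$qs"
```

For 2 nodes this prints Ps=1 Qs=2, matching the example above; for 4 nodes it prints Ps=2 Qs=2.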
The resulting HPL.dat file for 2 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 20000 909000 991000 976800 976800 Ns
1 # of NBs
1280 NBs
1 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 Ps
2 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
0 SWAP (0=bin-exch,1=long,2=mix)
1 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
0 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
The resulting HPL.dat file for 4 nodes is as follows:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
10000 20000 909000 991000 976800 976800 Ns
1 # of NBs
1280 NBs
1 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
2 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
0 SWAP (0=bin-exch,1=long,2=mix)
1 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
0 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
Run the xhpl binary under MPI control on two nodes:
mpirun --perhost 1 -n 2 -hosts Node1,Node2 \
-genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl
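If you prefer the command-line parameters over editing HPL.dat, the same two-node run can be launched along the following lines. Treat this invocation as a sketch: the -n, -p, and -q values mirror the 2-node HPL.dat example above, and the mpirun options are unchanged from the command just shown.

```shell
# Sketch: override the problem size and process grid from the command line
# (-n 10000, -p 1, -q 2) instead of editing HPL.dat.
mpirun --perhost 1 -n 2 -hosts Node1,Node2 \
    -genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
    -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH \
    ./xhpl -n 10000 -p 1 -q 2
```

Note that the first -n belongs to mpirun (number of MPI ranks), while the -n after ./xhpl is the benchmark's problem-size parameter.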
Rerun the HPL test, increasing the problem size Ns, until the matrix size uses about 80% of the available memory, or simply use the -m command-line parameter.
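As a rough guide to the 80% figure, Ns can be estimated from the aggregate memory of the run, because the benchmark matrix holds N × N double-precision (8-byte) elements. The sketch below is an assumption-laden illustration for Linux systems: it reads the local node's memory from /proc/meminfo, assumes all nodes have the same amount of RAM, and applies the 80% factor.

```shell
#!/bin/sh
# Illustrative estimate (Linux, not part of MP LINPACK): largest Ns such
# that an N x N matrix of 8-byte doubles fits in ~80% of the aggregate
# memory, assuming every node has as much RAM as this one.
nodes=${1:-2}
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
awk -v kb="$mem_kb" -v nodes="$nodes" 'BEGIN {
    bytes = kb * 1024 * nodes        # total memory across all nodes
    n = int(sqrt(0.8 * bytes / 8))   # solve N^2 * 8 <= 0.8 * bytes for N
    print "Suggested Ns:", n
}'
```

The -m parameter lets the benchmark size the problem from a memory amount directly, so in practice you can let xhpl perform an equivalent calculation for you.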