Intel® Math Kernel Library 11.0.2 User Guide

Expanding to Two or More Nodes

To run MP LINPACK on multiple nodes, you need to use MPI and modify the HPL.dat file. As an alternative to modifying HPL.dat, you can use new command-line parameters.

To expand MP LINPACK runs to more nodes, perform these steps:

  1. Load the necessary environment variables for Intel MKL, Intel MPI, and the Intel® compiler, and then build the MP LINPACK binaries with make:

    . /opt/intel/composer_xe_2013/bin/compilervars.sh intel64

    . /opt/intel/mpi/4.1.0.024/bin64/mpivars.sh

    . /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64

    $ make arch=intel64 version=mkl_hybrid

  2. Change directory to bin/intel64.

    $ cd /opt/intel/composer_xe_2013/mkl/benchmarks/mp_linpack/bin/intel64

    $ ls

    HPL.dat xhpl xhpl_mic

    This directory contains the following files:

    • xhpl - the Intel® 64 architecture binary.

      You can rename this file.

    • xhpl_mic - the Intel® Many Integrated Core (Intel® MIC) Architecture binary, to be offloaded.

      Do not rename this file; otherwise, the Intel 64 architecture binary will be unable to find xhpl_mic.

    • HPL.dat - the HPL input data set.

  3. In the HPL.dat file, set the problem size N to 10000. Because this setting is for a test run, the problem size should be small.

  4. In the HPL.dat file, set the parameters Ps and Qs so that Ps × Qs equals the number of nodes. For example, for 2 nodes, set Ps to 1 and Qs to 2. Optimal results are easiest to achieve when Ps = Qs. If the number of nodes has no pair of equal factors, choose factors that are as close to each other as possible, with Ps < Qs.

    The resulting HPL.dat file for 2 nodes is as follows:

    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    10000 20000 909000 991000 976800 976800 Ns
    1            # of NBs
    1280         NBs
    1            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    1            Ps
    2            Qs
    16.0         threshold
    1            # of panel fact
    2            PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    4            NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    1            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    1            DEPTHs (>=0)
    0            SWAP (0=bin-exch,1=long,2=mix)
    1            swapping threshold
    1            L1 in (0=transposed,1=no-transposed) form
    1            U  in (0=transposed,1=no-transposed) form
    0            Equilibration (0=no,1=yes)
    8            memory alignment in double (> 0)
    
    Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
    

    The resulting HPL.dat file for 4 nodes is as follows:

    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    10000 20000 909000 991000 976800 976800 Ns
    1            # of NBs
    1280         NBs
    1            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    2            Ps
    2            Qs
    16.0         threshold
    1            # of panel fact
    2            PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    4            NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    1            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    1            DEPTHs (>=0)
    0            SWAP (0=bin-exch,1=long,2=mix)
    1            swapping threshold
    1            L1 in (0=transposed,1=no-transposed) form
    1            U  in (0=transposed,1=no-transposed) form
    0            Equilibration (0=no,1=yes)
    8            memory alignment in double (> 0)
    
    Alternatively, launch with the -n, -p, and -q parameters and leave the HPL.dat file as is.
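
The rule in step 4 for picking Ps and Qs can be sketched in a few lines of Python. This is only an illustration and not part of Intel MKL; the function name choose_grid is hypothetical. It returns the factor pair of the node count whose members are closest to each other, with Ps no larger than Qs.

```python
import math


def choose_grid(num_nodes):
    """Return (Ps, Qs) with Ps * Qs == num_nodes and Ps <= Qs, as close as possible.

    Hypothetical helper for illustration; it is not shipped with MP LINPACK.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be a positive integer")
    # Start at floor(sqrt(n)) and walk down to the largest divisor not above it.
    ps = math.isqrt(num_nodes)
    while num_nodes % ps != 0:
        ps -= 1
    return ps, num_nodes // ps


if __name__ == "__main__":
    for nodes in (2, 4, 6, 8):
        ps, qs = choose_grid(nodes)
        print(f"{nodes} nodes: Ps={ps}, Qs={qs}")
```

For 2 and 4 nodes this reproduces the values used in the HPL.dat listings above (1 × 2 and 2 × 2). For a prime node count, the only possible grid is 1 × N.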
    
  5. Run the xhpl binary under MPI control on two nodes:

    mpirun --perhost 1 -n 2 -hosts Node1,Node2 \
        -genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
        -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl

  6. Rerun the HPL test, increasing the problem size Ns until the matrix uses about 80% of the available memory. Alternatively, use the -m command-line parameter.
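
As a rough guide for step 6, the target problem size can be estimated from the memory in the cluster. The sketch below assumes that the N × N matrix is stored in double precision (8 bytes per element) and is spread evenly over all nodes, so that 8 × N² bytes should fit in about 80% of the total memory. Rounding N down to a multiple of the block size NB is a common convention assumed here, not a requirement stated in this guide. The function name problem_size and the 64 GiB figure are hypothetical.

```python
import math


def problem_size(mem_bytes_per_node, num_nodes, fraction=0.80, nb=1280):
    """Estimate N so that an N x N matrix of doubles fills `fraction` of memory.

    Hypothetical helper for illustration. Assumes 8 bytes per matrix element
    and rounds N down to a multiple of the block size nb.
    """
    total_bytes = mem_bytes_per_node * num_nodes
    max_elements = int(fraction * total_bytes) // 8  # 8 bytes per double
    n = math.isqrt(max_elements)                     # largest N with N*N <= max_elements
    return (n // nb) * nb                            # round down to a multiple of NB


if __name__ == "__main__":
    gib = 2 ** 30
    # Example figures only: 2 nodes with 64 GiB each, NB taken from HPL.dat.
    n = problem_size(64 * gib, num_nodes=2, fraction=0.80, nb=1280)
    print(f"Suggested Ns: {n}")
```

Treat the result as a starting point and confirm on the actual system that the run does not swap, because the operating system and other processes also consume memory.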