Intel® Math Kernel Library 11.0.2 User Guide
To run the Automatic Offload MP LINPACK binary on a cluster node, do the following:
Set up environment variables for Intel MKL Automatic Offload mode.
The values must meet these requirements:
The value of MKL_NUM_THREADS must equal the number of physical cores on the Host.
Do not count logical cores made available by the Intel® Hyper-Threading Technology.
The value of MIC_OMP_NUM_THREADS must equal 4 times the number of cores on the Intel Xeon Phi coprocessor minus 4. This reserves one core (four hardware threads) for the uOS. For an Intel Xeon Phi coprocessor with 61 cores, this yields 61 * 4 - 4 = 240.
The proclist of MIC_KMP_AFFINITY must start at 1 and end at the calculated value of MIC_OMP_NUM_THREADS, which gives proclist=[1-240:1] in the above example.
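The physical-core requirement for MKL_NUM_THREADS can be checked with a short script. The following is a minimal sketch, assuming a Linux Host with the lscpu utility (util-linux) installed; count_physical_cores is a hypothetical helper name, not part of Intel MKL. It counts unique (core, socket) pairs, so logical cores added by Hyper-Threading are not counted twice.

```shell
# Sketch: count physical cores by de-duplicating (core, socket) pairs.
# count_physical_cores is a hypothetical helper; it reads
# "lscpu -p=Core,Socket" style lines on standard input.
count_physical_cores() {
    # Drop the comment header, keep one line per unique pair, count them.
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Assumption: lscpu is available on the Host; skip silently otherwise.
if command -v lscpu >/dev/null 2>&1; then
    export MKL_NUM_THREADS=$(lscpu -p=Core,Socket | count_physical_cores)
    echo "MKL_NUM_THREADS=$MKL_NUM_THREADS"
fi
```

Compare the printed value against the hardware specification of the Host before relying on it.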
Set up the environment variables using these commands (assuming the bash shell):
$ unset MKL_MIC_0_WORKDIVISION
$ unset MKL_MIC_1_WORKDIVISION
$ export MKL_NUM_THREADS=16
$ export MKL_MIC_MAX_MEMORY=4G
$ export MKL_MIC_ENABLE=1
$ export MKL_DYNAMIC=FALSE
$ export MIC_USE_2MB_BUFFERS=16K
$ export MIC_OMP_NUM_THREADS=240
$ export MIC_KMP_AFFINITY=explicit,granularity=fine,proclist=[1-240:1]
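The thread count and proclist in the commands above are for a 61-core coprocessor. The following is a minimal sketch that derives both values from the core count using the rules stated earlier; mic_threads, mic_proclist, and MIC_CORES are hypothetical names introduced for illustration and are not Intel MKL settings.

```shell
# Sketch: derive the coprocessor thread settings from the core count.
# mic_threads and mic_proclist are hypothetical helper names.

# 4 threads per core, minus the 4 threads of the core kept for the uOS.
mic_threads() {
    local cores=$1
    echo $(( cores * 4 - 4 ))
}

# The proclist runs from 1 up to the computed thread count.
mic_proclist() {
    local cores=$1
    echo "[1-$(mic_threads "$cores"):1]"
}

MIC_CORES=61   # assumption: set this to the core count of your coprocessor
export MIC_OMP_NUM_THREADS=$(mic_threads "$MIC_CORES")
export MIC_KMP_AFFINITY="explicit,granularity=fine,proclist=$(mic_proclist "$MIC_CORES")"

echo "MIC_OMP_NUM_THREADS=$MIC_OMP_NUM_THREADS"
echo "MIC_KMP_AFFINITY=$MIC_KMP_AFFINITY"
```

With MIC_CORES=61 this reproduces the values 240 and proclist=[1-240:1] used in the example.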
Execute the binary:
$ ./xhpl
Modify the HPL.dat file to match the memory on the Host by increasing the value in line 6 before Ns:
For 16 GB memory on a single Host: 40000 Ns
For 32 GB memory on a single Host: 56000 Ns
For 64 GB memory on a single Host: 83000 Ns
In general, you can compute the memory each process requires to store its share of the matrix (not counting the numerous buffers) as 8 bytes * N * N / (P * Q), where N is the problem size and P and Q are the dimensions of the process grid in HPL.dat.
Run the binary again with the updated HPL.dat:
$ ./xhpl
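The matrix-memory formula above can be evaluated before choosing a value for Ns. The following is a minimal sketch using shell integer arithmetic; hpl_matrix_bytes is a hypothetical helper name, not part of the HPL distribution, and the result covers only the matrix itself, not the additional buffers.

```shell
# Sketch: matrix memory per process from 8 bytes * N * N / (P * Q).
# hpl_matrix_bytes is a hypothetical helper name.
hpl_matrix_bytes() {
    local n=$1 p=$2 q=$3
    echo $(( 8 * n * n / (p * q) ))
}

# Example: N = 56000 on a 1 x 1 process grid.
bytes=$(hpl_matrix_bytes 56000 1 1)
echo "matrix bytes: $bytes"
# Integer division, so the GiB figure is rounded down.
echo "matrix GiB: $(( bytes / 1024 / 1024 / 1024 ))"
```

For N = 56000 with P = Q = 1 this gives 25088000000 bytes, about 23 GiB, which leaves headroom on a 32 GB Host for the buffers the formula does not count.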