OFFLOAD Compiler Directive: Causes the statement following the directive to execute on the target. This directive only applies to Intel® MIC Architecture.
!DIR$ [OMP] OFFLOAD specifier[[,] specifier...]
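specifier
Can be any of the following:
- TARGET (target-name[:target-number])
- IF (if-specifier)
- MANDATORY
- SIGNAL (tag)
- WAIT (tag)
- offload-parameter[[,] offload-parameter...]
You cannot specify both MANDATORY and IF.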
target-name
Is an identifier that represents the target. The only allowable target name is MIC.
target-number
(Optional) Is an integer expression whose value is interpreted as follows:
- A value of -1 lets the runtime system select the coprocessor. If no coprocessor is available, the program fails with an error message.
- A value greater than or equal to 0 selects a specific coprocessor: the one numbered target-number modulo the number of coprocessors. For example, in a system with 4 coprocessors, values 1 and 5 both select coprocessor 1, and values 2 and 6 both select coprocessor 2.
If you don't specify this argument, the runtime system chooses whether to execute the code on the CPU or on a coprocessor and, if multiple coprocessors are available, on which one.
Note: Leaving data values on the coprocessor from one execution of offloaded code to another is called "data persistence". In a system with multiple coprocessors, you need to specify a target-number to use data persistence reliably. If you use ALLOC_IF or FREE_IF to implement data persistence but do not specify a target-number, the runtime system chooses a coprocessor arbitrarily, so it could choose one on which the data is not available.
if-specifier
Is a Boolean expression. If the expression evaluates to true, the program attempts to offload the statement; if the specified target coprocessor is absent from the system, or is unavailable at that time because it is fully loaded, the statement executes on the CPU instead. If the expression evaluates to false, the statement executes on the CPU and none of the other OFFLOAD clauses has any effect.
tag
Is an integer expression that is a memory reference. When used with SIGNAL, it is associated with an asynchronous computation or an asynchronous data transfer, and it can be used in subsequent WAIT clauses of other OFFLOAD or OFFLOAD_WAIT directives. When used with WAIT, it identifies the asynchronous computation or data transfer to wait for: use the same value that you specified in the SIGNAL clause of the OFFLOAD or OFFLOAD_TRANSFER directive that started it.
offload-parameter
Can be any of the following clauses, each of the form CLAUSE (identifier[, identifier]... [: modifier...]):
- IN: copies data from the host CPU to the target before the offloaded statement executes.
- OUT: copies data from the target back to the host CPU after the offloaded statement executes.
- INOUT: copies data in both directions.
- NOCOPY: transfers no data; it is typically combined with ALLOC_IF and FREE_IF to allocate, reuse, or free target memory.
When a program runs in a heterogeneous environment, program variables are copied back and forth between the CPU and the target. The offload-parameter is a specification for controlling the direction in which variables are copied and, for pointers, the amount of data that is copied.
An IN or OUT element-count-expr expression (see modifier, below) is evaluated at a point in the program before the statement or clause in which it is used. An array variable whose size is known from its declaration is copied in its entirety. To process a subset of an array, use the name of the starting element of the subset together with an element-count-expr to transfer that subset. Because a data pointer variable not listed in an IN clause is uninitialized within the offload region, it must be assigned a value on the target before it can be referenced.
identifier
Is a variable, a subscripted variable, an array slice, or a component reference.
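modifier
Is one of the following:
- LENGTH (element-count-expr): Specifies the number of elements to transfer. Use it for pointers and for arrays whose size is not known from their declaration, such as assumed-size arrays.
- ALLOC_IF (condition): If the condition is true (the default), memory for the variable is allocated on the target for this offload; if it is false, memory allocated by an earlier offload is reused.
- FREE_IF (condition): If the condition is true (the default), memory for the variable is freed on the target at the end of this offload; if it is false, the memory is retained for later offloads.
- ALLOC (array-slice): Specifies the array slice to allocate on the target, which can differ from the section of the identifier that is transferred.
- INTO (identifier): Transfers the data into a different variable on the receiving side; for IN, the host variable's data is copied into the named target variable.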
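The target-number note and the ALLOC_IF/FREE_IF modifiers work together for data persistence. The following is a minimal sketch, assuming a system with at least one coprocessor; the module and procedure names (persist_demo, scale_twice, scale_by) are hypothetical. Both directives name coprocessor 0 so that the second offload reaches the copy the first one left behind.

```fortran
! Sketch: keep array a resident on coprocessor 0 between two offloads.
! Hypothetical names; assumes at least one coprocessor is present.
module persist_demo
  implicit none
contains
  subroutine scale_twice(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)

    ! Offload 1: allocate a on coprocessor 0, copy it in, and keep it there.
    !DIR$ OFFLOAD TARGET(MIC:0) IN(n) IN(a : ALLOC_IF(.TRUE.) FREE_IF(.FALSE.))
    call scale_by(a, n, 2.0)

    ! Offload 2: same coprocessor; reuse the resident copy (OUT, so nothing
    ! is copied in), copy the result back, then free the target memory.
    !DIR$ OFFLOAD TARGET(MIC:0) IN(n) OUT(a : ALLOC_IF(.FALSE.) FREE_IF(.TRUE.))
    call scale_by(a, n, 3.0)
  end subroutine scale_twice

  subroutine scale_by(x, n, f)
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: scale_by
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(in) :: f
    x = f * x
  end subroutine scale_by
end module persist_demo
```

Because IN does not copy a back, the host array is unchanged after the first offload; the combined effect (a multiplied by 6) reaches the host only through the second offload's OUT clause. If the code is compiled without offload support, the directives are ignored and both calls update the host array directly, with the same final result.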
The OFFLOAD directive both transfers data and offloads computation.
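The data-transfer side can be illustrated with a minimal sketch in which each array uses a different direction: a is only read on the target (IN), b is only produced there (OUT), and c is read and updated (INOUT). The names direction_demo, combine, and combine_kernel are hypothetical; as with compute in the larger example in this section, the called procedure carries the OFFLOAD:MIC attribute.

```fortran
! Sketch: one array per transfer direction (hypothetical names).
module direction_demo
  implicit none
contains
  subroutine combine(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n)      ! read on the target only
    real, intent(out) :: b(n)     ! produced on the target only
    real, intent(inout) :: c(n)   ! read and updated on the target
    ! IN copies host->target before the call, OUT copies target->host
    ! after it, and INOUT does both.
    !DIR$ OFFLOAD TARGET(MIC) IN(n) IN(a) OUT(b) INOUT(c)
    call combine_kernel(a, b, c, n)
  end subroutine combine

  subroutine combine_kernel(a, b, c, n)
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: combine_kernel
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n)
    real, intent(inout) :: c(n)
    b = 2.0 * a
    c = c + a
  end subroutine combine_kernel
end module direction_demo
```

Since TARGET(MIC) gives no target-number, the runtime system may also run the call on the CPU; the clauses are chosen so that the host sees the same values of b and c either way.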
The OMP keyword is optional. When it is present, the next line other than a comment must be an OpenMP* PARALLEL, PARALLEL SECTIONS, or PARALLEL DO directive; otherwise, the compiler issues an error.
When OMP is not present, the OFFLOAD directive must be followed by one of the following, or the compiler issues an error:
- An OpenMP* PARALLEL, PARALLEL SECTIONS, or PARALLEL DO directive. This specifies remote execution of that top-level OpenMP* construct.
- A CALL statement. This specifies remote execution of that single procedure call.
- An assignment statement whose right side is only a function call. This specifies remote execution of that single function invocation.
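A minimal sketch of the OMP form, using hypothetical names (omp_offload_demo, dot): the OMP OFFLOAD directive is followed directly by a PARALLEL DO construct, and the whole construct, including its reduction, is offloaded. INOUT(s) sends the initial value of s to the target and returns the reduced sum.

```fortran
! Sketch of the OMP form (hypothetical names).
module omp_offload_demo
  implicit none
contains
  real function dot(x, y, n)
    integer, intent(in) :: n
    real, intent(in) :: x(n), y(n)
    real :: s
    integer :: i
    s = 0.0
    ! With OMP present, the next non-comment line must be an OpenMP PARALLEL,
    ! PARALLEL SECTIONS, or PARALLEL DO directive; that construct is offloaded.
    !DIR$ OMP OFFLOAD TARGET(MIC) IN(n) IN(x, y) INOUT(s)
    !$omp parallel do reduction(+:s)
    do i = 1, n
      s = s + x(i) * y(i)
    end do
    !$omp end parallel do
    dot = s
  end function dot
end module omp_offload_demo
```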
You can choose whether to offload a statement based on runtime conditions, such as the size of a data set. The IF (if-specifier) clause lets you specify the condition.
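For example, the following sketch offloads a sum only for large inputs. The names if_demo, total, sum_kernel, and offload_threshold are hypothetical, and the threshold is an illustrative value rather than a tuning recommendation.

```fortran
! Sketch: offload a sum only when the input is large (hypothetical names).
module if_demo
  implicit none
  ! Hypothetical cut-off for illustration; not a tuned value.
  integer, parameter :: offload_threshold = 100000
contains
  real function total(x, n)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real :: s
    ! The IF expression is evaluated on the host. When it is false, the call
    ! executes on the CPU and the IN/OUT clauses have no effect.
    !DIR$ OFFLOAD TARGET(MIC) IF(n >= offload_threshold) IN(n) IN(x) OUT(s)
    call sum_kernel(x, n, s)
    total = s
  end function total

  subroutine sum_kernel(x, n, s)
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: sum_kernel
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: s
    s = sum(x)
  end subroutine sum_kernel
end module if_demo
```

Small inputs avoid the cost of transferring data to the coprocessor, which can outweigh the benefit of offloading a short computation.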
Do not use the __MIC__ preprocessor macro inside a statement following an OMP OFFLOAD directive. However, you can use it in a subprogram called from the directive.
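A sketch that follows this rule: the preprocessor test sits inside a called function, not in the offloaded statement itself. The names where_demo, where_am_i, and run_where are hypothetical, and the file must be preprocessed (for example, by giving it a .F90 extension). The function returns 1 if the copy that ran was compiled for the coprocessor and 0 otherwise.

```fortran
! Sketch: __MIC__ is tested in a called function, never in the offloaded
! statement itself. Hypothetical names; the file must be preprocessed.
module where_demo
  implicit none
contains
  integer function where_am_i()
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: where_am_i
#ifdef __MIC__
    where_am_i = 1   ! this copy was compiled for the coprocessor
#else
    where_am_i = 0   ! this copy was compiled for the host
#endif
  end function where_am_i

  integer function run_where()
    integer :: r
    ! Offloaded assignment whose right side only calls a function.
    !DIR$ OFFLOAD TARGET(MIC) OUT(r)
    r = where_am_i()
    run_where = r
  end function run_where
end module where_demo
```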
Conceptually, this is the sequence of events when a statement marked for offload is encountered:
1. If there is no IF clause, go to step 3.
2. On the host CPU, evaluate the IF expression. If it evaluates to true, go to step 3. Otherwise, execute the region on the host CPU and go to step 14.
3. Attempt to acquire the target. If successful, go to step 4. Otherwise, execute the region on the host CPU and go to step 14.
4. On the host CPU, compute all ALLOC_IF, FREE_IF, and element-count-expr expressions used in IN and OUT clauses.
5. On the host CPU, gather all variable values that are inputs to the offload.
6. Send the input values from the host CPU to the target.
7. On the target, allocate memory for variable-length OUT variables.
8. On the target, copy input values into corresponding target variables.
9. On the target, execute the offloaded region.
10. On the target, compute all element-count-expr expressions used in OUT clauses.
11. On the target, gather all variable values that are outputs of the offload.
12. Send output values back from the target to the host CPU.
13. On the host CPU, copy values received into corresponding host CPU variables.
14. Continue processing the program on the host CPU.
The following example demonstrates offloading a CALL statement and an assignment statement. In each case, the !DIR$ OFFLOAD TARGET(MIC) directive immediately precedes the statement designated for offload.
! Offload call of routine calc
!DIR$ OFFLOAD TARGET(MIC)
CALL calc(...)
! Offload call of function recalc
!DIR$ OFFLOAD TARGET(MIC)
X = recalc(...)
The following example demonstrates using the OFFLOAD directive in conjunction with the OpenMP* PARALLEL directive to specify remote execution of the OpenMP construct.
! Offload OpenMP parallel construct
!DIR$ OMP OFFLOAD TARGET(MIC)
!$omp parallel
...
!$omp end parallel
The following example demonstrates how to use the LENGTH modifier to specify the number of elements of an assumed-size array that are copied between the CPU and the target.
subroutine sample (Z,N,M)
integer, intent(in) :: N,M
real, dimension (N,*) :: Z
...
!dir$ omp offload target(mic) in (Z:length(N*M))
...
end subroutine sample
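A self-contained variant of the same pattern, with hypothetical names (length_demo, bump_first, add_one): because z is assumed-size, its declaration does not give the number of elements, so LENGTH(n*m) supplies the count. INOUT is used here so that the updated elements return to the host.

```fortran
! Sketch: LENGTH supplies the element count for an assumed-size array
! (hypothetical names).
module length_demo
  implicit none
contains
  subroutine bump_first(z, n, m)
    integer, intent(in) :: n, m
    real :: z(n, *)     ! assumed-size: the declaration does not give the size
    ! Copy the first n*m elements in, update them on the target, copy them back.
    !DIR$ OFFLOAD TARGET(MIC) IN(n, m) INOUT(z : LENGTH(n*m))
    call add_one(z, n*m)
  end subroutine bump_first

  subroutine add_one(v, k)
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: add_one
    integer, intent(in) :: k
    real, intent(inout) :: v(k)
    v = v + 1.0
  end subroutine add_one
end module length_demo
```

Calling bump_first on a 2-by-3 array with m = 2 updates only the first two columns; the third column is neither transferred nor changed.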
The following example shows various forms of identifier and use of the ALLOC and INTO modifiers in IN clauses:
subroutine foo
real a(1000,500), b(1000,500), c(2000, 20)
real, pointer :: p(:)
p => c(1:40:2, 1)   ! rank-1 pointer: 20 elements of column 1, stride 2
!dec$ offload target(mic) in( a : into (b) )
...
!dec$ offload target(mic) in( c(i:j:k,l:m:n) ) ! strides k and n must be 1
...
!dec$ offload target(mic) in( p(1:20) : alloc(1:100) )
...
end
The following example uses the OFFLOAD directive together with the OFFLOAD_TRANSFER and OFFLOAD_WAIT directives.
! Fortran free form async_sw.f90
module M
integer, parameter :: iter = 10
integer,parameter :: count = 25000
!dir$ options /offload_attribute_target=mic
real(4), allocatable :: in1(:), in2(:), out1(:), out2(:)
integer :: sin1, sin2, sout1, sout2
!dec$ end options
contains
subroutine compute(x, y)
!dir$ attributes offload:mic ::compute
real(4), allocatable :: x(:), y(:)
integer i
!$omp parallel do num_threads(96) private(i)
do i = 1, count
y(i) = x(i) * x(i)
end do
end subroutine compute
subroutine do_async_in()
integer i
!dir$ offload_transfer target(mic:0) &
& in(in1 : alloc_if(.false.) free_if(.false.) ) signal(sin1)
do i = 0, (iter - 1)
if (mod(i,2) == 0) then
!dir$ offload_transfer target(mic:0) if(i/=iter-1) &
& in(in2 : alloc_if(.false.) free_if(.false.) ) signal(sin2)
!dir$ offload target(mic:0) nocopy(in1) wait(sin1) &
& out(out1 : length(count) alloc_if(.false.) free_if(.false.) )
call compute(in1, out1)
else
!dir$ offload_transfer target(mic:0) if(i/=iter-1) &
& in(in1 : alloc_if(.false.) free_if(.false.) ) signal(sin1)
!dir$ offload target(mic:0) nocopy(in2) wait(sin2) &
& out(out2 : length(count) alloc_if(.false.) free_if(.false.) )
call compute(in2, out2)
endif
enddo
end subroutine do_async_in
subroutine do_async_out()
integer i
do i=0, iter
if (mod(i,2) == 0) then
if (i < iter) then
!dir$ offload target(mic:0) in(in1 : alloc_if(.false.) free_if(.false.) ) &
& nocopy(out1: length(count) alloc_if(.false.) free_if(.false.))
call compute(in1, out1)
!dir$ offload_transfer target(mic:0) out(out1 : alloc_if(.false.) free_if(.false.) ) &
& signal(sout1)
endif
if (i > 0) then
!dir$ offload_wait target(mic:0) wait(sout2)
call use_result(out2)
endif
else
if (i<iter) then
!dir$ offload target(mic:0) in(in2 : length(count) alloc_if(.false.) free_if(.false.) ) &
& nocopy(out2 : alloc_if(.false.) free_if(.false.))
call compute(in2, out2)
!dir$ offload_transfer target(mic:0) out(out2 : alloc_if(.false.) free_if(.false.) ) &
& signal(sout2)
endif
!dir$ offload_wait target(mic:0) wait(sout1)
call use_result(out1)
endif
enddo
end subroutine do_async_out
subroutine do_sync()
integer i
do i=0, iter-1
!dir$ offload target(mic:0) &
& in(in1 : alloc_if(.false.) free_if(.false.) ) &
& out(out1 : alloc_if(.false.) free_if(.false.) )
call compute(in1, out1)
enddo
end subroutine do_sync
subroutine use_result(x)
real(4), allocatable :: x(:)
print*, "USE_RESULT *****************"
end subroutine use_result
end module M
program main
use M
integer i
allocate (in1(count), in2(count), out1(count), out2(count))
do i=1,count
in1(i) = i
in2(i) = i
enddo
!dir$ offload_transfer target(mic:0) &
& nocopy(in1, out1, in2, out2 : alloc_if(.true.) free_if(.false.) )
!$omp parallel do num_threads(96) private(i)
do i = 1, count
out1(i) = 0.
out2(i) = 0.
enddo
call do_sync()
call do_async_in()
call do_async_out()
!dir$ offload_transfer target(mic:0) &
& nocopy(in1, out1, in2, out2 : length(count) alloc_if(.false.) free_if(.true.) )
deallocate( in1,in2,out1,out2)
end
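The asynchronous pattern in this example reduces to three pieces: an OFFLOAD with SIGNAL(tag) that returns control to the host immediately, independent host work, and an OFFLOAD_WAIT on the same tag. The following minimal sketch uses hypothetical names (signal_demo, async_square, square_kernel) and assumes coprocessor 0 is present; like the example above, it names an explicit target-number for every asynchronous operation.

```fortran
! Sketch: asynchronous offload with SIGNAL, completed by OFFLOAD_WAIT.
! Hypothetical names; assumes coprocessor 0 is present.
module signal_demo
  implicit none
contains
  subroutine async_square(x, y, n)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: y(n)
    integer :: tag   ! memory reference that links SIGNAL to WAIT

    ! Start the offload; control returns to the host without waiting.
    !DIR$ OFFLOAD TARGET(MIC:0) IN(n) IN(x) OUT(y) SIGNAL(tag)
    call square_kernel(x, y, n)

    ! Independent host work could run here; y must not be read yet.

    ! Block until the offload tagged by tag has finished and y is copied back.
    !DIR$ OFFLOAD_WAIT TARGET(MIC:0) WAIT(tag)
  end subroutine async_square

  subroutine square_kernel(x, y, n)
    !DIR$ ATTRIBUTES OFFLOAD:MIC :: square_kernel
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: y(n)
    y = x * x
  end subroutine square_kernel
end module signal_demo
```

Until the OFFLOAD_WAIT completes, the host must not read y, because the OUT transfer may still be in progress.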