Intel® C++ Compiler XE 13.1 User and Reference Guides

offload_transfer

Initiates asynchronous data transfer, or initiates and completes synchronous data transfer. This pragma only applies to Intel® MIC Architecture.

Syntax

#pragma offload_transfer specifier[ specifier...]

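Where specifier can be any of the following:

  • target ( target-name [ : target-number ] )

  • if ( if-specifier )

  • signal ( expression )

  • wait ( expression )

  • mandatory

  • offload-parameter

The following are arguments to use in specifier: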

Arguments

target-name

An identifier that represents the target. The only allowable target name is mic.

target-number

Required for signal and wait clauses.

An integer expression whose value is interpreted as follows:

>=0

A value greater than or equal to zero specifies execution on a specific coprocessor. The number of the specific coprocessor is determined as follows:

coprocessor=target-number % number_of_coprocs

If the correct target hardware needed to run the offloaded program is not available on the system, the program fails with an error message.

<= -1

These values are reserved.

If you don't specify this argument, the runtime system chooses whether to execute the code on the coprocessor or the CPU, and if multiple coprocessors are available, on which coprocessor.

For example, in a system with four targets:

  • Specifying 2 or 6 tells the runtime system to execute the code on target 2, because both 2 % 4 and 6 % 4 equal 2.

  • Specifying 1000 tells the runtime system to execute the code on target 0, because 1000 % 4 equals 0.
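The modulo rule above can be restated as a short sketch in plain C. The names resolve_coprocessor and print_mapping_example are hypothetical helpers written for illustration; they are not part of the offload runtime, and the return value of -1 for reserved or invalid inputs is an assumption of this sketch only.

```c
#include <stdio.h>

/* Hypothetical helper (not an Intel runtime function): applies the documented
   rule coprocessor = target-number % number_of_coprocs. */
int resolve_coprocessor(int target_number, int number_of_coprocs)
{
    if (target_number < 0 || number_of_coprocs <= 0) {
        /* Negative target numbers are reserved; -1 is this sketch's own
           convention for "no coprocessor selected". */
        return -1;
    }
    return target_number % number_of_coprocs;
}

/* Prints the mapping for the four-target system described in the text. */
void print_mapping_example(void)
{
    const int target_numbers[] = { 2, 6, 1000 };
    const int number_of_coprocs = 4;

    for (int i = 0; i < 3; ++i) {
        printf("target-number %d -> coprocessor %d\n",
               target_numbers[i],
               resolve_coprocessor(target_numbers[i], number_of_coprocs));
    }
}
```

Calling print_mapping_example() lists the coprocessor selected for each of the target numbers used in the example above.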

if-specifier

A Boolean expression.

If the expression evaluates to true, then the data transfer specified by the pragma occurs. If the specified target coprocessor is absent from the system or not available at that time because it is fully loaded, then no action is taken.

If the expression evaluates to false, then no action is taken and none of the other offload clauses have any effect.

If the expression evaluates to false and you use either the signal or wait clause in this pragma, then the behavior is undefined.

Note

Do not use this clause and a mandatory clause in the same directive.

signal

An optional integer expression that serves as a handle on an asynchronous data transfer or computational activity. The computation performed by the offload clause, and any results returned from the offload using out clauses, occur concurrently with CPU execution of the code after the pragma. If this clause is not used, the entire offload and associated data transfer execute synchronously: the CPU does not continue past the pragma until the offload has completed.

This clause refers to a specific target device, so you must specify a target-number in the target clause that is greater than or equal to zero.

wait

An optional integer expression to specify a wait for the completion of a previously initiated asynchronous data transfer or asynchronous computation.

This clause refers to a specific target device, so you must specify a target-number in the target clause that is greater than or equal to zero.

Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, querying a signal on target:0 that was initiated for target:1 results in a runtime abort of the application because the signal was initiated for target:1, so there is no signal associated with target:0.
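The pairing of signal and wait on the same target can be sketched as follows. This is a minimal illustration, assuming an offload-capable Intel compiler and a coprocessor numbered 0; the function name sum_while_transferring and its parameters are hypothetical. A compiler that does not recognize the pragmas ignores them, in which case the function simply computes the sum on the CPU.

```c
/* Hypothetical sketch: start an asynchronous CPU-to-coprocessor copy of
   data[0..n), do CPU work while it is in flight, then wait for it.
   Returns the sum of the elements, computed on the CPU. */
float sum_while_transferring(float *data, int n)
{
    float host_sum = 0.0f;

    /* Initiate the transfer to coprocessor 0. The pointer 'data' is used as
       the signal handle, and free_if(0) keeps the target buffer allocated. */
    #pragma offload_transfer target(mic:0) signal(data) \
            in(data : length(n) alloc_if(1) free_if(0))

    /* CPU work that overlaps the transfer; it only reads the source buffer. */
    for (int i = 0; i < n; ++i) {
        host_sum += data[i];
    }

    /* Wait on the same target and handle that initiated the signal, then
       release the target buffer without copying anything. */
    #pragma offload_transfer target(mic:0) wait(data) \
            nocopy(data : length(n) alloc_if(0) free_if(1))

    return host_sum;
}
```

Both pragmas name target(mic:0), which keeps the wait on the same device that received the signal.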

mandatory

An optional clause specifying that execution on the target is required. Execution on the CPU is not allowed. If the correct target hardware needed to run the offloaded program is not available on the system, the program fails with an error message.

Note

Do not use this clause and the if-specifier clause in the same directive.

offload-parameter

Is one of the following:

  • in ( variable-ref [, variable-ref ] [ modifier[ modifier ] ] )

  • out ( variable-ref [, variable-ref ] [ modifier[ modifier ] ] )

  • nocopy ( variable-ref [, variable-ref ] [ modifier[ modifier ] ] )

When a program runs in a heterogeneous environment, program variables are copied back and forth between CPU and the target. The offload-parameter is a specification for controlling the direction in which variables are copied, and for pointers, the amount of data that is copied.

in

A variable is copied from CPU to the coprocessor.

out

A variable is copied from the coprocessor to the CPU.

nocopy

Memory is allocated on the coprocessor for the variable.

An in or out element-count-expr expression (see description below within modifier) is evaluated at a point in the program before the statement or clause in which it is used.

An array variable whose size is known from the declaration is copied in its entirety. If a subset of an array is to be processed, use a pointer to the starting element of the subset and the element-count-expr to transfer the array subset.

variable-ref

Is one of the following:

  • a C/C++ identifier.

  • variable-ref . identifier

  • array-slice

array-slice

variable-ref '[' integral-expression [ : integral-expression ] ']'

An array-slice is an array expression that denotes one contiguous set of array elements.

modifier

Is one of the following:

  • length ( element-count-expr )

    where element-count-expr is an integral expression, computed at runtime. Use it with:

    • Pointer variables.

      Pointer variable values themselves are never copied across the host/target interface because there is no correspondence between the memory addresses of the host CPU and the target. Instead, objects that a pointer points to are copied to or from the target, and the value of the pointer variable is recreated. By default a single element is copied.

      You can use element-count-expr to specify how many elements of the pointer type should be considered as the data the pointer points to. If the expression value is zero or negative, a runtime error occurs.

    • Variable-length arrays.

      element-count-expr specifies a number of elements copied between the CPU and target.

  • alloc_if ( condition ) | free_if ( condition ), where condition is a Boolean expression.

    alloc_if specifies a Boolean condition that controls whether the allocatable variables in the in clause will be allocated a new block of memory on the target when the offload is executed on the target. If the expression evaluates to true, a new memory allocation is performed for each variable listed in the clause. If the condition evaluates to false, the existing allocated values on the target are reused (data persistence). You must ensure that a block of memory of sufficient size has been previously allocated for the variables on the target by using a free_if (0) clause on an earlier offload.

    free_if specifies a Boolean condition that controls whether to deallocate the memory allocated for the allocatable variables in an in clause. If the expression evaluates to true, the memory pointed to by each variable listed in the clause is deallocated. If the condition evaluates to false, no action is taken on the memory pointed to by the variables in the list. A subsequent clause will be able to reuse the allocated memory (data persistence).

    The following are the default settings for the alloc_if and free_if modifiers:

    Clause    alloc_if    free_if
    in        true        true
    inout     true        true
    out       true        true
    nocopy    false       false

    See Managing Memory Allocation for Pointer Variables for more information.

  • align (expression) where the value of expression should be a power of two. This modifier applies to pointer variables and requests the specified minimum alignment for pointer data allocated on Intel® MIC Architecture.

  • alloc (array-slice) where array-slice specifies a set of elements of the array that need allocation. Data specified by the in/out expression is transferred into the corresponding section of the array allocated on the coprocessor. For more information, see Allocating Memory for Parts of Arrays.

  • into (var-exp) where var-exp is a variable expression. The into modifier allows data to be transferred from one variable on the CPU to another on the coprocessor, and vice versa. Only one item is allowed in variable-ref when using the into modifier. For more information, see Moving Data from One Variable to Another.
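The data persistence described for alloc_if and free_if can be sketched with three pragmas that share one coprocessor allocation. This is an illustration only, assuming an offload-capable Intel compiler and a coprocessor numbered 0; the function name scale_on_target and its parameters are hypothetical. A compiler that does not recognize the pragmas ignores them, and the loop then runs on the CPU with the same result.

```c
/* Hypothetical sketch: multiply buf[0..n) by 'factor' on coprocessor 0,
   keeping one target allocation alive across three pragmas. */
void scale_on_target(float *buf, int n, float factor)
{
    /* Step 1: allocate on the target and copy the data in.
       free_if(0) keeps the allocation for the following pragmas. */
    #pragma offload_transfer target(mic:0) \
            in(buf : length(n) alloc_if(1) free_if(0))

    /* Step 2: compute on the target, reusing the existing allocation.
       nocopy with alloc_if(0) free_if(0) neither transfers nor reallocates. */
    #pragma offload target(mic:0) \
            nocopy(buf : length(n) alloc_if(0) free_if(0))
    {
        for (int i = 0; i < n; ++i) {
            buf[i] *= factor;
        }
    }

    /* Step 3: copy the result back and release the target memory. */
    #pragma offload_transfer target(mic:0) \
            out(buf : length(n) alloc_if(0) free_if(1))
}
```

Only the first pragma allocates and only the last one frees, so the buffer is allocated once on the target for the whole sequence.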

Description

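With a signal clause, offload_transfer initiates an asynchronous data transfer and the CPU continues past the pragma immediately. Without a signal clause, it initiates the data transfer and completes it synchronously before the CPU continues.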

Example

The following example demonstrates using signal and wait as clauses of two different pragmas to transfer data asynchronously from the coprocessor to the CPU. The first offload performs the computation but only initiates data transfer. The second pragma causes a wait for the data transfer to complete.

// memalign is declared in <malloc.h>
const int N = 4086;
float *f1, *f2;
f1 = (float *)memalign(64, N*sizeof(float));
f2 = (float *)memalign(64, N*sizeof(float));
...

// CPU sends f1 as input synchronously
// The output is in f2, but is not needed immediately
#pragma offload target (mic:0) \
                       in(  f1 : length(N) ) \
                       nocopy( f2 : length(N) ) signal(f2)
{
     foo(N, f1, f2);
}
...
// Wait for the offload signaled by f2, then copy f2 back to the CPU
#pragma offload_transfer target (mic:0) wait(f2) \
                     out( f2 : length(N) alloc_if(0) free_if(1))

// CPU can now use the result in f2
