Intel® C++ Compiler XE 13.1 User and Reference Guides

offload_wait

Specifies a wait for a previously initiated asynchronous activity. This pragma only applies to Intel® MIC Architecture.

Syntax

#pragma offload_wait specifier[, specifier...]

Where specifier can be any of the following:

target(target-name[:target-number])

if(if-specifier)

wait(tag[, tag...])

The arguments used in these specifiers are described below.

Arguments

target-name

An identifier that represents the target. The only allowable target name is mic.

target-number

Required for signal and wait clauses.

If you don't specify this argument, the runtime system will wait until all the tags are signaled by any coprocessor.

An integer expression whose value is interpreted as follows:

>=0

A value greater than or equal to zero specifies execution on a specific coprocessor. The number of the specific coprocessor is determined as follows:

coprocessor=target-number % number_of_coprocs

If the correct target hardware needed to run the offloaded program is not available on the system, the program fails with an error message.

<= -1

These values are reserved.

For example, in a system with four targets:

  • Specifying 2 or 6 tells the runtime system to wait for coprocessor 2 to signal the tags, because both 2 % 4 and 6 % 4 equal 2.

  • Specifying 1000 tells the runtime system to wait for coprocessor 0 to signal the tags, because 1000 % 4 = 0.
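The mapping from target-number to a coprocessor can be sketched in plain C. This only illustrates the modulo rule; it is not the runtime's actual code. The function name coprocessor_for and the use of -1 to mark the reserved values are assumptions made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: mirrors the rule
 *   coprocessor = target-number % number_of_coprocs
 * for target-number >= 0. Returns -1 for the reserved values (<= -1);
 * that return code is an assumption of this sketch. */
int coprocessor_for(int target_number, int number_of_coprocs)
{
    assert(number_of_coprocs > 0);  /* the rule needs at least one coprocessor */
    if (target_number < 0)
        return -1;                  /* reserved values: no mapping defined */
    return target_number % number_of_coprocs;
}
```

With four coprocessors, coprocessor_for(2, 4) and coprocessor_for(6, 4) both give 2, and coprocessor_for(1000, 4) gives 0.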

if-specifier

A Boolean expression.

If the expression evaluates to true, then the runtime will wait until the tags are signaled.

If the expression evaluates to false, then the wait clause is ignored.

Use the same if-specifier expression that you used to start the asynchronous computation or data transfer with offload or offload_transfer.
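A minimal sketch of this rule, assuming an Intel compiler with offload support: the same condition guards both the transfer and the wait, so the wait is skipped exactly when no transfer was started. The names send_if_large, buf, n, and threshold are hypothetical. A compiler without offload support ignores the pragmas, in which case the function only evaluates the condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: starts an asynchronous transfer of buf only when
 * n exceeds threshold, and waits for it under the same condition.
 * Returns the condition so the caller can see which path was taken. */
int send_if_large(float *buf, int n, int threshold)
{
    int use_async;

    assert(buf != NULL && n > 0);
    use_async = (n > threshold);

    /* Start the transfer and tag it with buf, but only if use_async is true. */
    #pragma offload_transfer target(mic:0) if(use_async) in(buf : length(n) alloc_if(1) free_if(0)) signal(buf)

    /* ... independent host work could go here ... */

    /* Same if-specifier: the wait is ignored when no transfer was started. */
    #pragma offload_wait target(mic:0) if(use_async) wait(buf)

    return use_async;
}
```

Keeping the condition in one variable, rather than repeating the expression, makes it harder for the two pragmas to drift apart.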

wait

A mandatory expression to specify a wait for the completion of a previously initiated asynchronous data transfer or asynchronous computation.

This clause refers to a specific target device, so you must specify a target-number greater than or equal to zero in the target clause.

Querying a signal before the signal has been initiated results in undefined behavior and a runtime abort of the application. For example, if a signal was initiated for target:1, querying it on target:0 aborts the application at run time, because no signal is associated with target:0.
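One way to keep the signal and the wait on the same target is to take the target number from a single constant. The sketch below assumes an Intel compiler with offload support; DEVICE, host_sum, and upload_then_wait are hypothetical names. Without offload support the pragmas are ignored and only the host work runs.

```c
#include <assert.h>
#include <stddef.h>

enum { DEVICE = 1 };  /* hypothetical: the single target number used below */

/* Hypothetical host-side work done while the transfer is in flight. */
static float host_sum(const float *data, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

float upload_then_wait(float *data, int n)
{
    float sum;

    assert(data != NULL && n > 0);

    /* The signal is initiated for target DEVICE ... */
    #pragma offload_transfer target(mic:DEVICE) in(data : length(n) alloc_if(1) free_if(0)) signal(data)

    sum = host_sum(data, n);

    /* ... so the wait names the same target. */
    #pragma offload_wait target(mic:DEVICE) wait(data)

    return sum;
}
```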

Description

This directive specifies a wait for the completion of a previously initiated asynchronous data transfer done by offload_transfer, or an asynchronous computation and return data transfer, if any, done by offload.

Examples

The following example double buffers inputs to an offload.

#pragma offload_attribute(push, target(mic))
int count = 25000000;
int iter = 10;
float *in1, *out1;
float *in2, *out2;
#pragma offload_attribute(pop)


void do_async_in()
{
      int i;
      /* Start the first asynchronous transfer of in1; signal(in1) tags it. */
      #pragma offload_transfer target(mic:0) in(in1 : length(count) alloc_if(0) free_if(0) ) signal(in1)
      for (i=0; i<iter; i++)
      {
            if (i%2 == 0) {
                  /* Start sending in2 for the next iteration (skipped on the last one). */
                  #pragma offload_transfer target(mic:0) if(i!=iter-1) in(in2 : length(count) alloc_if(0) free_if(0) ) signal(in2)
                  /* Wait for in1 to arrive, then compute on it while in2 is in flight. */
                  #pragma offload target(mic:0) nocopy(in1) wait(in1) out(out1 : length(count) alloc_if(0) free_if(0) )
                  compute(in1, out1);
            } else {
                  /* Same pattern with the two buffers swapped. */
                  #pragma offload_transfer target(mic:0) if(i!=iter-1) in(in1 : length(count) alloc_if(0) free_if(0) ) signal(in1)
                  #pragma offload target(mic:0) nocopy(in2) wait(in2) out(out2 : length(count) alloc_if(0) free_if(0) )
                  compute(in2, out2);
            }
      }
}

In this example the output results of an offload are double-buffered:

#pragma offload_attribute(push, target(mic))
int count = 25000000;
int iter = 10;
float *in1, *out1;
float *in2, *out2;
#pragma offload_attribute(pop)

void do_async_out()
{
      int i;
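      /* One extra iteration so the last result can be collected. */
      for (i=0; i<iter+1; i++)
      {
            if (i%2 == 0) {
                  if (i<iter) {
                        /* Compute into out1, leaving the result on the coprocessor. */
                        #pragma offload target(mic:0) in(in1 : length(count) alloc_if(0) free_if(0) ) nocopy(out1)
                        compute(in1, out1);
                        /* Start copying out1 back asynchronously; signal(out1) tags the transfer. */
                        #pragma offload_transfer target(mic:0) out(out1 : length(count) alloc_if(0) free_if(0) ) signal(out1)
                  }
                  if (i>0) {
                        /* Wait for out2 from the previous iteration before using it. */
                        #pragma offload_wait target(mic:0) wait(out2)
                        use_result(out2);
                  }
            } else {
                  if (i<iter) {
                        /* Same pattern with the two buffers swapped. */
                        #pragma offload target(mic:0) in(in2 : length(count) alloc_if(0) free_if(0) ) nocopy(out2)
                        compute(in2, out2);
                        #pragma offload_transfer target(mic:0) out(out2 : length(count) alloc_if(0) free_if(0) ) signal(out2)
                  }
                  /* Wait for out1 from the previous iteration before using it. */
                  #pragma offload_wait target(mic:0) wait(out1)
                  use_result(out1);
            }
      }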
}
