Intel® C++ Compiler XE 13.1 User and Reference Guides
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
By default, the offload pragma causes the CPU thread that encounters the pragma to wait for completion of the offload before continuing to the next statement. You can execute an asynchronous offload computation, which enables the CPU to initiate the offload and immediately continue to the next statement.
To run an offloaded computation asynchronously, add a signal clause to the offload pragma that initiates the computation, then use the offload_wait pragma to wait for the computation to complete.
The signal and wait clauses and the offload_wait construct refer to a specific target device, so you must specify target-number in the target() clause.
Querying a signal before it has been initiated results in undefined behavior and a runtime abort of the application. For example, suppose a signal (SIG1) is initiated for target device 1 but is then queried on target device 0. No signal SIG1 is associated with target device 0, so the application aborts.
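The device-matching rule can be sketched as follows. This is a minimal illustration, not text from the guide: the function names sum_to and run_on_device_1 and the choice of device 1 are assumptions made for the example. A compiler that does not support the offload pragmas typically ignores them, in which case the code simply runs on the host.

```c
/* Hypothetical workload; the name and body are illustrative only.
   The offload_attribute pragmas mark it for compilation on the target. */
#pragma offload_attribute(push, target(mic))
static int sum_to(int n)
{
    int i, total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
#pragma offload_attribute(pop)

/* Hypothetical helper: offloads sum_to(n) to device 1 and waits for it. */
int run_on_device_1(int n)
{
    char sig1;       /* the address of this variable serves as the signal tag */
    int result = 0;

    /* Initiate the offload on device 1 and associate it with &sig1. */
    #pragma offload target(mic:1) signal(&sig1) out(result)
    {
        result = sum_to(n);
    }

    /* Wait on the same device that initiated the signal. Using
       target(mic:0) here would query a signal that was never initiated
       on device 0, which is the abort case described above. */
    #pragma offload_wait target(mic:1) wait(&sig1)

    return result;   /* read result only after the wait has completed */
}
```

Keeping the target() clause of the offload and the offload_wait next to each other, as here, makes a device mismatch easier to spot in review.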
The following example enables the CPU to issue offloaded computations and continue concurrent activity without using any additional CPU threads:
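char signal_var;   /* its address serves as the signal tag */
do {
    /* Start the offload on coprocessor 0 and return immediately. */
    #pragma offload target(mic:0) signal(&signal_var)
    {
        long_running_mic_compute();
    }
    /* The CPU thread continues while the coprocessor computes. */
    concurrent_cpu_activity();
    /* Block until the offload tagged by &signal_var completes. */
    #pragma offload_wait target(mic:0) wait(&signal_var)
} while (1);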
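A bounded, self-contained variant of the loop above may be easier to experiment with. In this sketch, device_work, host_work, and overlap_rounds are hypothetical stand-ins for the functions named in the example, and the out clause is an assumption about how a result would be returned from the coprocessor. The wait clause is written explicitly on the offload_wait pragma. A compiler without offload support typically ignores the pragmas and runs everything on the host in sequence.

```c
/* Hypothetical stand-in for long_running_mic_compute(). */
#pragma offload_attribute(push, target(mic))
static long device_work(long n)
{
    long i, s = 0;
    for (i = 0; i < n; i++)
        s += i * i;
    return s;
}
#pragma offload_attribute(pop)

/* Hypothetical stand-in for concurrent_cpu_activity(). */
static long host_work(long n)
{
    return n * 2;
}

/* Runs a fixed number of rounds, overlapping host work with each offload. */
long overlap_rounds(int rounds, long n)
{
    char signal_var;              /* address used as the signal tag */
    long device_total = 0, host_total = 0;
    int r;

    for (r = 0; r < rounds; r++) {
        long partial = 0;

        /* Initiate the offload; the CPU thread does not wait here. */
        #pragma offload target(mic:0) signal(&signal_var) out(partial)
        {
            partial = device_work(n);
        }

        /* Host work proceeds while the offload is in flight. */
        host_total += host_work(n);

        /* Block until the offload for this round has completed. */
        #pragma offload_wait target(mic:0) wait(&signal_var)

        /* partial is read only after the wait. */
        device_total += partial;
    }
    return device_total + host_total;
}
```

Reusing one signal variable across rounds works in this sketch because each round waits for its own offload before the next one is initiated.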