Intel® C++ Compiler XE 13.1 User and Reference Guides
This topic only applies to Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
You can measure both the time it takes to execute an offload region of code and the amount of data transferred during the execution of that region.
Use one of the following mechanisms:
Set the OFFLOAD_REPORT environment variable.
Use the __Offload_report API.
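The two mechanisms can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name sum_squares and its report_level parameter are hypothetical, and the sketch assumes that __Offload_report is declared in offload.h and takes an integer level with the same meaning as the OFFLOAD_REPORT values (assumed here: 0 for no report, 1 for timing, 2 for timing plus data transferred). Check the reference pages for the exact signature and accepted values. The offload-specific lines are guarded by the __INTEL_OFFLOAD macro so that the code still builds and runs on the host with a compiler that has no offload support.

```cpp
#include <cassert>

#ifdef __INTEL_OFFLOAD
#include <offload.h>  // assumed to declare the __Offload_report API
#endif

// Hypothetical helper: sums the squares of 0..n-1 inside an offload region.
// report_level is passed to __Offload_report (assumed meanings:
// 0 = no report, 1 = timing, 2 = timing plus data transferred).
long long sum_squares(int n, int report_level) {
    assert(n >= 0);

#ifdef __INTEL_OFFLOAD
    // Mechanism 2: control reporting from inside the program.
    __Offload_report(report_level);
#else
    (void)report_level;  // unused when built without offload support
#endif

    long long total = 0;

#ifdef __INTEL_OFFLOAD
    // The loop below is the offload region whose time and data
    // transfer are measured.
    #pragma offload target(mic) in(n) inout(total)
#endif
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}

// Mechanism 1 needs no source change: set the environment variable
// before starting the program, for example in a POSIX shell
//   OFFLOAD_REPORT=2 ./a.out
// (the value 2 and the binary name a.out are assumptions for this sketch).
```

Calling sum_squares(1000, 2) from a program built with offload support would be expected to produce a report for the one offload region, while the environment variable applies to every offload region in the run without recompiling.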