Intel® C++ Compiler XE 13.1 User and Reference Guides
Following the guidelines below helps the compiler autovectorize a loop.
The special __m64, __m128, and __m256 data types are not vectorizable. The loop body cannot contain any function calls. Intel® Streaming SIMD Extensions and Intel® Advanced Vector Extensions intrinsics (for example, _mm_add_ps) are not allowed in the loop body.
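
As an illustration of these guidelines, the following is a minimal sketch of a loop written so that it stays within them: it operates on plain float arrays rather than the __m64, __m128, or __m256 data types, and its body contains no function calls and no intrinsics. The function name add_arrays and its parameters are hypothetical and chosen only for this example. Whether the compiler actually vectorizes the loop still depends on factors outside this sketch, such as the optimization options in effect and whether the compiler can rule out overlap between the arrays.

```cpp
// Hypothetical example: element-wise sum of two float arrays.
// The loop uses only plain float data, and its body contains
// no function calls and no SSE/AVX intrinsics, so it stays
// within the guidelines above.
void add_arrays(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];   // straight-line arithmetic only
    }
}
```

By contrast, a loop that performs the same addition by calling _mm_add_ps on __m128 values would fall outside the guidelines twice: it uses one of the special data types, and it uses an intrinsic in the loop body.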