Intrinsics

In our vector addition example:

add_simd.c
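#include <xmmintrin.h>	/* SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* a, b and result must each be 16-byte aligned for _mm_load_ps/_mm_store_ps. */
void add_simd(float a[4], float b[4], float result[4])
{
	__m128 v1 = _mm_load_ps(a);      /* load the 4 floats of a */
	__m128 v2 = _mm_load_ps(b);      /* load the 4 floats of b */
	__m128 v3 = _mm_add_ps(v1, v2);  /* add the 4 pairs of floats at once */
	_mm_store_ps(result, v3);        /* write the 4 sums to result */
}

The call to _mm_add_ps does the actual work of performing the addition. The Intel Intrinsics Guide describes this intrinsic in pseudo-code as a loop over j from 0 to 3 that adds the 32-bit elements at bit positions [32j+31 : 32j] of the two operands and writes each sum to the same bit positions of the destination.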

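In other words, _mm_add_ps adds each of the four single-precision elements of its first operand to the corresponding element of its second operand and returns the four sums in a single 128-bit vector.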

Note that the pseudo-code is only a logical description of what the intrinsic does, not of how it actually works. For simplicity, it is written as a serial program, but in hardware the four iterations of the for-loop are carried out in parallel by a single instruction.
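To make the correspondence between the serial description and the intrinsic concrete, here is a minimal sketch that compares a plain scalar loop with _mm_add_ps. The names add_scalar_reference, add_simd_unaligned and check_add_matches_reference are made up for this illustration, and the sketch uses the unaligned variants _mm_loadu_ps/_mm_storeu_ps so that it works with any float array, unlike the aligned loads in add_simd.c.

```c
#include <assert.h>
#include <xmmintrin.h>	/* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical scalar reference: one lane per iteration, like the serial pseudo-code. */
void add_scalar_reference(const float a[4], const float b[4], float result[4])
{
	for (int j = 0; j < 4; j++)
		result[j] = a[j] + b[j];
}

/* Same computation with a single _mm_add_ps; unaligned loads/stores accept any pointer. */
void add_simd_unaligned(const float a[4], const float b[4], float result[4])
{
	__m128 v1 = _mm_loadu_ps(a);
	__m128 v2 = _mm_loadu_ps(b);
	_mm_storeu_ps(result, _mm_add_ps(v1, v2));
}

/* Small self-check: both versions should agree lane by lane. */
void check_add_matches_reference(void)
{
	const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
	const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
	float expected[4], actual[4];

	add_scalar_reference(a, b, expected);
	add_simd_unaligned(a, b, actual);
	for (int j = 0; j < 4; j++)
		assert(expected[j] == actual[j]);
}
```

Calling check_add_matches_reference() from your own main should pass silently. The inputs are chosen so that every sum is exactly representable as a float, which keeps the equality comparison safe.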

For now, we only introduce _mm_add_ps, but there are plenty of intrinsics that provide various arithmetic operations on different types of data. In practice, you need to find the one that best fits your needs. To do so, play with the Intel Intrinsics Guide. It is super useful and convenient for SIMD programming! 😆
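As a small taste of that variety, the sketch below uses three other intrinsics that follow the same load, operate, store pattern: _mm_mul_ps (4 floats, SSE), _mm_add_pd (2 doubles, SSE2) and _mm_add_epi32 (4 32-bit integers, SSE2). The wrapper names mul_floats, add_doubles and add_ints are invented for this example, and it assumes a compiler targeting x86 with SSE2 enabled (the default on x86-64).

```c
#include <stdint.h>
#include <xmmintrin.h>	/* SSE:  __m128, _mm_mul_ps */
#include <emmintrin.h>	/* SSE2: __m128d, __m128i, _mm_add_pd, _mm_add_epi32 */

/* Element-wise product of 4 floats. */
void mul_floats(const float a[4], const float b[4], float result[4])
{
	_mm_storeu_ps(result, _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Element-wise sum of 2 doubles. */
void add_doubles(const double a[2], const double b[2], double result[2])
{
	_mm_storeu_pd(result, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Element-wise sum of 4 32-bit integers. */
void add_ints(const int32_t a[4], const int32_t b[4], int32_t result[4])
{
	__m128i v1 = _mm_loadu_si128((const __m128i *)a);
	__m128i v2 = _mm_loadu_si128((const __m128i *)b);
	_mm_storeu_si128((__m128i *)result, _mm_add_epi32(v1, v2));
}
```

The suffix of each intrinsic names the element type it works on: ps for packed single-precision floats, pd for packed doubles, and epi32 for packed 32-bit integers. Searching the guide by suffix is a quick way to narrow down candidates.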
