Get started

There are several ways to use SIMD in your C/C++ program. The most direct way is to embed SIMD assembly in your code, which is obviously cumbersome and error-prone. You can also let the compiler vectorize your code automatically by using OpenMP (we will cover OpenMP later in this course). For example:

#pragma omp simd
for(i=0;i<N;i++)
{
    C[i] = A[i] + B[i];
}

However, the code patterns that the compiler can automatically detect and optimize are limited.

SIMD intrinsics are convenient if you want to manually optimize your code with SIMD without touching assembly. They are essentially C functions that wrap SIMD assembly instructions. Let's rewrite our vector addition example using SIMD intrinsics.

add_simd.c
#include <xmmintrin.h> //header file for sse
#include <stdio.h>

void add_simd(float a[4], float b[4], float result[4])
{
	__m128 v1 = _mm_load_ps(a); // load float[4] as __m128 vector
	__m128 v2 = _mm_load_ps(b);
	__m128 v3 = _mm_add_ps(v1, v2); // add them up
	_mm_store_ps(result, v3); // store the result to float[4]
}

int main()
{
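	// _mm_load_ps and _mm_store_ps require 16-byte aligned addresses
	_Alignas(16) float a[4] = {1, 2, 3, 4};
	_Alignas(16) float b[4] = {2, 3, 4, 5};
	_Alignas(16) float r[4];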
	add_simd(a, b, r);
	printf("%f %f %f %f\n", r[0], r[1], r[2], r[3]);
}

Notice that before using a SIMD intrinsic, you have to include the header file that declares it. To find out which header file is required, you can refer to the Intel Intrinsics Guide. For example, xmmintrin.h is required for _mm_add_ps.

Before we explain the details of the code, let's compile and run the program first. To compile add_simd.c with SSE enabled, run the following command:

gcc add_simd.c -o add_simd -msse

Notice that we add an additional flag "-m[CPUID Flag]" ("-msse" in the above command) when compiling the code with GCC. This flag enables SSE instructions. Different instructions need different flags; you can refer to the Intel Intrinsics Guide to find the corresponding CPUID flag. For example, the guide lists SSE as the CPUID flag for _mm_add_ps, so we pass -msse.

Run the compiled executable, and you can see that the result is as expected:

./add_simd
3.000000 5.000000 7.000000 9.000000
