
Get started

There are several ways to use SIMD in a C/C++ program. The most direct way is to embed SIMD assembly in your code, which is cumbersome and error-prone. You can also ask the compiler to vectorize your code automatically by using OpenMP (we will cover OpenMP later in this course). For example:

#pragma omp simd
for(i=0;i<N;i++)
{
    C[i] = A[i] + B[i];
}

However, the code patterns that the compiler can detect and vectorize automatically are limited.
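As a rough sketch of both sides of this trade-off, the first function below wraps the loop above in a complete function, and the second shows a loop that compilers typically cannot vectorize in this form, because each iteration reads the value written by the previous one. The function names `add_omp_simd` and `prefix_sum` are made up for illustration. With GCC, the pragma only takes effect when OpenMP SIMD support is enabled, for example with `-fopenmp-simd` (or `-fopenmp`); without such a flag the pragma is ignored and the loop is compiled as ordinary scalar code.

```c
#include <stddef.h>

/* Element-wise addition. The pragma asks the compiler to vectorize
   the loop; the function name is hypothetical. */
void add_omp_simd(const float *a, const float *b, float *c, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Running (prefix) sum. Iteration i depends on the result of
   iteration i - 1, so the iterations cannot simply be executed
   several at a time. */
void prefix_sum(float *x, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        x[i] += x[i - 1];
    }
}
```

A possible compile command, assuming the functions are saved in a file called omp_simd.c, is `gcc -O2 -fopenmp-simd -c omp_simd.c`.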

SIMD intrinsics are convenient if you want to optimize your code with SIMD by hand without touching assembly. They are essentially a library that wraps SIMD instructions as C functions. Let's rewrite our vector addition example with SIMD intrinsics.

add_simd.c
#include <xmmintrin.h> // header file for SSE
#include <stdio.h>

void add_simd(float a[4], float b[4], float result[4])
{
	// _mm_load_ps and _mm_store_ps require 16-byte aligned addresses
	__m128 v1 = _mm_load_ps(a); // load float[4] as __m128 vector
	__m128 v2 = _mm_load_ps(b);
	__m128 v3 = _mm_add_ps(v1, v2); // add them up
	_mm_store_ps(result, v3); // store the result to float[4]
}

int main(void)
{
	// align the arrays to 16 bytes for the aligned load/store above
	_Alignas(16) float a[4] = {1, 2, 3, 4};
	_Alignas(16) float b[4] = {2, 3, 4, 5};
	_Alignas(16) float r[4];
	add_simd(a, b, r);
	printf("%f %f %f %f\n", r[0], r[1], r[2], r[3]);
	return 0;
}

Before we explain the details of the code, let's compile and run the program. To compile the above code with SSE enabled, run the following command:

gcc add_simd.c -o add_simd -msse

Run the compiled executable (./add_simd) and you will see the expected result, the element-wise sum of a and b: 3.000000 5.000000 7.000000 9.000000
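The four-element function extends naturally to arrays of any length. The sketch below is one possible way to do it, not part of the lab code: the name `add_simd_n` is hypothetical. It processes four floats per iteration and handles the leftover elements with a scalar loop. It uses `_mm_loadu_ps` and `_mm_storeu_ps`, the unaligned counterparts of `_mm_load_ps` and `_mm_store_ps`, so the pointers do not have to be 16-byte aligned.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics */

/* Computes c[i] = a[i] + b[i] for i in [0, n).
   Hypothetical helper used only for illustration. */
void add_simd_n(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;

    /* Main loop: four floats per iteration, unaligned load/store. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail: the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

For an array of six floats, for instance, the vector loop runs once for elements 0 to 3 and the scalar loop handles elements 4 and 5. The file compiles with the same `-msse` flag as add_simd.c.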


Notice that before using SIMD intrinsics, you have to include the header file that declares the intrinsic you want. To find out which header file is required, you can refer to the Intel Intrinsics Guide. For example, xmmintrin.h is required for _mm_add_ps.

Notice also that you need an additional flag "-m[CPUID Flag]" ("-msse" in the above command) when compiling your code with GCC. This flag enables the SSE instructions. Different instructions need different flags; you can refer to the Intel Intrinsics Guide to find the corresponding flag. For example, the guide lists SSE as the CPUID flag for _mm_add_ps, which corresponds to -msse.
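As a further illustration of how the header and the flag change with the instruction set, here is a minimal sketch of the same addition on double-precision values. The Intel Intrinsics Guide lists `_mm_add_pd` under the header emmintrin.h with the CPUID flag SSE2, so the matching GCC flag is `-msse2`. The function name `add_simd_pd` and the file name below are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics, including _mm_add_pd */

/* Adds two pairs of doubles. A 128-bit register holds two doubles
   instead of four floats. Unaligned load/store is used so the
   arrays need no special alignment. */
void add_simd_pd(const double a[2], const double b[2], double result[2])
{
    __m128d v1 = _mm_loadu_pd(a);      /* load double[2] as __m128d */
    __m128d v2 = _mm_loadu_pd(b);
    __m128d v3 = _mm_add_pd(v1, v2);   /* add them up */
    _mm_storeu_pd(result, v3);         /* store the result to double[2] */
}
```

Assuming the function is saved in add_simd_pd.c, it can be compiled with `gcc -c add_simd_pd.c -msse2`.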

Intel Intrinsics Guide