Data types

In our vector addition example:

add_simd.c
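#include <xmmintrin.h> // SSE intrinsics

// a, b and result must be 16-byte aligned for _mm_load_ps/_mm_store_ps
void add_simd(float a[4], float b[4], float result[4])
{
	__m128 v1 = _mm_load_ps(a);     // load 4 floats from a
	__m128 v2 = _mm_load_ps(b);     // load 4 floats from b
	__m128 v3 = _mm_add_ps(v1, v2); // add the 4 pairs of floats at once
	_mm_store_ps(result, v3);       // write the 4 sums to result
}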

You may have noticed that before we actually add the vectors, we first load the float[4] inputs into variables of type __m128 using _mm_load_ps.

__m128 represents a vector containing 128 bits of data and corresponds to a 128-bit SIMD register in your CPU. Such a register can hold 4 32-bit floats as in our example, 4 32-bit integers, 8 16-bit integers, or 2 64-bit doubles. In C, the intrinsics headers give these views separate types: __m128 for floats, __m128i for integers, and __m128d for doubles. How the bits are interpreted depends on the intrinsic that operates on the variable. For example, _mm_add_ps interprets its __m128 arguments as 4 32-bit floats. There are also types with other lengths, such as the 64-bit __m64.
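To make the point about interpretation concrete, here is a minimal sketch of the same "add two vectors" operation on three kinds of 128-bit data. It assumes an x86 compiler with SSE2 support (emmintrin.h), where integer data is carried in __m128i and double data in __m128d. The helper names add_floats, add_ints and add_doubles are hypothetical and not part of the original example.

```c
#include <emmintrin.h> /* SSE2; also provides the SSE float intrinsics */

/* Hypothetical helper: four 32-bit floats, carried in a __m128. */
void add_floats(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a); /* unaligned load: works for any address */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Hypothetical helper: four 32-bit integers, carried in a __m128i. */
void add_ints(const int a[4], const int b[4], int out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Hypothetical helper: two 64-bit doubles, carried in a __m128d. */
void add_doubles(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}
```

All three functions move the same 128 bits around. Only the intrinsic suffix (_ps, _epi32, _pd) decides whether those bits are treated as floats, 32-bit integers, or doubles.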

_mm_load_ps and _mm_store_ps are a convenient way to convert float arrays to and from __m128. Both expect the array address to be 16-byte aligned. You can find other load/store functions, which are useful for loading and storing other C types, in the Intel Intrinsics Guide.
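As one example of the other load/store functions, the sketch below uses the unaligned variants _mm_loadu_ps and _mm_storeu_ps, which accept any address, whereas _mm_load_ps and _mm_store_ps expect a 16-byte-aligned one. The function name add_arrays and its parameters are hypothetical. It processes an array of any length four floats at a time and finishes the leftover elements with plain C.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical helper: result[i] = a[i] + b[i] for 0 <= i < n.
 * No alignment is assumed, so the unaligned load/store variants are used. */
void add_arrays(const float *a, const float *b, float *result, size_t n)
{
    size_t i = 0;

    /* Vector part: four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(result + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail: the remaining 0 to 3 elements. */
    for (; i < n; i++)
        result[i] = a[i] + b[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned variants can be swapped in for the vector part; the scalar tail stays the same.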

A trick to avoid using load and store

It is sometimes quite cumbersome to convert a normal C array back and forth to SIMD types (although the load and store functions usually add little or no performance overhead, because an optimizing compiler such as gcc avoids unnecessary memory copies for you). We can use a union type to simplify our code:

add_simd_with_union_type.c
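#include <xmmintrin.h> // header file for SSE
#include <stdio.h>

// Both members share the same 16 bytes of storage
union Data {
    __m128 v;
    float a[4];
};

int main(void)
{
    union Data a = { .a = {1, 2, 3, 4} }; // initialize through the float view
    union Data b = { .a = {2, 3, 4, 5} };
    union Data r;
    r.v = _mm_add_ps(a.v, b.v);           // use the vector view for the intrinsic
    printf("%f %f %f %f\n", r.a[0], r.a[1], r.a[2], r.a[3]);
    return 0;
}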

With the union type, r.v and r.a have different data types but refer to the same memory address. We can conveniently pass r.v as an argument to intrinsics and use r.a in normal C functions.
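The same idea is handy whenever individual lanes of a result are needed in ordinary C code. The sketch below is one possible use, not part of the original example: a hypothetical dot4 function multiplies two 4-float vectors with an intrinsic and then sums the lanes through the float view of a union (named Vec4 here for illustration). Reading a union member other than the one last written is accepted in C, but the same code would not be valid C++.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical union: one 128-bit vector viewed as four floats. */
union Vec4 {
    __m128 v;
    float a[4];
};

/* Hypothetical helper: dot product of two 4-element float vectors. */
float dot4(const float x[4], const float y[4])
{
    union Vec4 p;

    /* Vector view: multiply the four pairs of floats at once. */
    p.v = _mm_mul_ps(_mm_loadu_ps(x), _mm_loadu_ps(y));

    /* Float view: add the four products with plain C. */
    return p.a[0] + p.a[1] + p.a[2] + p.a[3];
}
```

Calling dot4 on {1, 2, 3, 4} and {2, 3, 4, 5} computes 2 + 6 + 12 + 20. Because the union contains a __m128 member, the compiler also gives the float array the alignment the vector type needs.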
