What's SIMD

Because of the "thermal wall" problem (clock speeds can no longer rise much without producing unmanageable heat), parallelism has become the main route to higher performance in modern CPUs, and exploiting it requires programmers to write explicitly parallel code. SIMD (Single Instruction, Multiple Data) is a form of parallelism provided by instruction-set extensions such as SSE and AVX on x86. It lets a single instruction apply the same operation to multiple data elements, often in the same number of CPU cycles as the equivalent operation on a single element.

Consider the following snippet, which adds two 4-dimensional vectors:

// Serial version: one scalar addition per loop iteration
void add(float A[4], float B[4], float result[4])
{
    for(int i=0; i<4; i++)
    {
        result[i] = A[i] + B[i];
    }
}

Here a for-loop applies the addition to each pair of elements of A and B in turn, as illustrated in figure 1(a).

Observe that the same operation is applied to every pair of elements, which is exactly the situation where SIMD helps. Using SIMD, all four pairs can be added in parallel with one instruction, as shown in figure 1(b). In the ideal case this is about 4x faster than the serial loop, and possibly a little more, because the loop's own bookkeeping (incrementing i, comparing it against 4, and branching) is eliminated as well.
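A minimal sketch of the SIMD version of `add`, assuming an x86 target and using the SSE intrinsics from `<xmmintrin.h>`. The function name `add_simd` is hypothetical. `_mm_loadu_ps` loads four floats, `_mm_add_ps` adds all four lanes with a single instruction, and `_mm_storeu_ps` writes the four results back. GCC and Clang define `__SSE__` when SSE is enabled (the default on x86-64); on other targets the sketch falls back to the scalar loop.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical SIMD counterpart of add(): on SSE-capable targets,
   the four additions are performed by one _mm_add_ps instruction. */
void add_simd(const float A[4], const float B[4], float result[4])
{
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps(A);      /* load A[0..3] into one 128-bit register */
    __m128 b = _mm_loadu_ps(B);      /* load B[0..3] (no alignment required) */
    __m128 sum = _mm_add_ps(a, b);   /* four additions in a single instruction */
    _mm_storeu_ps(result, sum);      /* store the four sums back to memory */
#else
    /* Fallback for targets without SSE: the plain scalar loop. */
    for (int i = 0; i < 4; i++)
        result[i] = A[i] + B[i];
#endif
}
```

Note that there is no loop in the SSE path: the four lanes of the 128-bit register play the role of the four iterations.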

Applying the same operation to multiple data is common in multimedia applications. For example, brightening an image essentially means multiplying the intensity of each pixel by some factor, so the same multiplication is applied to every pixel. In fact, SIMD is widely used to speed up image processing, scientific computing, neural networks, and even database systems.
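The brightening example can be sketched the same way. This assumes, purely for illustration, that pixel intensities are stored as floats in [0, 1] and that results above 1.0 should be clamped; the function `brighten` is hypothetical. `_mm_set1_ps` broadcasts the factor into all four lanes, so each iteration multiplies four pixels at once, and a scalar loop handles any leftover pixels when the count is not a multiple of 4.

```c
#include <stddef.h>     /* size_t */

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_min_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: multiplies n float intensities (assumed in [0, 1])
   by `factor` in place, clamping each result to at most 1.0f. */
void brighten(float *pixels, size_t n, float factor)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 f = _mm_set1_ps(factor);    /* factor copied into all 4 lanes */
    __m128 one = _mm_set1_ps(1.0f);    /* maximum intensity in all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pixels + i);
        p = _mm_mul_ps(p, f);          /* 4 multiplications in one instruction */
        p = _mm_min_ps(p, one);        /* clamp 4 lanes to 1.0f */
        _mm_storeu_ps(pixels + i, p);
    }
#endif
    /* Scalar tail: leftover pixels (or every pixel when SSE is unavailable). */
    for (; i < n; i++) {
        float v = pixels[i] * factor;
        pixels[i] = v > 1.0f ? 1.0f : v;
    }
}
```

Real image code often works on 8-bit integer channels instead, which would call for integer SIMD instructions; floats keep the sketch close to the vector-addition example.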
