Parallel

The first way to create threads is similar to Pthreads: you create a number of threads and assign each thread a function to run.

Here, just think of the parallel region in OpenMP as the function you want each thread to run.

  • Add a compiler directive immediately before the parallel region.

  • Tell the compiler how many threads you want in the parallel region. If you don't, the default is typically the number of cores.

The helloworld.c you have run is an example.

  • #pragma omp parallel private(tid)

    • This is a compiler directive, which you always write immediately before your parallel region. It tells the compiler: "Hey, could you please parallelize the following code region?" OpenMP then creates a team of threads as instructed. The private(tid) clause gives each thread its own copy of tid.

This is the basic way to create threads and parallelize your program: whatever you put inside the parallel region is what each thread will do.

OpenMP also has a directive to parallelize a for loop.

#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[]) {

    int nthreads=4;
    omp_set_num_threads(nthreads);
    int i;
    /* Fork a team of threads and divide the loop iterations among them */
    #pragma omp parallel for
    for (i=0; i<nthreads; i++) {

        /* Obtain and print thread id */
        int tid = omp_get_thread_num();
        printf("i = %d, Hello World from thread = %d\n", i, tid);

    }  /* All threads join master thread and terminate */
}
  • omp_set_num_threads(nthreads);

    • asks the OpenMP runtime to use 4 threads (nthreads is 4 here) for the parallel regions that follow

  • #pragma omp parallel for

    • tells the compiler to parallelize the for loop that follows

$ ./for
i = 0, Hello World from thread = 0
i = 2, Hello World from thread = 2
i = 3, Hello World from thread = 3
i = 1, Hello World from thread = 1

In this example, OpenMP assigned one for loop iteration to each thread (4 threads and 4 iterations). The lines can appear in any order because the threads run concurrently. Now you may ask: what if the total number of for loop iterations is bigger than nthreads?

#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[]) {

    int nthreads=4;
    omp_set_num_threads(nthreads);
    int i;
    /* Fork a team of threads and divide the loop iterations among them */
    #pragma omp parallel for
    for (i=0; i<8; i++) {

        /* Obtain and print thread id */
        int tid = omp_get_thread_num();
        printf("i = %d, Hello World from thread = %d\n", i, tid);

        /* do something time consuming*/
        int j=0;
        int a=0;
        for(j=0;j<1000000000; j++) {
            a++;
        }

    }  /* All threads join master thread and terminate */
}
  • At any moment, up to 4 threads are running at the same time.

  • Thread tid handles iterations tid*(total/nthreads) through (tid+1)*(total/nthreads)-1, where total is the number of for loop iterations. This matches the output below, but the default schedule is implementation-defined, so another OpenMP runtime may divide the loop differently.

  • Each iteration contains another for loop, which stands in for any time-consuming work. It is only there to make the multithreaded behavior more obvious.

$ ./for2
i = 0, Hello World from thread = 0
i = 4, Hello World from thread = 2
i = 2, Hello World from thread = 1
i = 6, Hello World from thread = 3
i = 1, Hello World from thread = 0
i = 7, Hello World from thread = 3
i = 3, Hello World from thread = 1
i = 5, Hello World from thread = 2

And what if the total number of for loop iterations can't be divided evenly by the number of threads? Please try it yourself with the following code!

#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[]) {

    int nthreads=4;
    omp_set_num_threads(nthreads);
    int i;
    /* Fork a team of threads and divide the loop iterations among them */
    #pragma omp parallel for
    for (i=0; i<9; i++) {

        /* Obtain and print thread id */
        int tid = omp_get_thread_num();
        printf("i = %d, Hello World from thread = %d\n", i, tid);

        /* do something time consuming*/
        int j=0;
        int a=0;
        for(j=0;j<1000000000; j++) {
            a++;
        }

    }  /* All threads join master thread and terminate */
}

In fact, you can also decide yourself how to assign the workload to threads. Here is just one simple example; there are many other ways to split your own workload among the threads!

#include <omp.h>
#include <stdio.h>

int main() {
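    int num_threads=4;
    int num_for_rounds=10;
    /* chunk size per thread, rounded up so that every index is covered */
    int step = (num_for_rounds + num_threads - 1) / num_threads;
    int i;
    omp_set_num_threads(num_threads);
    #pragma omp parallel for
    for(i=0; i<num_threads; i++) {
        int tid = omp_get_thread_num();
        /* chunk i covers [i*step, end); using i instead of tid keeps the
           split correct even if the runtime provides fewer threads */
        int end = (i+1)*step;
        if(end > num_for_rounds) {
            end = num_for_rounds;  /* never run past the last index */
        }
        int j;
        for(j=i*step; j<end; j++) {
            printf("tid = %d, idx = %d\n", tid, j);
        }
    }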

    return 0;
}
