Parallel Programming with OpenMP

Is Everyone There?

In some cases, it is necessary to synchronize all the threads. The #pragma omp barrier statement sets up a virtual hurdle: All the threads wait until the last one reaches the barrier before processing can continue. But think carefully before you introduce an artificial barrier: Threads that are waiting do not do any work, so every suspension eats into the performance boost that parallelizing the program gave you. Listing 4 shows an example in which a barrier is unavoidable.

Listing 4

Unavoidable Barrier

#pragma omp parallel shared (A, B, C)
{
  Calculationfunction(A,B);
  printf("B was calculated from A\n");
#pragma omp barrier
  Calculationfunction(B,C);
  printf("C was calculated from B\n");
}

The Calculationfunction() call in this listing calculates the second argument from the first one. The arguments in this case could be arrays, and the calculation function could be a complex mathematical matrix operation. Here, it is essential to use #pragma omp barrier. Without it, some threads would start the second round of calculations before the other threads have finished writing the values in B.
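The following is a minimal sketch of the same pattern, not the article's actual code. It assumes that the work is split by thread number and uses a hypothetical calc_slice() helper and array size N in place of Calculationfunction(). Each element of the result depends on an input element that may belong to another thread's slice, which is why the barrier is needed.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 1000  /* hypothetical array size */

/* Hypothetical stand-in for Calculationfunction(): a thread fills only its
   own slice of dst, but reads src[N - 1 - i], which may lie in a slice
   written by a different thread. */
static void calc_slice(const double *src, double *dst, int tid, int nthreads)
{
    int begin = (int)((long)N * tid / nthreads);
    int end   = (int)((long)N * (tid + 1) / nthreads);
    for (int i = begin; i < end; i++)
        dst[i] = src[i] + src[N - 1 - i];
}

void two_stage(const double *A, double *B, double *C)
{
    #pragma omp parallel shared(A, B, C)
    {
        int tid = 0, nthreads = 1;   /* serial defaults without OpenMP */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        calc_slice(A, B, tid, nthreads);   /* stage 1: B from A */
        #pragma omp barrier                /* wait until all of B is written */
        calc_slice(B, C, tid, nthreads);   /* stage 2: C from B */
    }
}
```

If the barrier line is removed, a fast thread could read parts of B that a slower thread has not yet written, and the contents of C would depend on timing.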

Some OpenMP constructs (such as parallel, for, single) include an implicit barrier at their end. For work-sharing constructs such as for and single, you can explicitly disable it by adding a nowait clause, as in #pragma omp for nowait; the barrier at the end of a parallel region cannot be removed. Other synchronization mechanisms include:

  • #pragma omp master {Code}: Code that is executed only once and only by the master thread.
  • #pragma omp single {Code}: Code that is executed only once, but not necessarily by the master thread.
  • #pragma omp flush (Variables): Cached copies of the listed variables are written back to main memory, which ensures a consistent view of the memory.

These synchronization mechanisms will help keep your code running smoothly in multi-processor environments.
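As a rough illustration of nowait and single, the sketch below uses a hypothetical scale_and_offset() function (the name and the arithmetic are assumptions, not part of the article). The two loops work on different arrays, so the barrier after the first loop can be dropped safely.

```c
/* Hypothetical helper: doubles x, adds 1 to y, and returns how many
   times the single block was executed. */
int scale_and_offset(double *x, double *y, int n)
{
    int single_runs = 0;

    #pragma omp parallel shared(x, y, single_runs)
    {
        /* The loops touch different arrays, so a thread that finishes
           its share of the first loop may start the second right away. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            x[i] *= 2.0;

        #pragma omp for
        for (int i = 0; i < n; i++)
            y[i] += 1.0;
        /* Implicit barrier of the second loop: x and y are complete. */

        #pragma omp single
        {
            single_runs++;   /* executed by exactly one thread */
        }
    }
    return single_runs;
}
```

Dropping a barrier with nowait is only safe when the following code does not read data that other threads may still be writing, as is the case here.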

Library Functions

OpenMP has a couple of additional functions, which are listed in Table 2. If you want to use them, you need to include the omp.h header file in C/C++. To make sure the program will still build without OpenMP, it makes sense to wrap these calls in an #ifdef _OPENMP block for conditional compilation; the compiler defines the _OPENMP macro only when OpenMP support is enabled.

#ifdef _OPENMP
#include <omp.h>
threads = omp_get_num_threads();
#else
threads = 1;
#endif
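A complete version of this idea might look like the sketch below. The count_threads() name is hypothetical. Note that omp_get_num_threads() returns 1 when it is called outside a parallel region, so the sketch calls it from inside one.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the team size seen inside a parallel
   region, or 1 when the program is built without OpenMP. */
int count_threads(void)
{
    int threads = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread stores the value; all threads would see the same
           number anyway. */
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}
```

The same source file then builds both with and without the compiler's OpenMP switch (for GCC, that switch is -fopenmp).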

Locking functions allow a thread to lock a resource by reserving exclusive access to it with omp_set_lock(). Other threads can call omp_test_lock(), which tries to acquire the lock without blocking and reports whether it succeeded, so a thread can find out whether the resource is currently locked. This setup is useful if you want multiple threads to write data to a file, but want to restrict access to one thread at a time. When you use locking functions, be careful to avoid deadlocks.

A deadlock can occur if threads need resources but lock each other out. For example, thread 1 successfully locks resource A and is now waiting to use resource B, while thread 2 has locked B and is waiting for A. Both threads wait forever.
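The sketch below shows the basic lock life cycle with omp_init_lock(), omp_set_lock(), omp_unset_lock(), and omp_destroy_lock(). The locked_sum() function is a hypothetical example in which a shared total stands in for the file or other resource, and the serial stand-ins in the #else branch are an assumption added so that the code also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption): with one thread, locking is a no-op. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical example: adds 1..n to a shared total. The lock makes
   each read-modify-write of total exclusive. */
long locked_sum(int n)
{
    long total = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);

    #pragma omp parallel for shared(total, lock)
    for (int i = 1; i <= n; i++) {
        omp_set_lock(&lock);     /* blocks until the lock is free */
        total += i;              /* only one thread at a time is here */
        omp_unset_lock(&lock);   /* release, or the others wait forever */
    }

    omp_destroy_lock(&lock);
    return total;
}
```

For a plain sum, a reduction clause would be the more natural tool; the lock is used here only to show the calls. When a thread needs more than one lock, acquiring the locks in the same order in every thread is the usual way to avoid the deadlock described above.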

Environment Variables

Some environment variables control the run-time behavior of OpenMP programs; the most important is OMP_NUM_THREADS. It specifies how many threads can operate in a parallel region. Limiting the number is useful because too many threads will actually slow down processing. In bash, export OMP_NUM_THREADS=1 tells a program to run with just one thread, just like a normal serial program.
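To check what a program actually picks up, something like the following sketch can help. The report_thread_limit() name is hypothetical; omp_get_max_threads() returns the upper limit on the team size for the next parallel region, which is initialized from OMP_NUM_THREADS when the variable is set.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints the raw environment setting and the
   thread limit the OpenMP run time will use, and returns that limit. */
int report_thread_limit(void)
{
    const char *env = getenv("OMP_NUM_THREADS");
    int limit = 1;               /* serial build: always one thread */
#ifdef _OPENMP
    limit = omp_get_max_threads();
#endif
    printf("OMP_NUM_THREADS=%s, thread limit=%d\n",
           env ? env : "(unset)", limit);
    return limit;
}
```

After export OMP_NUM_THREADS=1, an OpenMP build of this code should report a limit of 1; with the variable unset, the default is implementation defined and is often the number of available cores.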

Read full article as PDF:

064-069_openMP.pdf  (899.60 kB)

  • Please reference wiki-diagrams

    I know this is old, but you have used a picture of mine.

    The fork-join diagram was released under a CC-BY-SA licence. Please attribute to the wiki-page.