Webinar Q&As from “OpenMP* 5.0: A Story about Threads and Tasks”

Your Hosts:
Michael Klemm, Senior Application Engineer (Intel) and CEO of OpenMP ARB
James Cownie, Senior Principal Engineer (Intel)

Below you’ll find the answers to questions asked during the webinar. We hope you find them helpful.

Q: What is the relationship between Intel® MKL and OpenMP?
A: Intel MKL can use OpenMP for parallelism (just as many codes can). It can also use Threading Building Blocks (TBB), so you do not have to use OpenMP to use Intel MKL.

Q: Will the compiler tell me which strategy it chooses to use when encountering a “loop” pragma?
A: Based on our experience, that is up to the compiler vendor and may be affected by the options you pass (in the same way that you can ask for a vectorization report to see what the compiler did with your code).

Language standards do not say how compilers have to behave with respect to command line flags or output … so tell your compiler vendor what you want to see!

Q: Why do we need meta-directive if we can use a loop construct?
A: If you don’t trust the compiler to do the right thing, you can use a metadirective to select a particular, target-dependent implementation yourself.

This also applies if you have spent time writing, say, a hand-tuned assembly version. With a metadirective you can direct the compiler to use your version on the targets where it applies. A minimal sketch follows.
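Here is a minimal metadirective sketch (the saxpy kernel and the nvptx architecture check are illustrative): on an NVPTX device the loop is offloaded, everywhere else it runs as a regular parallel loop.

void saxpy(int n, float a, float *x, float *y) {
    #pragma omp metadirective \
        when(device={arch("nvptx")}: target teams distribute parallel for) \
        default(parallel for)
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];   // the compiler picks the variant per target
}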

Q: Can OpenMP be used for real-time systems such as n-tuple redundant digital fly-by-wire flight control systems for aircraft?
A: The OpenMP ARB has a working group that is looking into how OpenMP could potentially support real-time requirements and functional safety.

Stay tuned!

Q: What is the difference between OpenMP in C and in Fortran?
A: Apart from the obvious differences in the syntax and semantics of the base language, OpenMP tries to keep similar syntax and semantics across C, C++, and Fortran.

There are some language-specific features such as the WORKSHARE construct for Fortran.

Q: Is there any plan to implement task detach in icc? If “yes”, when?
A: Yes, along with the rest of OpenMP 5.0; keep an eye on the LLVM OpenMP runtime (which is the same as the Intel one).
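Here is a minimal sketch of the OpenMP 5.0 detach clause, assuming a hypothetical asynchronous I/O routine start_async_read() that invokes a callback when the read finishes:

#include <omp.h>

// Hypothetical asynchronous read provided by some I/O library.
void start_async_read(char *buf, void (*cb)(void *), void *arg);

void on_read_complete(void *arg) {
    // Fulfilling the event marks the detached task as complete.
    omp_fulfill_event(*(omp_event_handle_t *)arg);
}

void read_in_task(char *buffer) {
    static omp_event_handle_t ev;
    #pragma omp task detach(ev)
    {
        // The task body returns immediately; the task itself completes
        // only once on_read_complete() fulfills the event.
        start_async_read(buffer, on_read_complete, &ev);
    }
    #pragma omp taskwait   // waits until the event has been fulfilled
}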

Q: If I find a bug in Intel® Advisor but I have only a student license, how can I report this bug to Intel?
A: Well, yours isn’t an OpenMP question … but here’s the answer:

Use the Intel® Advisor forum. Intel Advisor experts there should be able to help you.

Q: We saw several examples for trees and graphs. This implies one must write one’s own tree-based algorithm. How about using C++ std::map instead?
A: This is a question for Parallel STL (PSTL) more than it is for OpenMP.

Parallel STL and its parallel algorithms can be implemented on top of OpenMP. This would naturally blend the parallelism of the PSTL with the parallelism of OpenMP user code.

Here’s a great reference article: Get Started with Parallel STL.
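As a taste of the PSTL interface, here is a minimal C++17 sketch (independent of OpenMP; the vector contents are illustrative):

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<double> v(1000000, 1.0);
    // Run the lambda over all elements in parallel.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](double &x) { x *= 2.0; });
}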

Q: How does OpenMP work with the Intel® Fortran CoArray construct?
A: The OpenMP API version 5.0 does not yet support Co-Array Fortran, but the OpenMP ARB is looking into extending OpenMP in this direction.

Q: Between C and Fortran, is there support for things like array masks, packing, and complex lambda operations on arrays that should naturally be on the SIMD execution path?
A: The OpenMP ARB is discussing a proposal to support SIMDified lambda functions.

Q: Is there a mechanism to stream data between OpenMP tasks (e.g., via a FIFO)?
A: No. At this time there is no OpenMP feature that would directly support this.

You can use task dependences to model a FIFO-like pipeline with one task per stage. The pattern would look similar to this code fragment:

int x, y, z;

#pragma omp parallel
#pragma omp single
{
#pragma omp task depend(out:x)              // stage 1
    { x = stage_1_code(); }

#pragma omp task depend(in:x) depend(out:y) // stage 2
    { y = stage_2_code(x); }

#pragma omp task depend(in:y) depend(out:z) // stage 3
    { z = stage_3_code(y); }
}

Q: What’s the timeline for Intel® Compiler to support OpenMP 5.0?
A: We will likely see OpenMP 5.0 support in the September 2019 release of the Intel Compiler.

Q: In which ifort version is OpenMP 5.0 supported? 2018 or 2019?
A: Same answer as above: we expect OpenMP 5.0 support in the September 2019 release.

Q: Can C++11 range-based loops be used together with #pragma omp parallel?
A: Yes, in OpenMP 5.0 there is support for the != idiom for range-based iterator loops in C++.
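Here is a minimal sketch (the vector contents are illustrative):

#include <vector>

int main() {
    std::vector<int> data(1000, 1);
    // OpenMP 5.0 accepts a range-based for as a canonical loop.
    #pragma omp parallel for
    for (auto &x : data)
        x *= 2;
}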

Q: What abstractions are oriented to simplify the programmer’s work, and which ones allow expert users to get low-level specific optimizations?
A: You can consider constructs like LOOP, tasking, etc. to be rather descriptive. Most constructs, however, can be used both ways. For instance:

PARALLEL FOR is descriptive because a programmer does not specify the number of threads or the loop schedule.

Programmers can be more prescriptive by adding a NUM_THREADS or SCHEDULE clause to tell the OpenMP implementation which specific implementation choices to make. The sketch below contrasts the two styles.
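A minimal sketch contrasting the two styles (a, f, and n are placeholders):

// Descriptive: the implementation chooses thread count and schedule.
#pragma omp parallel for
for (int i = 0; i < n; ++i)
    a[i] = f(i);

// Prescriptive: the programmer pins down both choices explicitly.
#pragma omp parallel for num_threads(8) schedule(dynamic, 64)
for (int i = 0; i < n; ++i)
    a[i] = f(i);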

Q: How can I get more detailed info from the compiler?
A: This depends on the compiler and its version. For the Intel compiler, the switch -qopt-report (Linux*/macOS*) or /Qopt-report (Windows*) enables the optimization report, which also contains information about OpenMP parallelization.

Q: Specific to the use of OpenMP libraries checked out of a git repository with Intel’s current compilers … can I experiment with OpenMP 5 features this way, without having to wait for ifort version 20?
A: As most compiler vendors are still working on OpenMP API version 5 implementations, actual experiments will have to wait.

Please consult the compiler documentation of Intel® Parallel Studio XE for which features are supported in the respective compiler versions.

Q: It would be great to see a discussion of gcc/clang vs. Intel compiler support for OpenMP 5.
A: See https://www.openmp.org/resources/openmp-compilers-tools/

Q: I’d like to see a gradual evolution from a descriptive to a prescriptive program illustrated in a series of code examples, with results of performance measurements. It is often hard to judge the relative importance of various code transformations.
A: This is a very good suggestion that we can start implementing once the first full-fledged OpenMP 5 implementations become available.

Q: Which of the new OpenMP features relate to lockless programming in the implementation?
A: The most relevant new feature in OpenMP 5.0 is the revised memory model, which defines when data becomes visible to other threads.

Also relevant is the ATOMIC construct, which was extended in OpenMP versions 4.0 and 4.5 to cover more use cases of atomic CPU instructions, and which gains explicit memory-order clauses (such as acquire and release) in version 5.0.
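A minimal sketch of lockless flag passing with the OpenMP 5.0 acquire/release atomics (the data/flag variables are illustrative; the sketch assumes two threads are actually available):

#include <omp.h>

int main() {
    int data = 0, flag = 0;
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;
            #pragma omp atomic write release  // publish: data becomes visible before flag
            flag = 1;
        } else {
            int f = 0;
            while (!f) {
                #pragma omp atomic read acquire  // pairs with the release above
                f = flag;
            }
            // Here, data is guaranteed to be 42.
        }
    }
}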
