Modernize Your Code for Performance, Portability, and Scalability

What’s New in Intel® Parallel Studio XE

Whether you’re working on HPC clusters, remote clouds, local workstations, or anything in between, Intel® Parallel Studio XE is a workhorse—with compilers, libraries, and tools to help you improve your productivity and application performance. And Intel continues to innovate with the latest release of Intel Parallel Studio XE. Besides adding support for the newest hardware and programming language standards, several new features address growing technologies and new environments.


Tools in Intel Parallel Studio XE have support for the latest hardware including Intel® Xeon Phi™ processors and the Intel® Xeon® Scalable processor family. Taking advantage of the 60-plus cores (200-plus threads) on Intel Xeon Phi processors and the latest Intel® AVX-512 vectorization instructions on these platforms moves your application performance into the future―and this latest tool suite helps you get there.


The Intel® C/C++ and Fortran* compilers have updates for many of the latest standards without sacrificing performance. The Intel C/C++ compiler has full support for C11 and C++14, as well as initial C++17 support. The Fortran compiler has full Fortran 2008 and initial Fortran 2015 support. Also, to help take advantage of growing core counts and wider vector registers, this release adds the Parallel Standard Template Library (Parallel STL), for parallel and vector execution of the C++ STL, and initial support for the OpenMP* 5.0 draft. Using the Intel compiler is often as easy as switching a few variables in a Makefile* or integrating with an environment such as Microsoft Visual Studio* or Xcode*. Try out a free evaluation to see if you can boost your application’s performance.

High-Performance Python*

The latest release of the Intel® Distribution for Python* improves the speed of many common libraries and algorithms, especially in data analytics (Figure 1). The new optimizations in the SciPy* library and NumPy* package provide speedups for scikit-learn*, one of the most popular machine learning packages. The Intel Distribution for Python also leverages the Intel® Data Analytics Acceleration Library (Intel® DAAL) through the pyDAAL interface. For some algorithms in scikit-learn, speedups as high as 140X have been achieved.

This release also includes an OpenCV* package accelerated with Intel® Integrated Performance Primitives. OpenCV is a popular library used for computer vision across many disciplines.

Figure 1 – Python performance as a percentage of C++/Intel DAAL on Intel Xeon processors (higher is better)


Performance Libraries

Intel Parallel Studio XE comes with several libraries designed to ease the burden of creating high-performance software. This latest release builds on that motivation with improved algorithms and usability across the board.

Intel® Math Kernel Library (Intel® MKL)

Intel® MKL implements some of the most common mathematical routines in highly optimized, hand-tuned versions to take advantage of the latest processor features. Several improvements have been made targeting batched and compact operations, allowing users to run large groups of linear algebra computations more efficiently. This is done using the Batch and Compact APIs. [Editor’s note: The Batch API is discussed in “Introducing Batch GEMM Operations” on Intel® Developer Zone. The Packed API was discussed in “Reducing Packing Overhead in Matrix-Matrix Multiplication”.] Intel MKL takes care of the grouping and parallelization behind the scenes. Also, this release adds 24 new vector math functions, providing a wider range of highly optimized routines to choose from.
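To make the idea of a batched operation concrete, here is a minimal plain C++ sketch of the pattern such an API targets: the same small matrix multiply applied to many independent matrices. It does not call Intel MKL. The function names, the row-major back-to-back storage layout, and the square-matrix restriction are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// One small row-major multiply: C = A * B, where all three are n x n.
void small_gemm(const double* a, const double* b, double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Hypothetical reference "batch": batch_count independent n x n products,
// with the matrices of each operand stored back to back in one vector.
// A tuned batch routine would receive the whole group in a single call and
// decide for itself how to group and parallelize the work.
void gemm_batch_reference(const std::vector<double>& a,
                          const std::vector<double>& b,
                          std::vector<double>& c,
                          std::size_t n,
                          std::size_t batch_count) {
    const std::size_t stride = n * n;  // elements per matrix
    c.assign(batch_count * stride, 0.0);
    for (std::size_t m = 0; m < batch_count; ++m) {
        small_gemm(a.data() + m * stride,
                   b.data() + m * stride,
                   c.data() + m * stride,
                   n);
    }
}
```

Each iteration of the outer loop is independent of the others, and that independence is what leaves a library free to reorder, group, or parallelize the multiplies internally.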

Intel® Integrated Performance Primitives (Intel® IPP)

Intel® IPP provides performance-optimized, low-level building blocks for image, signal, and data processing (data compression/decompression and cryptography) applications. The latest version of Intel IPP has again boosted many performance-sensitive algorithms, including the addition of SSE4.2 and AVX2 vector instructions for LZO (Lempel–Ziv–Oberhumer) data compression. Also, the cryptography package no longer depends on the main library, making it easier to take advantage of the powerful cryptography algorithms included in Intel IPP.

Intel® Threading Building Blocks (Intel® TBB) and Parallel STL

Developers looking to parallelize their C++ applications have known about Intel® TBB for years. The open-source library has been improved upon again and again, and this year Intel TBB is being used in the Intel implementation of the Parallel STL.
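As a rough illustration of what parallel and vector execution of STL algorithms looks like in code, the sketch below uses the execution policies standardized in C++17. The helper function names are hypothetical, and the header and namespace shipped with the Intel implementation in this release may differ from the standard `<execution>` header assumed here. Depending on the toolchain, building it may also require linking a threading backend such as Intel TBB.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Square every element in place. The par_unseq policy permits (but does not
// require) the implementation to spread the work across threads and to
// vectorize it.
void square_all(std::vector<double>& v) {
    std::transform(std::execution::par_unseq,
                   v.begin(), v.end(), v.begin(),
                   [](double x) { return x * x; });
}

// Sum the elements. std::reduce may combine elements in any order, which is
// what allows it to run in parallel.
double sum_all(const std::vector<double>& v) {
    return std::reduce(std::execution::par_unseq, v.begin(), v.end(), 0.0);
}
```

The only change from the familiar serial calls is the extra first argument, so existing STL code can be migrated one algorithm call at a time.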

Analysis Tools

Intel Parallel Studio XE Professional and Cluster editions include several tools to help analyze, tune, and debug applications in increasingly complex hardware and software ecosystems.

Intel® Advisor

Whether you’re looking to add performance through parallelism or vectorization, Intel® Advisor can help. Which functions and loops should I target for optimizations? What’s preventing the compiler from vectorizing my code? How much will my performance improve if I make the recommended changes? Intel Advisor is designed to answer questions like these. The 2018 version of Intel Advisor includes the cutting-edge Roofline Analysis feature to pinpoint memory bottlenecks—specifically, loops suffering from poor vectorization or memory locality (Figure 2).
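The arithmetic behind a roofline chart is simple enough to sketch. In the standard roofline model, a loop's attainable performance is the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the loop's arithmetic intensity (floating-point operations per byte moved). The function and parameter names below are hypothetical, and Intel Advisor measures these quantities for you rather than exposing this code.

```cpp
#include <algorithm>

// Roofline model: performance is capped either by the compute peak or by
// how fast memory can feed the loop, whichever is lower.
// Units: GFLOP/s, GB/s, and FLOPs per byte (illustrative parameter names).
double attainable_gflops(double peak_gflops,
                         double bandwidth_gb_per_s,
                         double flops_per_byte) {
    return std::min(peak_gflops, bandwidth_gb_per_s * flops_per_byte);
}

// A loop is memory-bound when its arithmetic intensity lies below the
// "ridge point" where the bandwidth roof meets the compute roof.
bool is_memory_bound(double peak_gflops,
                     double bandwidth_gb_per_s,
                     double flops_per_byte) {
    return flops_per_byte < peak_gflops / bandwidth_gb_per_s;
}
```

With made-up machine figures of 1000 GFLOP/s peak and 100 GB/s bandwidth, a loop doing 0.25 FLOPs per byte is capped at 25 GFLOP/s. For such a loop, improving memory locality matters more than adding vector instructions.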

Use Intel Advisor to easily determine where and how your code can benefit from advances in the latest technologies like AVX-512 and the highly-parallel Intel Xeon and Xeon Phi architectures.

Intel® VTune™ Amplifier―Now including Performance Snapshot

Intel® VTune™ Amplifier is a powerful analysis tool with built-in expert guidance to help you understand and boost your application’s performance. And now it includes Performance Snapshot (Figure 3), a quick and easy-to-use script that provides a high-level view of an application’s use of available hardware (CPU, FPU, and memory). Whether you’re parallelizing with MPI, OpenMP, or both, time spent inside these libraries or time spent waiting for parallel work to complete can quickly degrade performance. Additionally, letting floating-point units sit idle or stalling CPUs while they wait for memory accesses can leave a lot of performance on the table. Use Performance Snapshot to reclaim these lost cycles.

Figure 2 – Intel® Advisor Roofline Analysis

Figure 3 – Intel® VTune™ Amplifier Performance Snapshot

Intel VTune Amplifier is also adding features for users taking advantage of new technologies like containerization and cloud computing. It’s now possible to profile applications running in Docker* and Mesos* containers or attach an Intel VTune Amplifier performance analysis to a running Java* service or daemon.

Intel® Inspector

Optimizing applications for performance is as important as ever, but it’s all for naught if an application doesn’t run correctly. Intel Inspector automatically checks your application for threading and memory errors as it runs. These hard-to-diagnose issues may not be detected by standard tests, which only flag a problem when a run happens to produce incorrect results. However, the algorithms in Intel Inspector can detect issues that may cause a problem in the future, like memory leaks and non-deterministic race conditions. We’ve added more advanced locking models in this release and, because Intel Inspector doesn’t require any special recompilation, it just takes a few clicks to get a profile started and see what issues may be hiding in your code.
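A small example shows the kind of defect a threading checker is meant to catch. In the sketch below, the first function has a data race that can still return the expected total on many runs, which is why result-based tests can miss it. The second fixes it with an atomic counter. The function names are hypothetical, and nothing here uses an Intel Inspector API.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Racy version: several threads increment a plain shared variable with no
// synchronization. This is a data race (undefined behavior in C++). It may
// return threads * iters on one run and a smaller number on another.
long count_racy(int threads, int iters) {
    long total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, iters] {
            for (int i = 0; i < iters; ++i) ++total;  // unsynchronized write
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Fixed version: std::atomic makes each increment indivisible, so the
// result is always threads * iters.
long count_atomic(int threads, int iters) {
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, iters] {
            for (int i = 0; i < iters; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

A test that only compares the final count against the expected value can pass for `count_racy` on a lightly loaded machine, so the bug surfaces only intermittently in production.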

Cluster Tools

Cluster computing used to be confined to a limited audience that needed remote access to large, often strictly managed clusters. Now, with the ubiquity of cloud computing resources and processors like the Intel Xeon Phi processor―with enough cores to act like a single-node cluster―cluster computing is expanding its reach. Intel has always been in this field, and the tools in the Cluster Edition of Intel Parallel Studio XE reflect that experience.

The latest version of Intel® MPI Library includes significant optimizations to application startup and finalization, enhancing productivity and scalability.

Intel® Trace Analyzer and Collector now supports OpenSHMEM*, which is attracting growing interest in the partitioned global address space (PGAS) community.

A recent addition to the suite of tools is Intel® Cluster Checker, which uses a built-in expert system to diagnose cluster issues and propose actionable remedies. [Editor’s note: Intel Cluster Checker is discussed in greater detail in Is Your Cluster Healthy?]

Intel Cluster Checker performs tests to identify issues with cluster functionality, performance, and uniformity, including on the latest Intel® hardware and software. The tool is useful for system administrators who frequently check cluster status. Now its API also gives developers access to these diagnostics, so they can build analyses specific to their needs.

Hardware and Software are Evolving

Creating applications to meet the demands of today’s ecosystem is hard enough. Add to that the requirement of seamlessly adapting to future platforms and environments, and keeping up becomes a huge challenge. Adopting tools like those in Intel Parallel Studio XE makes it easier and more efficient to design scalable, high-performance software. Try it out for free today and prepare your code for the platforms of tomorrow.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

For more complete information about compiler optimizations, see our Optimization Notice.