Intel oneAPI News Updates
Intel Advances Architecture for Data Center, HPC-AI & Client Computing
“Today we unveiled our biggest shifts in Intel® architecture in a generation.”
That’s according to Raja Koduri, Sr. VP and GM of Intel Accelerated Computing Systems and Graphics Group, who unveiled the next generations of Intel architectures—CPU, GPU & IPU—at this year’s Intel Architecture Day. These new architectures will power upcoming high-performance products and establish the foundations for the next era of Intel innovation aimed at meeting the ever-growing demand for more computing power and performance.
Intel® oneAPI Toolkits enabling these advanced capabilities
See two demos running on Xe/GPU development platforms and optimized by Intel’s oneAPI tools.
Intel® oneAPI Rendering Toolkit demo [07:05] – This shows an end-to-end film-quality creator workflow using a pre-production implementation of two of this Kit’s libraries—Intel® Embree and Intel® Open Image Denoise—running cross-architecture on CPUs and Xe architecture ray tracing-accelerated GPUs. You’ll see:
- a CPU-based creator workflow using the commercial SideFx Houdini application with Pixar’s open-source USD APIs calling the Embree and Open Image Denoise API
- the same Embree and Open Image Denoise API calls, made from the CPU, delivering beautiful real-time rendering via an Xe GPU
- a real-time walk-through of an Intel history-inspired path-traced scene at the fictitious 4004 Moore Lane
- the final 4K film-quality version of a 45-second film containing 1,350 frames rendered many times faster than with the CPU only
Intel® oneAPI AI Analytics Toolkit demo [1:27] – This highlights Ponte Vecchio with the AI Kit running ResNet50 benchmarks. Early ResNet50 inference and training throughput results on Ponte Vecchio and Sapphire Rapids have already established a new performance bar and show that Intel is on track to deliver on its goal of AI and HPC leadership, surpassing the previous performance leader.
The world is counting on architects, engineers, and developers to solve the most difficult computational problems, to enrich people’s lives. Intel’s hardware and software strategy and execution are accelerating to meet these demands … at a torrid pace.
Find Out More
Introducing Intel® Arc™ – A New High-performance, Discrete Graphics Brand
It’s arrived. Today Intel revealed the brand for its upcoming consumer high-performance graphics products: Intel® Arc™.
The Arc brand will cover hardware, software and services, and will span multiple hardware generations. The first is based on the Xe HPG microarchitecture (code-named Alchemist) and will feature hardware-based ray tracing and AI-driven super sampling, and offer full support for DirectX 12 Ultimate.
Find Out More
A Sampling of What the Industry is Saying …
“Intel wants to support the gamers driving this industry by offering a range of hardware choices backed by powerful, yet accessible, software tools.” – CG Magazine
“This is another step on Intel’s path to compete with the current graphics giants, NVIDIA and AMD.” – Geeknetic
“Finally, lovers of the Blue giant (Intel) will be able to build themselves a computer with both Core processors and Intel Arc graphics.” – Thanh Nien
Ray Tracing Path Forward: Visualization with an Open, Heterogeneous, Fidelity-first Approach
Intel wraps up at SIGGRAPH this week with 10 technical sessions that highlight cool rendering features such as projection mapping, 3D modeling and animation, real-time graphics, workflow efficiency, memory solutions, filmmaking and more, with industry leaders including Adobe, Autodesk, Blender, Chapeau Studios, Dassault Systèmes, and Foundry (view session videos and PDFs). A keynote by Jim Jeffers, Sr. Principal Engineer and Sr. Director of Intel Advanced Rendering and Visualization Architecture, outlines how Intel’s comprehensive, end-to-end platform tackles the most complex workloads and delivers the highest fidelity. The platform consists of powerful Intel® processors, Intel® Optane™ Persistent Memory, networking, and advanced ray tracing through open, heterogeneous development using the Intel® oneAPI Rendering Toolkit.
In one of two special keynote ‘walk-on’ videos, Frederic Servant, Sr. software development manager at Autodesk, shares how integrating Intel® Open Image Denoise (part of the Intel oneAPI Rendering Toolkit) into the Arnold renderer adds deep learning acceleration along with denoising speed and quality [view video 2:50]. The other ‘walk-on’ features Royal O’Brien, GM of Digital Media and Games at the Linux Foundation, who discusses how the new open source project, Open 3D Engine (O3DE), opens up collaboration and innovation that benefits the entire industry [view video 3:30]. O3DE also uses components of the Intel oneAPI Rendering Toolkit.
Looking forward, Jeffers expands on how sustainable development for ray tracing requires an open, flexible, and productive heterogeneous approach to provide choice as new innovative architectures become available, and to overcome the challenges presented by proprietary programming walled gardens. And with the extreme growth in data sets, model size and complexity, high fidelity is the key differentiator in getting photorealistic visualization as close as possible to “ground truth.” To illustrate this point, Jeffers penned a blog that compares fidelity measurements on publicly available models for Intel® Open Image Denoise versus NVIDIA OptiX. Check it out, and envision how your work as content creators, developers, and researchers can become more vivid and more productive with oneAPI and Intel technologies.
The newest update of Intel® oneAPI Toolkits is now available for download and use in the Intel DevCloud, providing both improved performance and expanded capabilities for data-centric workloads including AI, HPC, media, ray tracing, and more.
Intel’s oneAPI Toolkits provide compilers, libraries, analysis tools and optimized frameworks that implement industry standards—C++, SYCL, Fortran, MPI, OpenMP, Python and more—enabling unique CPU features including Intel® AVX-512, VNNI/Intel® DL Boost, and cryptography instructions along with GPUs and FPGAs to power complex data-driven applications.
- Improved performance on 3rd Gen Intel® Xeon® Scalable processors—code name Ice Lake Server—including compiler, library, and framework optimizations. (Note: these processors are now available in the Intel® DevCloud for oneAPI.)
- Inclusion of Intel® Extension for Scikit-learn* in the soon-to-be-released Intel® oneAPI AI Analytics Toolkit update. This helps accelerate machine learning algorithms on Intel CPUs and GPUs across single and multiple nodes with one-line dynamic patching and offers a performance boost of over 100x.
- Improved graph optimizations without explicit input/output setting and better Int8 support with a refined auto mixed-precision API for deep learning inference workloads. These will be included in the Intel® oneAPI AI Analytics Toolkit update.
- Intel® DPC++ Compatibility Tool now supports CUDA 11.2 and 11.3 header files, improves migration coverage for CUDA driver and memory-fence APIs, and reduces migration time by up to 50%.
- Intel® VTune™ Profiler better analyzes GPU offload schema with improved data-transfer analysis between the CPU and GPU. This capability boosts the hottest compute-kernel performance by automatically detecting the reasons limiting peak achievable GPU occupancy.
- Intel® Advisor introduces improved GPU Roofline Analysis. This allows you to gain insights into memory-bound codes to remove memory subsystem bottlenecks and get instance breakdowns of each kernel to compare the performance characteristics of different workloads.
- Intel® oneAPI Rendering Toolkit introduces additions to its volume ray tracing library, including new support for structured FP16, VDB FP16, VDB Motion Blur, VDB Cubic Filtering, and the capability to handle multiple volumes in the same scene.
- Intel® Compilers and select libraries—oneMKL, oneDAL, oneDNN, oneVPL, Intel® IPP, and Intel® MPI Library—are now available via Spack, a package manager for HPC.
- New C APIs, preview C++ and Python APIs, and samples for the Intel® oneAPI Video Processing Library (oneVPL) bring performance benefits to a wider range of use cases.
- Support of the Fortran 2008 standard is now available in Intel® Fortran Compiler (Beta) with OpenMP 5.1 subset support, including compute offload to GPUs.
- New Yocto Project layer meta-intel-oneapi allows easier integration of the DPC++ runtime, data collectors, libraries, and more into Yocto Project Linux kernels.
- Many more new features, plus performance, stability, and security improvements across all tools.
Read the blog to get all of the details.
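The one-line dynamic patching mentioned above for Intel® Extension for Scikit-learn* can be sketched roughly as follows. This is a minimal illustration, assuming the extension is installed and importable as `sklearnex`; the helper name `enable_intel_sklearn` is hypothetical and exists only for this example. When the extension is absent, the sketch simply leaves stock scikit-learn in place.

```python
def enable_intel_sklearn() -> bool:
    """Try to patch scikit-learn with the Intel extension.

    Returns True if patching was applied, or False if the extension
    (assumed here to be importable as `sklearnex`) is not installed.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()  # the "one line" that reroutes supported estimators
    return True


patched = enable_intel_sklearn()
print("Intel Extension for Scikit-learn active:", patched)
```

Patching is typically applied before scikit-learn estimators are imported, so that the accelerated versions are the ones picked up by the rest of the program.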
Argonne National Laboratory, in collaboration with Oak Ridge National Laboratory, has awarded Codeplay a contract to implement the oneAPI DPC++ compiler, an implementation of the SYCL open standard, to support Frontier, the AMD GPU-based high-performance computing supercomputer.
Codeplay is a UK-based software company that develops compilers and tools for diverse hardware architectures and has been a leading implementor of SYCL compilers. Enabling the SYCL open standard on Frontier will provide a common programming model to develop scientific applications for heterogeneous compute environments across the U.S. Department of Energy’s national labs.
Affirming its commitment to oneAPI for heterogeneous computing, Lenovo recently announced that its latest release of LiCO includes the addition of three Intel oneAPI-based templates—Intel MPI, Intel MPITune, and Intel OpenMP—that are optimized to run on Intel® processors.
What is LiCO?
Short for Lenovo Intelligent Computing Orchestration, LiCO is a software solution that simplifies the use of clustered computing resources for AI model development and training. The unified platform simplifies interaction with the underlying compute resources, enabling customers to take advantage of popular open source cluster tools while reducing the effort and complexity of using them for AI.
Appsbroker Combines the Power of Intel and Google Cloud to Break Two World Records for HPC Speed in the STAC-A2 Benchmark
Google Premier Provider Appsbroker delivered a solution for testing that involved a cluster of Google Cloud Platform (GCP) instances using 2nd Gen Intel® Xeon® Scalable processors, plus optimizations via Intel® oneAPI tools. This is the first STAC-A2 benchmark to be tested on a public cloud, and the results are record-breaking.
“The STAC A2 C++ code was developed and optimized for Intel Architecture using the Intel® oneAPI Base Toolkit and the Intel® oneAPI Math Kernel Library (oneMKL). This high-performance code ran OnPrem which Appsbroker seamlessly transitioned to GCP.” – Mike Blalock, GM, Intel Financial Services Vertical
Today at SmartDev, Sber announced it would be enhancing the capacity of SberCloud ML Space, a cloud platform for full cycle development and implementation of AI services, offering tools and resources for the creation, training, and deployment of machine learning models, from quick connection to data sources to automatic deployment of models in SberCloud.
ML Space is a cloud service for organizing distributed learning using Intel® Xeon® Scalable Processors with built-in AI acceleration. Its architecture is based on SberCloud’s supercomputer, Christofari, which is the largest supercomputer in Russia and delivers benchmark performance of 6.7 petaFLOPS, achieved by adopting the oneAPI programming model.
World-renowned golf equipment leader PING teams with Intel, Altair and Dell Technologies to boost product line innovation by applying high-performance computing (HPC) to its design strategy. This collaboration has helped PING slash design cycle time, decrease variability in product performance, improve quality without delaying time to market, and increase simulation speeds by 4.5x.
Read the story, including how PING leveraged Intel® oneAPI Math Kernel Library and Intel® MPI Library to improve application efficiency and quality on Intel® Xeon® Scalable processors.
And if you’ve got 3 minutes to spare (actually 2:43), check out this video showcasing how PING and Intel collaborated on HPC and AI to find the perfect golf club.
May 13, 2021 | oneAPI Industry Spec
“Today I’m happy to announce the release of rev 1 of the oneAPI 1.1 provisional spec, which kickstarts the next phase of oneAPI’s evolution,” says Sanjiv Shah, Intel VP and GM, Developer Software.
Read about the latest functionality that’s been added to enable new use cases in AI, Rendering, and Media, including:
- Advanced Ray Tracing. Software developers can now code for high-fidelity, ray-traced computations across multiple vendors’ systems and accelerators.
- oneAPI Deep Neural Network Library. oneDNN extends its scope by adding a graph API which compiles and executes a deep learning computation graph.
- oneAPI Video Processing Library. oneVPL adds Hyper encode mode that allows for more efficient split of workloads between available media components.
- Advances to the Language. Data Parallel C++ language extensions make up the majority of new features included in the latest SYCL specification.
The Intel® oneAPI Toolkits are updated for the first time since their inaugural production release in late 2020, offering additional performance and new capabilities for compute, AI, media, ray tracing, HPC, and more. These toolkits represent the most complete and diverse set of oneAPI-initiative compliant tools (compilers, libraries, pre-optimized frameworks, analyzers, debuggers) available today.
AI and HPC
- AI customers will see numerous enhancements for XGBoost and Scikit-learn performance, plus upgrades to Intel® Optimization for Tensorflow* and PyTorch*.
- The Intel® MPI Library expanded container support, and updated Mellanox OFED support.
- Intel® Advisor enhanced the Source View analysis for Offload Modeling and GPU Roofline capabilities.
- Intel® VTune™ Profiler added a Platform Diagram, a new starting point for the Input and Output analysis, revealing system topology and high-level utilization metrics for hardware resources including PCIe devices, Intel® Ultra Path Interconnect, and memory.
Media, Ray Tracing, and Rendering
- VDB multi-attribute volume support, faster structured and VDB volume sampling, and quicker interval iteration on structured, VDB, and unstructured volumes.
- An enhanced denoising library now supports directional lightmaps and Apple Silicon.
- Library GUI enhancements include an improved Lights editor for easy scene lighting; glTF loader improvements enabling animation and skinning of triangle meshes, lights, materials and textures; and better scene file load and save.
- Media support for fast transcoding, streaming, and more.
- unified shared memory
- parallel reductions
- work groups and subgroup primitives
- and more
These features allow more control over the hardware, enable more flexible usage patterns, and can reduce verbosity for programmers. Developers can now write code using SYCL 2020 features with DPC++, but also seamlessly transition to using hipSYCL when, for example, AMD GPUs need to be targeted. Learn more.
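To make the idea of a parallel reduction with work groups more concrete, here is a conceptual sketch. It is plain Python using only the standard library, not SYCL or DPC++ code, and the function name `parallel_reduce` and its parameters are assumptions made for illustration. Each "group" reduces its own slice of the data, and the partial results are then combined, which is the general shape of a reduction on parallel hardware. Python threads are used only to show the structure; they are not expected to give a real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def parallel_reduce(values, combine=operator.add, identity=0, num_groups=4):
    """Reduce `values` by combining per-group partial results.

    `combine` must be associative, and `identity` must be its neutral
    element, so that grouping does not change the final answer.
    """
    values = list(values)
    if not values:
        return identity
    chunk = -(-len(values) // num_groups)  # ceiling division
    groups = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        # Step 1: every group reduces its own slice independently.
        partials = list(pool.map(lambda g: reduce(combine, g, identity), groups))
    # Step 2: combine the per-group partial results into one value.
    return reduce(combine, partials, identity)


total = parallel_reduce(range(1, 101))
print(total)  # prints 5050
```

The same function works for any associative operation, for example `parallel_reduce(data, combine=max, identity=float("-inf"))`.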
February 17, 2021 | oneAPI Specification
Read Jim’s latest blog where he discusses this opportunity for developer feedback to help foster robust and efficient development in the area of ray tracing compute. By introducing ray tracing capabilities to the oneAPI specification, software developers across the industry will have the ability to “write once” for high-fidelity ray-traced computations across multiple vendor systems and accelerators.
Demetics Medical Technology Co. Ltd. is using Intel® Software Guard Extensions (Intel® SGX) and Intel® oneAPI Math Kernel Library (oneMKL) to protect its medical artificial intelligence (AI) algorithms and intellectual property (IP) in medical devices at the edge. A pioneer in China of AI-based ultrasonography, Demetics accelerated adoption of DE-Light, its independently developed deep-learning framework that has shown outstanding performance and improved the accuracy of thyroid nodule detection under an open source framework by 30% to 40%.¹
Download oneMKL as part of the Intel® oneAPI Base Toolkit.
¹ The test results are quoted from internal evaluation of Demetics. For more details, please contact Demetics.
The Khronos® Group, an open consortium of industry-leading companies creating advanced interoperability standards, announced the ratification and public release of the SYCL™ 2020 final specification—the open standard for single-source C++ parallel programming. A major milestone encompassing years of specification development, SYCL 2020 builds on the functionality of SYCL 1.2.1 to provide improved programmability, smaller code size, and increased performance. SYCL 2020 is based on C++17. It enables easier acceleration of standard C++ applications and drives a closer alignment with the ISO C++ roadmap. oneAPI Data Parallel C++ (DPC++) features are included in the SYCL 2020 final specification.
Since its launch in 2019, DPC++ has progressed significantly, building cross-architecture and cross-vendor support from the oneAPI Centers of Excellence, and now successfully upstreaming features to industry standards. Through open, community-based DPC++ development, Intel has made significant contributions in improved programming abstractions for SYCL. New capabilities accelerate heterogeneous parallel programming for HPC, machine learning, embedded computing, and compute-intensive applications across a range of XPU architectures such as CPUs, GPUs, FPGAs, and AI accelerators.
Intel® Embree Wins an Academy Award
The Academy of Motion Picture Arts and Sciences has awarded the open-source Intel® Embree Ray Tracing library (a component of the Intel® oneAPI Rendering Toolkit) an Academy Award in the form of a Scientific and Technical Achievement Award. The Academy, which hosts the annual Academy Awards®, recognizes Intel Embree’s industry-leading ray tracing for geometric rendering as a contributing innovation in the moviemaking process†.
A Sampling of Movies where Intel Embree was used
- How to Train Your Dragon: The Hidden World
- Lego Batman
- Next Gen
- Secret Life of Pets 2
- Avengers: Infinity War
- The Grinch
- and many more
†Award recipients: Sven Woop, Carsten Benthin, Attila Afra, Manfred Ernst, Ingo Wald, and Sr. Director of Intel Advanced Rendering & Visualization Jim Jeffers
Intel Releases oneAPI Toolkits for XPU Software Development
“Extending Intel’s software development tools from CPUs to GPUs and FPGAs is a key milestone in our XPU journey. As we promised, the oneAPI industry initiative delivers on bringing an open, unified cross-architecture programming to the ecosystem—providing an alternative to proprietary programming models. Our oneAPI toolkits, along with the Intel® DevCloud, provide the production tools needed to accelerate our advances into distributed intelligence era.” – Raja Koduri, Intel senior vice president, chief architect and general manager of Architecture, Graphics and Software
Intel released version 2021.1 of its oneAPI Toolkits to simplify development of high-performance applications across Intel® CPUs, GPUs, and FPGAs. The toolkits combine Intel’s rich heritage of proven developer tools with oneAPI, an open, standards-based, unified cross-architecture programming model, to enable developers to break free of the economic and technical burdens of proprietary programming models with the freedom to choose the best hardware for their specific workloads. The Intel® oneAPI Base Toolkit includes compilers, performance libraries, analysis and debug tools, and a compatibility tool that aids in migrating code written in CUDA to Data Parallel C++ (DPC++). Additional toolkits for HPC, AI, IoT, and rendering provide tools and components to accelerate specialized workloads. The toolkits are free to download and use locally, or access in the Intel® DevCloud where developers can develop and test workloads on a variety of Intel CPUs, GPUs and FPGAs. Access options include web download, repositories, and containers.
Other announcements include:
- Commercial versions providing worldwide support from Intel technical consulting engineers are also offered. Intel is immediately transitioning Intel® Parallel Studio XE and Intel® System Studio tool suites to its oneAPI products, which are upward-compatible and include all current capabilities plus new capabilities and tools.
- oneAPI Ecosystem Advancements: Lobachevsky University in Nizhni Novgorod (UNN) announced a new oneAPI center of excellence (CoE) to facilitate studies in contemporary physics using the power of CPUs, GPUs, and other accelerators with oneAPI cross-architecture programming.
- oneAPI Ecosystem Support: More than 60 leading research organizations, companies and universities support the industry-led oneAPI initiative and some note their success using Intel oneAPI Toolkits. See oneAPI ecosystem support and reviews site for details. A new oneAPI applications catalog details more than 240 applications powered by oneAPI.
Lobachevsky State University of Nizhni Novgorod to Accelerate Studies of Quantum Processes Using oneAPI
December 8, 2020 | oneAPI initiative
Lobachevsky State University of Nizhni Novgorod (UNN) announced a new oneAPI Center of Excellence (CoE) to facilitate studies in contemporary physics using the power of CPUs, GPUs, and other accelerators with oneAPI cross-architecture programming. This new center aims to address research challenges requiring high-performance computing (HPC) on heterogeneous architectures, and to expand its software curriculum to train the next generation of scientists.
Gold Intel® oneAPI Toolkits Ship in December 2020
Intel oneAPI Toolkits give developers freedom to code across architectures, accelerate computing in the XPU era
- The gold shipment of Intel® oneAPI Toolkits in December.
- New capabilities in its software stack as part of Intel’s combined hardware and software design approach
- The debut of Intel’s first discrete GPU for the data center, Intel® Server GPU, based on the Xe-LP microarchitecture, and designed for high-density, low-latency Android cloud gaming and media streaming.
These milestones are important steps in delivering hardware and software solutions to address the growth in specialized workloads from AI to HPC and graphics. Read full press release for more.
Intel® oneAPI Toolkits: What’s New
The Intel oneAPI toolkits gold release supporting Intel CPUs, GPUs, and FPGAs will ship in December, available for free both locally and in the Intel® DevCloud. The toolkits help developers deliver on the performance potential of the underlying hardware and lower software development and maintenance costs, while reducing risks associated with deploying accelerated computing relative to proprietary, vendor-specific solutions. They are built on Intel’s rich heritage of CPU development tools, now expanded to XPUs. Other announcements include:
- Commercial versions of Intel’s oneAPI toolkits will be available that include worldwide support from Intel technical consulting engineers. Prior tool suites Intel® Parallel Studio XE and Intel® System Studio are immediately transitioning into Intel’s oneAPI products.
- oneAPI Ecosystem Advancements – A new oneAPI Center of Excellence at the University of Illinois Beckman Institute for Advanced Science and Technology, alongside endorsements for oneAPI, including Microsoft Azure, TensorFlow, and many more.
- Expansion of the Intel® DevCloud to include new Intel® Xe GPU hardware: Intel® Iris® Xe MAX graphics for public access, and Intel® Xe-HP for select developers.
Learn more: Intel® oneAPI Products: Gold Release Coming in December blog | oneAPI Fact Sheet
University of Illinois to bring oneAPI cross-architecture programming model to NAMD
November 11, 2020 | oneAPI initiative
The Beckman Institute for Advanced Science and Technology at University of Illinois announced a new oneAPI Center of Excellence (CoE) to bring the oneAPI programming model to the life sciences application NAMD, extending it to additional heterogeneous computing environments. NAMD, which simulates large biomolecular systems, is helping to tackle real-world challenges such as COVID-19.
See You at SuperComputing 2020
November 6, 2020 | oneAPI @ SuperComputing 2020, oneAPI initiative, DPC++
With Supercomputing 2020 right around the corner, we’re looking forward to connecting with the community and sharing the progress of our collective efforts to simplify heterogeneous programming through a unified, standards-based model.
Topics span oneAPI, DPC++, SYCL, OpenMP and MPI, as well as performance tuning and visualization tools. From tech talks and in-depth workshops to demos and live chats, we look forward to the many ways we can interact over these next two weeks. Get the details.
Release beta10 of Intel® oneAPI Products Now Live
In preparation for the Gold Product release that’s coming soon, here are the details of release beta10.
- Find performance-degrading memory transfers with offload cost profiling for both DPC++ and OpenMP using Intel® VTune™ Profiler.
- Debug throttling issues and tune flops/watt using Intel® VTune™ Profiler to analyze power for CPU. GPU power analysis coming soon.
- Achieve 95% parallelization of Pandas APIs while getting 100% functional compatibility with Intel® Distribution of Modin. Its OmniSci backend adds Intel® Optane technology support with 6TB per node for efficient large-data scaling and the capability to seamlessly extend to cloud from local notebook without manual cluster spawning.
- Accelerate Python math code with the initial release of Data Parallel NumPy (dpnp), a native library and NumPy-like API accelerated with SYCL and Intel® GPU support.
- Speed up model-fitting and prediction on Intel® CPUs with added optimizations for scikit-learn algorithms, including Support Vector Classification (SVC), Random Forest, and KNeighbors classifiers.
- Convert trained models from XGBoost and LightGBM and accelerate model prediction on Intel CPUs using daal4py library.
- Mix inline ninja-level CPU assembly and GPU virtual-instruction-set code with the Intel® oneAPI DPC++/C++ Compiler.
- Perform high-fidelity, ray traced, interactive, and real-time rendering through the new Intel® OSPRay Studio, a scene-graph application with an intuitive interface.
Additional Open Source Interfaces Available for oneAPI Math Kernel Library (oneMKL)
The array of open source interfaces for the oneAPI Math Kernel Library continues to expand, enabling developers to efficiently code portable, math-intensive applications that run across multiple vendors’ architectures. The interface for oneMKL’s random number generator (RNG) domain joins the interface for the library’s dense linear algebra BLAS domain released earlier this year. This new interface greatly expands the coverage of common math functions, providing routines that implement commonly used pseudorandom, quasi-random and non-deterministic engines with continuous and discrete distributions, which are used in Monte Carlo simulations, financial forecasting, risk management, cryptography and other applications.
These interfaces are available for download, and we encourage oneAPI partners to use them to support additional cross-architecture, cross-vendor hardware. Get the details.
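As a rough illustration of how a pseudorandom engine paired with a continuous distribution feeds a Monte Carlo simulation, here is a small sketch. It uses Python's standard `random` module rather than oneMKL, and the function name `estimate_pi` and the default seed are assumptions chosen for the example. It shows only the general pattern (seeded engine, uniform distribution, repeated sampling) and is not the oneMKL RNG interface.

```python
import math
import random


def estimate_pi(num_samples: int, seed: int = 2020) -> float:
    """Estimate pi by sampling points uniformly in the unit square."""
    engine = random.Random(seed)  # seeded pseudorandom engine
    inside = 0
    for _ in range(num_samples):
        # Draw from a continuous uniform distribution on [0, 1].
        x = engine.uniform(0.0, 1.0)
        y = engine.uniform(0.0, 1.0)
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / num_samples


estimate = estimate_pi(200_000)
print(f"estimate={estimate:.4f}, error={abs(estimate - math.pi):.4f}")
```

Because the engine is seeded, repeated runs give the same estimate, which is useful when comparing results across different hardware or library backends.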
oneAPI Center of Excellence Established at the Heidelberg University Computing Center (URZ)
Heidelberg University Computing Center announces it is establishing a oneAPI Center of Excellence (CoE) focused on adding advanced Data Parallel C++ (DPC++) capabilities to hipSYCL, which supports systems based on AMD GPUs, NVIDIA GPUs, and CPUs.
DPC++ is oneAPI’s cross-architecture language that is productive, readable, easy-to-learn, and helps developers extract accelerator performance without forcing them into a vendor-specific language or tool. New DPC++ extensions are part of the SYCL 2020 provisional specification that brings features such as unified shared memory to hipSYCL and the platforms it supports. This CoE provides a big step toward making this technical capability real for the software community and furthers the oneAPI industry initiative promise to create a cross-architecture, cross-vendor, programming solution that is both productive and performant.
oneAPI v1.0 Specification Released
September 28, 2020 | oneAPI initiative
We are excited to celebrate the release of the oneAPI v1.0 specification, making it easier to embrace accelerator programming and address data-intensive workloads. This release marks the culmination of twelve months’ collaboration among leading technologists from industry, academia and government, paving an open road for cross-architecture development.
The specification spans a language and many other domains that benefit from accelerators, including math libraries, deep learning and machine learning interfaces, video analytics APIs, and a low-level hardware abstraction interface or runtime API. An open source stack is now available that anyone can use and port to their favorite programs. Read the details from Sanjiv Shah, Developer Software Engineering Manager.
If you want to influence the next generation of accelerator software, enable a new language to target diverse accelerators, develop oneAPI-compliant tools, or plug in your hardware to take advantage of this software stack, we encourage you to contribute to the specification or the open source implementation on GitHub.
Release beta09 of Intel® oneAPI Products Now Live
In preparation for the Gold Product release coming later this year (stay tuned!), this release includes accelerator optimizations across a multitude of tools (compilers, libraries, and analysis tools), plus the introduction of the Intel® Low Precision Optimization Tool supporting DL frameworks, Intel® Advisor’s Roofline analysis for GPUs, and the addition of H.264 and MJPEG software decode and encode in the Intel® oneAPI Video Processing Library.
- Improved performance and stability across all compilers, libraries, and tools in preparation for the gold product release coming up later this year.
- Intel® AI Analytics Toolkit introduces the Intel® Low Precision Optimization Tool, a unified low-precision inference interface that supports multiple deep-learning frameworks. Easily convert FP32 models to int8 or bfloat16, and take advantage of accuracy-driven tuning strategies and optimizations for performance, model size, and memory footprint.
- Intel® Optimization for PyTorch adds bindings to Intel® oneAPI Collective Communications Library (oneCCL) for efficient distributed training on CPUs. It also supports automatic mixed precision for model data types.
- New Technical Preview capabilities in Intel® Advisor, including user interface workflows and toolbars that incorporate Roofline analysis for GPUs and Offload Advisor, as well as data transfer optimization recommendations in Offload Advisor.
- Find the module causing performance-killing I/O writes using Intel® VTune™ Profiler’s improved I/O analysis that identifies where slow MMIO writes are made.
- Optimize FPGA software performance with less guesswork using Intel® VTune™ Profiler, which now provides stall and data-transfer data for each compute unit in the FPGA.
- Intel® MPI Library introduces initial GPU programming support and enables the connecting of independent MPI jobs on Mellanox fabric.
- Intel® oneAPI Math Kernel Library (oneMKL) adds DPC++ functions for CPU and GPU, including RNG move operators and LAPACK buffer and pointer-based (Unified Shared Memory) interfaces.
- Intel® oneAPI Video Processing Library (oneVPL) adds H.264 & MJPEG software decode and encode; video pre-processing including resize, color conversion, and crop functions; and support for internally allocated buffers.
- Intel® Distribution for Python* adds initial Windows support for automatic GPU offload of data-parallel kernels inside Numba functions, and multi-threading optimizations for scikit-image transform functions and filters.
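The FP32-to-int8 conversion mentioned in the list above can be pictured with a small sketch. This is not the Intel® Low Precision Optimization Tool's API. It is a generic symmetric linear quantization scheme, assumed here purely for illustration, with hypothetical helper names `quantize_int8` and `dequantize`. It shows why low precision trades a bounded rounding error for a smaller model footprint.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale factor maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.02, -1.27, 0.64, 0.0, 1.0]  # made-up sample values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

In this scheme the round-trip error of each value is at most about half the scale factor, which is the kind of quantity an accuracy-driven tuning strategy would weigh against the savings in model size.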
Building Ray Tracing Solutions for the Future
At SIGGRAPH 2020, the world’s premier event for computer graphics, Intel announced new features for its Intel® oneAPI Rendering Toolkit and unveiled customer success stories brought to life through Intel’s powerful ray tracing platform, including hardware, memory, networking, and software technologies. (See full press release.)
The oneAPI Rendering Toolkit’s new capabilities for ray tracing and rendering include:
- Intel® OSPRay Studio – A Scene Graph Application that demonstrates high-fidelity ray traced, interactive, real-time rendering and provides capabilities to visualize multiple formats of 3D models and time series. This studio is used for robust scientific visualization and photorealistic rendering and was created using Intel® OSPRay in conjunction with other Intel rendering libraries (Intel® Embree, Intel® Open Image Denoise, etc.). Availability is later in 2020.
- Intel® OSPRay for Hydra – A Universal Scene Description (USD) Hydra API compliant renderer that provides high-fidelity, scalable ray tracing performance and real-time rendering with a viewport-focused interface for film animation and 3D CAD/CAM modeling.
- Intel® DevCloud for oneAPI desktop visualization capabilities – New visual development capabilities provide the ability to visualize and iterate rendering and create applications with real-time interactivity via remote desktop. Users can use the Intel oneAPI Rendering Toolkit to optimize visualization performance and evaluate workloads across a variety of the latest Intel hardware (CPU, GPU, FPGA). There is free access with no installation, setup or configuration required. Available now. Interested users can sign up here.
LAIKA Studios & Intel Join Forces to Expand What’s Possible in Stop-Motion Filmmaking
See how LAIKA Studios and Intel’s Applied Machine Learning team are working together to realize the limitless scope of stop-motion animation. Experts Jeff Stringer, Director of Production Technology, and Steve Emerson, Visual Effects Supervisor, discuss how the company is using Intel® oneAPI tools to incorporate the power of AI into its films.
- Watch! [2:38 mins]
- Read the Fast Company article [4-minute read]
- Learn more about Intel oneAPI tools, including trying your code in the Intel DevCloud
Release beta08 of Intel® oneAPI Products Now Live
Highlights include the introduction of Intel® Distribution of Modin and OmniSci for distributed (and accelerated) data analytics preprocessing, up to 4x improved rendering speed and particle volume support in the Intel® oneAPI Rendering Toolkit, introduction of Performance Snapshot profiling in Intel® VTune™ Profiler for quick initial analysis, memory-level roofline analysis in Intel® Advisor, H.265 and AV1 CPU software codecs in Intel® oneAPI Video Processing Library, and NUMA optimization capabilities in Intel® oneAPI Threading Building Blocks.
- Major Intel® oneAPI Video Processing Library (oneVPL) update, including H.265 & AV1 CPU software codecs and upward compatibility with Intel® Media SDK.
- Major Intel® oneAPI Threading Building Blocks (oneTBB) update including detailed NUMA affinity management capabilities and alignment with modern C++.
- Improved Intel® oneAPI DPC++ Compiler code performance for CPU architectures.
- Intel VTune Profiler continues to refine analysis for GPU accelerators with the addition of OpenMP offload pragma-aware metrics. It also adds a Performance Snapshot as a first profiling step to suggest the detailed analyses (memory, threading, etc.) that offer the most optimization opportunity.
- Intel® Advisor adds memory-level Roofline analysis that helps pinpoint exact memory hierarchy bottlenecks (L1, L2, L3 or DRAM).
- Initial OpenMP 5.0 GPU offload support in the Intel® C++ Compiler.
- Intel® AI Analytics Toolkit adds significant enhancements to data analytics workflows by introducing Intel® Distribution of Modin, released through the Anaconda channel. Seamlessly scale data preprocessing across multiple nodes using this intelligent, distributed dataframe library with an identical API to Pandas. In the backend, it is supported by OmniSci, a performant framework for end-to-end analytics that has been optimized to harness the computing power of existing and emerging Intel® hardware.
- Intel AI Analytics Toolkit also upgrades to PyTorch 1.5, which includes support for Bfloat16 data type and the latest 3rd Gen Intel® Xeon® Scalable Processors (codenamed Cooper Lake).
- The Intel® Distribution for Python introduces GPU support for Python/Numba code on Linux and the Python Data Parallel Processing Library (PyDPPL), a lightweight Python wrapper for DPC++ and SYCL that provides a data parallel interface and abstractions to efficiently tap into device management features of Intel® CPUs and GPUs.
- Intel® OSPRay and Intel® Open Volume Kernel Library both add support for particle volumes, while Intel OSPRay also adds support for Stereo 3D mode for panoramic camera and scalability of light sources.
- Performance improvements in Intel Open Volume Kernel Library and Intel® Open Image Denoise improved rendering speeds by up to 4x and improved image quality.
- Photon mapping support added to Intel® Embree.
- Intel Open Volume Kernel Library also adds support for configurable filter/reconstruction methods, stream-wide sampling and gradient API, Iterator allocation API, and strided data arrays.
- Intel Open Image Denoise improved image quality by adding additional Feature Buffers. It also includes new XTraining Code features and improvements.
- macOS CPU support introduced for the Intel® oneAPI Rendering Toolkit as well as C++ and Fortran compilers, most of the libraries in the Intel® oneAPI Base Toolkit, and analysis tool results viewers.
- New support for Singularity containers.
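The memory-level Roofline analysis noted in the highlights above rests on a simple bound: attainable performance is the lesser of the compute peak and the memory bandwidth multiplied by arithmetic intensity (FLOPs per byte). The sketch below applies that bound once per memory level to show how a limiting level could be identified. It is not Intel Advisor output or API; the function names and every hardware number are hypothetical.

```python
# Illustrative roofline arithmetic. This is NOT Intel Advisor's API, and
# all peak, bandwidth, and intensity figures below are hypothetical.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def limiting_level(peak_gflops, bandwidths_gbs, intensities):
    """Return (level, bound) for the memory level with the lowest bound.

    bandwidths_gbs: level name -> bandwidth in GB/s
    intensities:    level name -> FLOPs per byte moved at that level
    """
    bounds = {
        level: attainable_gflops(peak_gflops, bandwidths_gbs[level], intensities[level])
        for level in bandwidths_gbs
    }
    worst = min(bounds, key=bounds.get)
    return worst, bounds[worst]


peak = 1000.0  # hypothetical compute peak in GFLOP/s
bandwidths = {"L1": 4000.0, "L2": 2000.0, "L3": 800.0, "DRAM": 100.0}
intensities = {"L1": 0.5, "L2": 1.0, "L3": 2.0, "DRAM": 4.0}
print(limiting_level(peak, bandwidths, intensities))
```

With these made-up figures the cache levels all reach the compute peak, while DRAM caps the kernel well below it, so DRAM traffic is where optimization effort would pay off first.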
Meet Huddl.ai, the Future of Remote Collaboration
Slack meets Zoom meets Google Drive … in this new video collaboration platform, backed by prominent investors including former San Francisco 49ers player Ronnie Lott, and powered by Intel hardware and software technologies. Poised to transform remote collaboration, this solution automatically manages all meeting content via a real-time collaborative notes application, an automatic speech recognition function that turns speech into searchable text, and a recommendation engine that suggests meeting agendas based on participants.
“The future of remote collaboration is more than just audio-video. It’s about solutions that leverage deep learning and advanced media capabilities to help people be more productive at solving problems regardless of their location. Huddl.ai achieves high scalability and an innovative AI-based meeting experience through deployment of its cloud data platform on Intel® Xeon® processor-based servers, and integration of Intel’s Open WebRTC Toolkit for video conferencing and Intel® Distribution of OpenVINO™ toolkit, powered by oneAPI, for face and text detection.” — Jeff McVeigh, Vice President, Datacenter XPU Products and Solutions, Intel
“Huddl.ai is, simply put, the virtual meeting platform of the 21st century. While using Huddl.ai at Nutanix, we’ve experienced better meetings overall, but more specifically we’ve seen greater follow-up on things discussed in meetings and more efficient use of our scheduled meeting times.” — Wendy M. Pfeiffer, CIO, Nutanix
Collaboration between Bentley and Intel Leads the Future for Digital Craftsmanship
As the new Bentley Bentayga joins the portfolio of luxury cars on Bentley’s online car configurator, this configurator establishes a bar for the future of digital craftsmanship. It uses real-time, highly accurate visualization made possible through the advanced ray tracing capabilities of Intel® OSPRay, a component of the Intel® oneAPI Rendering Toolkit, to render over 1.7 million images, delivering seemingly limitless options to customers for the full Bentley model range.
The configurator now renders these images faster than ever before, with a 33% improvement in finding errors despite a 600% increase of content, powered by Intel® architecture, integration of artificial intelligence (AI) into OSPRay, and data and feedback from Bentley. The Bentley configurator aims to inspire customers in their auto purchase experiences.
Celebrating Women Innovators: Two Trailblazers Who Are Advancing Technology
Technology is designed for diverse individuals with unique needs, so it only makes sense that it is best built by diverse communities, whether that diversity is in gender, race, culture, religion, sexual orientation, thought, or more. So many women have advanced science, technology, and other fields of innovation, yet when Forbes released its 2020 list of 100 Most Innovative People in Business, there was only one woman on that list. Here, we celebrate the stories of women who are using the DevCloud for oneAPI to further innovation. One project is funded by the Spanish government and is published by Springer, and the other is in its infancy, and these are only a start. Both hold the promise and spirit of innovation that captivates and uplifts us to reach for more.
Intel Contributes Advanced oneAPI DPC++ Capabilities to the SYCL 2020 Provisional Spec
Today, The Khronos Group, an open consortium of industry-leading companies creating graphics and compute interoperability standards, announced its SYCL 2020 Provisional Specification, for which Intel has made significant contributions through new programming abstractions. These new capabilities accelerate heterogeneous parallel programming for HPC, machine learning, and compute-intensive applications.
“The SYCL 2020 Provisional Specification marks a significant milestone helping improve time-to-performance in programming heterogeneous computing systems through more productive and familiar C++ programming constructs,” said Jeff McVeigh, Intel vice president of Datacenter XPU Products and Solutions. “Through active collaboration with The Khronos Group, the new specification includes significant features pioneered in oneAPI’s Data Parallel C++, such as unified shared memory, group algorithms and sub-groups that were up-streamed to SYCL 2020. Moving forward, Intel’s oneAPI toolkits, which include the SYCL-based Intel® oneAPI DPC++ Compiler, will deliver productivity and performance for open, cross-architecture programming.”
See these references for more details.
- The Khronos Group: SYCL 2020 Provisional Specification press release
- Intel PR Partner article: Intel Contributes Advanced oneAPI DPC++ Capabilities to the SYCL 2020 Provisional Spec
- InsideHPC: New, Open DPC++ Extensions Complement SYCL and C++, which notes how Intel is advancing industry standards
- oneAPI Code Together Podcast: Collaborating to Build a Heterogeneous Future – An interview with Ronan Keryell of Xilinx and Jeff Hammond at Intel that explains the value of open languages and programming models, diving into ISO C++, what excites them most about the SYCL 2020 Provisional Specification, and more [20 min]
TensorFlow on oneAPI Industry Innovation
With the growth of AI, machine learning, and data-centric applications, the industry needs a programming model that allows developers to take advantage of rapid innovation in processor architectures. TensorFlow supports the oneAPI industry initiative and its standards-based open specification. oneAPI complements TensorFlow’s modular design and provides increased choice of hardware vendor and processor architecture, and faster support of next-generation accelerators. TensorFlow uses oneAPI today on Xeon processors and we look forward to using oneAPI to run on future Intel architectures.
SeRC & Intel form first oneAPI Academic Center-of-Excellence
The Swedish e-Science Research Center (SeRC) announced that it has extended its support of the oneAPI initiative as Intel’s first oneAPI academic center of excellence (COE). Hosted at Stockholm University and the KTH Royal Institute of Technology, the center will use oneAPI’s unified and heterogeneous programming model to accelerate research conducted with GROMACS, a widely-used free and open-source application designed for molecular dynamics simulation.
Release Beta07 of Intel® oneAPI Products Now Live
Highlights include significant new capabilities in the Intel® AI Analytics Toolkit (Model Zoo, GPU support for DBSCAN and SVM algorithms, CPU optimizations for scikit-learn algorithms); improved DPC++ compiler performance, language definition and Intel DPC++ Compatibility Tool; enhancements to Intel® VTune™ Profiler and Intel® MPI Library; and new Intel® System Debugger capabilities.
- New Model Zoo in the Intel® AI Analytics Toolkit, including pretrained models and sample scripts for many popular open source deep learning topologies optimized for Intel architectures.
- Incorporates GPU support for DBSCAN and SVM algorithms in Intel® AI Analytics Toolkit, along with many CPU optimizations for scikit-learn algorithms. Includes scikit-ipp 1.0.0, a drop-in replacement for scikit-image package to accelerate image processing functions, as well as XGBoost 1.1 release with the latest histogram tree grow method optimized for Intel CPUs for faster training.
- Improved DPC++ compiler performance for CPU platforms.
- Simplified, modernized DPC++ language definition through use of newer standard C++ language features.
- Intel® VTune™ Profiler now supports the latest Intel GPUs—Gen9 and Gen11 integrated graphics, and pre-released Gen12 integrated and discrete GPUs—and incorporates an improved GPU Memory Hierarchy diagram annotated with GPU hardware performance metrics.
- Intel® MPI Library introduces initial GPU pinning support for Intel Xe architecture devices and expanded support for Mellanox ConnectX.
- Improved migration of CUDA math, texture, and parallelism library calls in the Intel DPC++ Compatibility Tool.
- Intel® System Debugger now provides a new auto-detection mechanism in the target connection assistant that helps quickly establish a system debug connection to a target platform. Enhanced system TraceCLI configuration support also allows developers to easily set up this interface in both interactive and scripting modes, and a system debug sample enables developers to easily explore and learn to use the system debug capabilities.
Try oneAPI today. Beta07 is available now in the oneAPI DevCloud and via web download, containers, and repositories.
Software Innovators Contribute to the COVID-19 Response
The coronavirus (COVID-19) global pandemic has united our communities, even as we adhere to shelter-in-place stipulations. Never have science and technology been more important in helping us navigate these extraordinarily challenging times to emerge stronger. Here, we highlight a few of the Intel software innovators who are pursuing worthwhile medical research projects to contribute to this critical effort. Their projects span objectives—from speeding detection and uncovering new treatments to slowing the spread of the virus—and we are honored to support them in their endeavors. We hope all of our colleagues and communities around the world are keeping healthy and staying safe.
New Study Finds oneAPI Programming Model Saves Time and Money
A new research report from J.Gold Associates, “oneAPI: Software Abstraction for a Heterogeneous Computing World”, details the enterprise and developer benefits of transitioning to oneAPI.
Key Takeaway: Moving to a cross-architecture model for application development can save an organization significant time and money—over 5 months and $300,000 each time a performance-sensitive application is moved to a new computing platform. Read the report now.
Release beta06 of Intel® oneAPI products is now live.
Highlights include support for the Intel® Stratix® 10 FPGA family, extensive new data science capabilities (Intel® Scalable Dataframe Compiler for high-performance Pandas), major deep-learning framework improvements (bfloat16 datatype support in TensorFlow and the addition of torchvision to PyTorch for higher performance), and new rendering capabilities (support of VDB volumes, new geometries, new light sources, and the option to use pre-trained models and retrain filter models for denoising).
- Select Intel® oneAPI tools now support the Intel Stratix 10 FPGA family via the Intel® FPGA Programmable Acceleration Card D5005. (Note, this is in addition to current support of the Intel® Arria® 10 family.) Supported tools are:
- Intel® oneAPI DPC++ Compiler
- Intel® oneAPI DPC++ Library
- Intel® Advisor
- Intel® VTune™ Profiler
- New DPC++ CPU and GPU function support in Intel® oneAPI Math Kernel Library (oneMKL) for BLAS, LAPACK, RNG, and FFT functions.
- New DPC++ code samples added and others improved, including a new Mandelbrot visualization sample.
- Many improvements to the Intel® DPC++ Compatibility Tool, including improved CUDA code migration coverage for memory management and USM-enabled cuRAND API and DPC++ output code conciseness.
- New data science capabilities including:
- Intel Scalable Dataframe Compiler in the Intel® Distribution for Python for high-performance Pandas on CPUs
- uint8 support in XGBoost for reduced memory footprint
- optimized implementations of random forest, AdaBoost, and gradient boosting classifiers in Scikit-learn for high-performance ensemble learning
- Deep learning framework improvements, including bfloat16 datatype support in TensorFlow and the addition of torchvision to PyTorch.
- New rendering capabilities of Intel® oneAPI Rendering Toolkit, including:
- Intel® Open Volume Kernel Library now supports VDB volumes and volume observers
- Intel® OSPRay now enables easy rendering of clipping geometries, plane geometries, and new light sources for creating natural sun light and photometric indoor lighting
- Intel® Embree now includes round, linear curves featuring a new curve primitive for rendering hair quickly
- Intel® Open Image Denoise adds the option to use pre-trained models and retrain filter models with user-defined datasets to improve image quality for specific renderers and content
- Intel® System Debugger now supports Python 3 to run modern debug scripts. It also provides a new intuitive system debug interface for Intel® Processor Trace. The Intel® Debug Extensions for WinDbg now support Windows Core OS and efficient ACPI Machine Language debug.
Try oneAPI today. Beta06 is available now in the oneAPI DevCloud and via web download, containers, and repositories.
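For readers unfamiliar with the bfloat16 datatype named in the beta06 highlights: bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of an IEEE float32, so it matches the upper 16 bits of the float32 pattern. The sketch below shows that relationship using only the Python standard library. It is an illustration, not TensorFlow or PyTorch code; the function names are hypothetical, and it truncates rather than rounds, which production conversions typically do not.

```python
import struct

# Illustrative sketch only: bfloat16 as the upper 16 bits of a float32.
# Function names are hypothetical. Truncation is used for simplicity;
# real converters usually round to nearest even.

def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x by dropping the low 16 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bfloat16_bits_to_float32(b):
    """Expand a 16-bit bfloat16 pattern back to a float by zero-filling."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


for value in (1.0, -2.0, 3.14159):
    pattern = float32_to_bfloat16_bits(value)
    print(value, hex(pattern), bfloat16_bits_to_float32(pattern))
```

Because the exponent width is unchanged, bfloat16 covers the same numeric range as float32 and gives up only precision, which is why it is attractive for deep-learning workloads.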
New Podcast Series Explores the Cross-Architecture Journey
The emergence of machine learning, artificial intelligence, computer vision, and other compute-intensive workloads—and subsequent race to simplify cross-architecture development—has driven an exciting evolution in technologies across our software ecosystem.
A new podcast series, Code Together, will explore the challenges and possibilities of cross-architecture development through bi-weekly discussions with those at the forefront who are charting a course through this increasingly diverse, data-centric world. The series explores various aspects of this software ecosystem, from languages, compilers and libraries to other software development tools.
Read the blog to find out more.
Codeplay Brings NVIDIA GPU Support to Industry-Standard Math Library
April 20, 2020 | oneMKL
Codeplay has made another significant contribution to enabling an open standard, cross-architecture interface for developers as part of the oneAPI industry initiative. The latest contribution implements the commonly used cuBLAS library for NVIDIA GPUs, using the open standard SYCL and DPC++ implementation. This implementation forms part of the oneAPI Math Kernel Library (oneMKL) and is optimized to bring native performance to developers who use NVIDIA GPUs. Codeplay Developer Relations Manager Rod Burns shares the details.
Intel Open Sources the oneAPI Math Kernel Library (oneMKL) Interface
To address the lack of an industry-standard interface for math libraries and provide a single, cross-architecture API for CPUs and accelerators, Intel released the oneAPI Math Kernel Library (oneMKL) open source interface. The oneMKL specification lets developers efficiently code portable, math-intensive applications that run across multiple vendors’ architectures. The oneMKL APIs can be combined with math libraries that target a range of CPU hardware and other hardware accelerator architectures, providing a path to support for NVIDIA and AMD libraries in addition to Intel CPUs, GPUs and other accelerators. Get the details.
oneAPI v0.7 Specification Released
The oneAPI specification v0.7 has been released, which includes several enhancements to DPC++ including 10 new language extensions and updates to many of its libraries. Read the details from Sanjiv Shah, Developer Software Engineering Manager.
Developers Innovate with oneAPI
Several innovative developer projects are using the oneAPI cross-architecture programming framework along with Intel® oneAPI Toolkits (Beta), ranging from scalable molecular dynamics and predicting corn, wheat, and soybean yields to denoising graphics and high-fidelity rendering. See them on Intel® DevMesh.
Release 05 of Intel® oneAPI beta products is now live. Here are the highlights:
- Extensive compiler improvements for mixed-language development, including DPC++/OpenMP* composability, additional OpenMP 5.0 and Fortran constructs, and increased runtime performance
- New support for Microsoft Visual Studio Code* (VS Code) includes code samples browser and profiling tool plugins to speed code development
- New and enhanced functions for Intel® oneAPI libraries, including matrix multiplication, machine learning, and codecs for CPU and GPU platforms
- Additional GPU performance metrics and easier workflow for FPGA performance analysis in Intel® VTune™ Profiler
- New and enhanced code samples, including 2D finite difference wave equation solution, Mandelbrot, and matrix multiplication
- New CentOS* container distribution
We encourage you to try oneAPI today. Beta05 is available now in the oneAPI DevCloud and via web download, containers, and repositories.
Intel® DevCloud Now Supporting JupyterLab*
Intel® DevCloud for oneAPI now supports the web-based JupyterLab* development environment to deliver a “modern” experience for Python*, AI, and other developers. They can use this platform to write code and perform data analysis, data visualization, and interactive exploratory computing. Read more.
oneAPI Specification: Intel Compute Runtime Adds oneAPI Level Zero Support
March 9, 2020 | oneAPI Specification
“Meet Level Zero API and NEO driver running on Intel Gen9, Gen11, and Gen12 hardware first on Linux*, but it will not stop there” — Gregory Stoner, GPU Computing Solutions, Intel Corporation
The open source implementation of Level Zero, the low-level API specification for oneAPI, was just released, making it easier for accelerator vendors to leverage oneAPI for their devices. Looking forward to community feedback here.
The Intel® oneAPI 2021.1 Beta04 release is now available. Updates include:
- Enhancements to improve developer productivity for Intel CPU-GPU systems, including improved DPC++ language conciseness for easier code comprehension and maintainability, broader support for Unified Shared Memory programming, additional GPU function support in oneAPI libraries, and improved analysis capabilities in Intel® Advisor and Intel® VTune™ Profiler.
- Improved performance, functionality, and stability across the Intel® oneAPI toolkits.
- Intel® oneAPI Rendering Toolkit introduces Intel® Open Volume Kernel Library for greatly enhanced volume sampling and rendering features.
We encourage you to try oneAPI. To get started, go to https://software.intel.com/en-us/oneapi. Beta04 is available in the oneAPI DevCloud and via web download, containers, and repositories.
Codeplay Brings SYCL*, Data Parallel C++ to Nvidia GPUs
“I promised at SC19 that we would open source SYCL for Nvidia GPUs using Intel’s DPC++ SYCL compiler … and here it is. It’s a work-in-progress but being actively developed.” — Andrew Richards, CEO, Codeplay Software
Codeplay Software has announced the first release of its DPC++ compiler for Nvidia GPUs. This announcement marks a major milestone on the road to a single, cross-architecture, cross-vendor accelerator programming model. Read more.