Deliver Blazing-fast Python Data Science and AI Performance on CPUs—with Minimal Code Changes

IDC analysts predict that by 2024 the global machine learning (ML) market will grow to USD 30.6 billion, a compound annual growth rate (CAGR) of 43 percent.1 In that same year, they also predict, 143 zettabytes of data will be created, captured, copied, and consumed throughout the world.1

The hardware environments to manage this data across the end-to-end AI workflow are getting more diverse, with unique accelerators coming to market to manage specific use cases. A recent Evans Data Corporation report notes that 40 percent of developers target heterogeneous systems that use more than one type of processor, processor core, or coprocessor.2

For AI, ML, and deep learning (DL) developers, it’s an intriguing business opportunity that challenges them to build high-performance applications with flexible deployment options. One hardware architecture can no longer do it all.

To capture this opportunity, achieving peak performance with Python across a variety of architectures is critical. Python is a powerful, scalable, and easy-to-use language. But it’s not built from the ground up for blazing-fast performance. Many developers rely on vendor-specific optimizations for frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn to achieve the speeds they need on their hardware of choice—but are left wondering how they can get similar results when new hardware emerges or the market evolves to require deployment on additional hardware types.

That’s where the Intel® oneAPI AI Analytics Toolkit (AI Kit) comes in.  

It’s a domain-specific toolkit, built on Intel oneAPI cross-architecture performance libraries, that accelerates end-to-end data science and AI workflows in the Python ecosystem. The toolkit maximizes performance from preprocessing through ML/DL training and inference on Intel hardware architectures—both those commonly used today and those that will become industry standards in the future. The goal is to minimize development costs by providing drop-in acceleration for familiar Python frameworks. In doing so, we empower data scientists and developers to create confidently without being constrained to proprietary programming environments or having to adopt new software APIs every time they need to support a new hardware platform.

Achieve drop-in acceleration with the Intel oneAPI AI Analytics Toolkit

With the AI Kit, you can continue to use your favorite framework or Python library as you transition to Intel architectures. You’ll get drop-in acceleration with minimal code changes—or even no changes at all.
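As a concrete illustration of that drop-in pattern, here is a minimal sketch, assuming the scikit-learn-intelex package that ships with the AI Kit is installed; the synthetic data and choice of KMeans are placeholders for a real workload:

```python
from sklearnex import patch_sklearn
patch_sklearn()  # swap in the accelerated implementations globally

# Everything below is unmodified scikit-learn code.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((10_000, 16))
model = KMeans(n_clusters=8, n_init=10).fit(X)
print(model.inertia_)
```

Calling `patch_sklearn()` before the scikit-learn imports is the only change; removing those two lines restores stock behavior.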

To demonstrate how this toolkit can help you shatter benchmarks and accelerate workflows on CPUs with little to no added effort, we’ve compiled five benchmarks that span many of today’s most popular libraries.


Let’s look at examples from across the entire AI pipeline: data preprocessing, model training, and model inference.


#1: End-to-end performance on census data set.

The industry-standard census data set benchmark shows outstanding results when performed with the AI toolkit on CPU architectures. To create these results, we trained a model using 50 years of census data from IPUMS.org to predict income based on education.

Our toolkit enables the model to run much faster than with stock libraries. In the benchmark below, there is considerable acceleration across the data science pipeline: ETL is boosted by 38x, and ridge-regression fit and predict by 21x.

To make these results possible, the Intel® Distribution of Modin was employed for data ingestion and ETL, alongside the Intel® Extension for Scikit-learn for model training and prediction. Additional performance boosts included 6x faster CSV reads and a 4x faster train-test split.


#2: Machine learning performance with Intel® Extension for Scikit-learn.

Intel® Extension for Scikit-learn helps you achieve highly efficient data layout, blocking, multithreading, and vectorization. In this benchmark, the accelerated algorithm delivered up to a 220x performance boost over the stock scikit-learn version.

The Intel® Extension for Scikit-learn algorithms also outperform the same algorithms run on the AMD EPYC 7742 processor. Intel® Advanced Vector Extensions 512 (Intel® AVX-512), unavailable on AMD processors, provide much of the performance improvement. The Intel® Extension for Scikit-learn also consistently outperformed the NVIDIA V100 GPU.

You can learn more about the performance boosts delivered by the Intel® Extension for Scikit-learn library here.


#3: Exceeding NVIDIA GPU ML performance with Intel-optimized XGBoost.

XGBoost is an open source library that provides a gradient-boosting framework for Python and other programming languages, and Intel actively contributes optimizations to the project. Using the Intel-optimized XGBoost library for Python offered in our toolkit, you gain up to 4.5x faster inference on Intel® Xeon® processors compared to NVIDIA V100 GPUs. This chart shows how Intel CPUs provide superior performance for the complex gradient-boosting algorithms commonly used for classification and regression—and how Intel can help remove the cost and effort of offloading such workloads to specialized accelerators.


#4: INT8 quantized inference performance with minimal accuracy loss.

Recently, we’ve seen increasing use of low-precision quantization to boost performance for deep learning inference workloads. Unfortunately, that speed typically comes at the cost of reduced accuracy. With the newly introduced Intel® Low Precision Optimization Tool—provided as part of the AI Kit—we’re demonstrating inference throughput scaling of up to 2.8x with a negligible accuracy drop, made possible by the tool’s automatic, accuracy-driven tuning strategies.

In the benchmark below, the conversion from FP32 to INT8 was done using the Intel Low Precision Optimization Tool, and inference was run using the Intel Optimization for TensorFlow, both supporting Intel® DL Boost technology for faster performance.
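The core arithmetic behind that FP32-to-INT8 conversion can be sketched with nothing but the standard library. The values below are illustrative; the Low Precision Optimization Tool additionally calibrates ranges on real data and tunes for accuracy:

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of FP32 values to int8."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0                      # int8 range is [-127, 127]
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate FP32 for comparison."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.64, -0.33]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, round(max_err, 4))
```

Each weight is stored as one byte instead of four, and the worst-case rounding error is bounded by half a quantization step—which is why accuracy loss is small when the value range is calibrated well.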


#5: Deep learning training and inference performance using Intel® Optimization for PyTorch with 3rd Gen Intel® Xeon® Scalable processors.

The AI Kit also enhances PyTorch performance on CPU architectures with an Intel-optimized library.

Intel closely collaborates with Facebook to contribute optimizations to the PyTorch open source project. We also offer a PyTorch extension library to further boost performance for training and inference through automatic type and layout conversions.

Our optimizations leverage Intel DL Boost and BFloat16 technologies to improve training performance of the DLRM model by up to 1.55x. Meanwhile, Intel DL Boost and INT8 optimizations improve the DLRM model’s inference performance by up to 2.85x compared to FP32. This benchmark clearly demonstrates superior performance for modern recommender models as well as computer-vision workloads—enabled by our hardware advancements and software optimizations.

Get started with the Intel oneAPI AI Analytics Toolkit

To summarize, the Intel oneAPI AI Analytics Toolkit provides optimized Python libraries, deep learning frameworks, and a lightweight parallel dataframe—all built using oneAPI libraries—to maximize performance for end-to-end data science and AI workflows on Intel hardware. The toolkit allows data scientists and AI developers to get the latest Intel DL/ML optimizations from a single resource with seamless interoperability and out-of-the-box performance.

The toolkit is a free download that you can access with just a few clicks. Unlock better cross-architecture Python performance for your AI applications today.

Download the toolkit to access the Intel-optimized frameworks and libraries discussed in this article.


References

  1. IDC Worldwide Global DataSphere Forecast, 2020–2024: The COVID-19 Data Bump and the Future of Data Growth.
  2. Evans Data Global Development Survey 2020, Volume 2.
Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.