Accelerating Linear Models for Machine Learning
Linear Regression Has Never Been Faster
If you’ve ever used Python and scikit-learn to build machine learning (ML) models from large data sets, you may have also wished that you could make these computations go faster. What if I told you that altering a single line of code could accelerate your ML computations? What if I also told you that getting faster results doesn’t require specialized hardware?
In this article, I’ll teach you how to train ridge regression models using a version of scikit-learn that’s optimized for Intel CPUs, then compare their performance and accuracy against models trained with the vanilla scikit-learn library. This article continues our series on accelerated ML algorithms:
- Fast Gradient Boosting Tree Inference for Intel® Xeon® Processors
- K-means Acceleration with 2nd Generation Intel® Xeon® Scalable Processors
A Practical Example of Linear Regression
Linear regression, a special case of ridge regression, has a lot of real-world applications. For my comparisons, I’m going to use the well-known House Sales in King County, USA data set from Kaggle. This data set is used to predict house prices based on one year of King County sales data.
This data set has 21,613 rows and 21 columns. Each row represents a house that was sold in King County between May 2014 and May 2015. The first column contains a unique identifier for the sale, the second column contains the date the house was sold, and the third column contains the sale price, which is also the target variable. Columns 4 to 21 contain various numerical characteristics of the house, such as the number of bedrooms, square footage, the year when the house was built, ZIP code, etc. I am going to build a ridge regression model that predicts the price of the house based on the data in columns 4 to 21.
In theory, the coefficients of a linear regression model are chosen to minimize the residual sum of squares (RSS). In practice, the model with the lowest RSS is not always the best. Linear regression can produce inaccurate models if the input data suffers from multicollinearity, that is, when some features are nearly linear combinations of others. Ridge regression can give more reliable estimates in this case.
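The effect of multicollinearity can be illustrated with a small NumPy sketch. It uses the textbook closed-form solution w = (XᵀX + αI)⁻¹Xᵀy, where α = 0 corresponds to ordinary least squares. The helper name ridge_closed_form, the synthetic data, and the value α = 1 are assumptions made for illustration; this is not the solver that scikit-learn or daal4py uses.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y; alpha = 0 gives ordinary least squares."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

# Synthetic data (illustrative only): the second feature is almost a copy of the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-4 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

w_ols = ridge_closed_form(X, y, alpha=0.0)    # nearly singular system
w_ridge = ridge_closed_form(X, y, alpha=1.0)  # penalty stabilizes the solution

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
```

With two nearly identical features, least squares can only pin down the sum of the two coefficients well; how that sum is split between them is driven mostly by noise. The ridge penalty pulls the two coefficients toward each other, which is why the estimates are more stable.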
Solving a Regression Problem with scikit-learn
Let’s see how to build a model with sklearn.linear_model.Ridge. The program below trains a ridge regression model on 80% of the rows from the House Sales dataset, then uses the other 20% to test the model’s accuracy.
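A minimal sketch of such a program is shown here. It assumes synthetic stand-in data generated with make_regression, shaped like the 18 feature columns of the House Sales data set, so the score it prints is not the article's result. The helper name train_and_score, the noise level, and alpha=1.0 are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def train_and_score(X, y, alpha=1.0, seed=0):
    """Fit Ridge on 80% of the rows and return the model and R2 on the other 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))

# Stand-in data (assumption): 21,613 rows and 18 numeric features.
X, y = make_regression(n_samples=21613, n_features=18, noise=10.0, random_state=0)

model, r2 = train_and_score(X, y)
print(f"R2 on the held-out 20%: {r2:.2f}")
```

With the real data, X would hold columns 4 to 21 of the Kaggle file and y the sale price column.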
The resulting R² equals 0.69, meaning that our model explains 69% of the variance in the data. See the Appendix for details on the quality of the trained models.
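For reference, the two quality metrics used in this article, the coefficient of determination and the root mean squared deviation (RMSD), can be computed directly from predictions. The sketch below uses the standard definitions; the function names and the four sample values are made up for illustration.

```python
import numpy as np

def rmsd(y_true, y_pred):
    """Root mean squared deviation between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - rss / tss)

# Toy values: RSS = 2 and TSS = 20, so R2 = 1 - 2/20 = 0.9.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(r_squared(y_true, y_pred), rmsd(y_true, y_pred))
```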
Even though ridge regression is quite fast in terms of training and prediction time, you usually need to perform multiple training experiments to tune the hyperparameters. You might also want to experiment with feature selection (i.e., evaluate the model on various subsets of features) to get better prediction accuracy. One round of training can take several minutes on large data sets, and that time adds up quickly when a task requires many rounds, so the performance of linear model training is critical.
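To make the cost of tuning concrete, here is a hedged sketch of a simple sweep over the regularization strength. The alpha grid, the synthetic data, and the single validation split are assumptions chosen for brevity; every candidate value costs one full training run.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split into training and validation parts.
X, y = make_regression(n_samples=2000, n_features=18, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical grid
scores = {}
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)  # one full fit per candidate
    scores[alpha] = model.score(X_val, y_val)         # R2 on the validation rows

best_alpha = max(scores, key=scores.get)
print(f"best alpha: {best_alpha}, validation R2: {scores[best_alpha]:.3f}")
```

Combining a grid like this with several feature subsets multiplies the number of fits, which is where faster training pays off.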
The Intel-optimized scikit-learn is available through the Intel® oneAPI AI Analytics Toolkit, which provides optimized Python libraries and frameworks to accelerate end-to-end data science and machine learning pipelines on Intel® architectures. The Intel® Distribution for Python component of the toolkit includes scikit-learn and has the same set of algorithms and APIs as the Continuum or stock scikit-learn, so no code changes are required to get a performance boost for ML algorithms.
The toolkit also provides daal4py, a Python interface to the Intel® oneAPI Data Analytics Library, through which it’s possible to improve the performance of scikit-learn even further. daal4py provides configurable ML kernels, some of which support streaming input data and can easily be scaled out to clusters of workstations.
The Intel® oneAPI AI Analytics Toolkit is distributed through many common channels, including Intel’s website, YUM, APT, and Conda. Select and download the distribution package that’s best for you and follow the provided installation instructions.
How to Configure scikit-learn with daal4py
Intel-optimized scikit-learn will be ready for use once you finish the Toolkit installation and run the post-installation script.
Dynamic patching of scikit-learn is required to use daal4py as the underlying solver. You can enable patching without modifying your application by loading the daal4py module before your application:
python -m daal4py my_app.py
Alternatively, you can patch sklearn within your application:
import daal4py.sklearn
daal4py.sklearn.patch_sklearn()
To undo the patch, run:
daal4py.sklearn.unpatch_sklearn()
Applying the patch impacts the following scikit-learn algorithms:
- sklearn.linear_model.Ridge (solver='auto')
- sklearn.linear_model.LogisticRegression and sklearn.linear_model.LogisticRegressionCV (solver in ['lbfgs', 'newton-cg'])
- sklearn.decomposition.PCA (svd_solver='full'; the patch also introduces svd_solver='daal')
- sklearn.cluster.KMeans (algorithm='full')
- sklearn.metrics.pairwise_distances with metric='cosine' or metric='correlation'
This list may grow in future releases of the Intel oneAPI AI Analytics Toolkit.
To compare the performance of vanilla scikit-learn and Intel-optimized scikit-learn, we used the King County data set plus six artificially generated data sets with varying numbers of samples and features. The latter were generated using the scikit-learn make_regression function.
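Generating such data sets can be sketched as follows. The shapes, noise level, and random seed below are hypothetical placeholders, since the six sizes used in the benchmark are not listed here.

```python
from sklearn.datasets import make_regression

# Hypothetical (n_samples, n_features) pairs, not the sizes used in the benchmark.
shapes = [(10_000, 20), (50_000, 50)]

datasets = {}
for n_samples, n_features in shapes:
    X, y = make_regression(
        n_samples=n_samples,
        n_features=n_features,
        noise=1.0,        # assumed noise level
        random_state=42,  # fixed seed so runs are repeatable
    )
    datasets[(n_samples, n_features)] = (X, y)

for shape, (X, y) in datasets.items():
    print(shape, X.shape, y.shape)
```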
Figure 1 shows the wall-clock time spent on training a ridge regression model with two different configurations:
- Scikit-learn version 0.22 installed from the default set of conda channels.
- Scikit-learn version 0.21.3 from Intel Distribution for Python optimized with daal4py.
To enable Intel optimizations in scikit-learn, we used the -m daal4py command-line option. For performance measurements, we used the Amazon Web Services Elastic Compute Cloud (AWS EC2). We chose the instance that gives the best performance:
- CPU: c5.metal (2nd Generation Intel® Xeon® Scalable processors, two sockets, 24 cores per socket)
Amazon states that “C5 instances offer the lowest price per vCPU in the Amazon EC2 family and are ideal for running advanced compute-intensive workloads.” The c5.metal instance has the most CPU cores and the latest CPUs among all C5 instances.
See the Configuration section below for hardware details. See the Appendix for details on the quality of trained models.
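A wall-clock measurement of this kind can be sketched as below. The helper name time_ridge_fit, the number of repeats, the use of the minimum over repeats, and the data size are assumptions for illustration and may differ from the benchmark's methodology.

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

def time_ridge_fit(X, y, repeats=3):
    """Return the best wall-clock time in seconds over several Ridge fits."""
    timings = []
    for _ in range(repeats):
        model = Ridge(solver="auto")  # 'auto' is the solver affected by the patch
        start = time.perf_counter()
        model.fit(X, y)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Illustrative data size only.
X, y = make_regression(n_samples=20_000, n_features=50, random_state=0)
elapsed = time_ridge_fit(X, y)
print(f"Ridge training time: {elapsed:.4f} s")
```

Running the same script once with the plain python command and once with python -m daal4py gives the two numbers to compare.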
Figure 1 shows that:
- Ridge regression training is up to 5.49x faster with the Intel-optimized scikit-learn than with vanilla scikit-learn.
- The performance improvement from the Intel-optimized scikit-learn increases with the size of the data set.
What Makes scikit-learn in Intel oneAPI AI Analytics Toolkit Faster?
For big data sets, ridge regression spends most of its compute time on matrix multiplication. Intel optimizations of ridge regression rely on the Intel® Math Kernel Library (Intel® MKL), which is highly optimized for Intel CPUs. Intel MKL uses Single Instruction Multiple Data (SIMD) vector instructions from the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) available on 2nd Generation Intel Xeon Scalable processors. Compute-intensive kernels like matrix multiplication benefit significantly from the data parallelism that these instructions provide. Another level of parallelism for matrix multiplication is achieved by splitting matrices into blocks and processing them in parallel using Threading Building Blocks (TBB).
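The idea of block-level parallelism can be illustrated with a small NumPy sketch. It is not how Intel MKL or TBB are implemented. It assumes a normal-equations style solver, in which the dominant product is the Gram matrix XᵀX, and it uses Python threads as a stand-in for a task scheduler. The function name, block size, and worker count are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def blocked_gram(X, block_rows=1024, workers=4):
    """Compute X^T X by summing per-block products X_b^T X_b computed in parallel."""
    # Split the rows of X into contiguous blocks (views, no copies).
    blocks = [X[i:i + block_rows] for i in range(0, X.shape[0], block_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda b: b.T @ b, blocks))
    # The Gram matrix is the sum of the per-block contributions.
    return sum(partials)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))
gram = blocked_gram(X)
print(np.allclose(gram, X.T @ X))
```

Each block contributes an independent partial result, so the blocks can be processed in any order and on any core before the final reduction.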
The Intel-optimized version of scikit-learn gives significantly better performance for ridge regression with no loss of model accuracy and little to no code modification. The performance advantages are not limited to just this algorithm. As mentioned previously, the list of optimized ML algorithms continues to grow.
Configuration
c5.metal AWS EC2 instance: Intel Xeon 8275CL processor, two sockets with 24 cores per socket, 192 GB RAM. OS: Ubuntu 18.04.3 LTS.
Vanilla scikit-learn: Python 3.8.0, scikit-learn 0.22, pandas 0.25.3.
Intel oneAPI AI Analytics Toolkit scikit-learn: Python 3.7.4, scikit-learn 0.21.3 optimized with daal4py 2020.0 build py37ha68da19_8, pandas 0.25.1.
Python and accompanying libraries are the default versions installed by the conda package manager when configuring the respective environments.
Appendix
Table 1. Root mean squared deviation (RMSD) and the coefficient of determination (R²) for ridge regression models