Accelerate Your scikit-learn Applications

Faster Experimentation with Predictable Behavior

By Oleksandr Pavlyk (Intel Corporation) and Olivier Grisel (INRIA)

The Intel Distribution for Python (IDP), part of the Intel AI Analytics Toolkit, includes an optimized scikit-learn that accelerates a selection of common estimators (e.g., logistic regression, singular value decomposition, principal component analysis). These estimators are built on top of the Intel Data Analytics Acceleration Library (DAAL), so they achieve performance close to that of equivalent C++ programs. The DAAL-powered estimators are implemented in the daal4py package.

DAAL’s performance comes from efficient use of multiple CPU cores, cache-friendly blocking, and effective use of processor instruction sets. It is tuned to run best on Intel processors. Faster scikit-learn benefits users through shorter model development iteration cycles and lower training costs. Improved software engineering in the library also reduces its memory footprint, which allows users to tackle larger machine learning problems with their existing hardware.

Figures 1 and 2 show speedups of the accelerated scikit-learn over the base library, as measured with scikit-learn_bench. Figure 1 compares the multithreaded, accelerated scikit-learn against the better of two base scikit-learn runs, one with n_jobs=1 and one with n_jobs=-1. To improve performance with the n_jobs setting, scikit-learn users currently need to know the algorithm details, because the best choice depends on the estimator (e.g., using a non-default value of n_jobs for LogisticRegression is detrimental to performance). Scikit-learn developers are working to improve this user experience.

Figure 1. Multithreaded speedup of the accelerated scikit-learn over the base scikit-learn.
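The baseline in Figure 1 is the faster of an n_jobs=1 run and an n_jobs=-1 run. As an illustration of that methodology only (this is not the scikit-learn_bench code), the minimal sketch below times fit() for each n_jobs setting and keeps the best result. The helper names best_fit_seconds and speedup are hypothetical, and the estimator factory is assumed to accept an n_jobs argument.

```python
import time
from typing import Any, Callable, Iterable


def best_fit_seconds(
    make_estimator: Callable[[int], Any],
    X: Any,
    y: Any,
    n_jobs_values: Iterable[int] = (1, -1),
    repeats: int = 3,
) -> float:
    """Return the fastest fit time, in seconds, over several n_jobs settings.

    make_estimator(n_jobs) is assumed to return a fresh, unfitted estimator
    with a scikit-learn style fit(X, y) method.
    """
    best = float("inf")
    for n_jobs in n_jobs_values:
        for _ in range(repeats):
            estimator = make_estimator(n_jobs)  # new estimator for every run
            start = time.perf_counter()
            estimator.fit(X, y)
            elapsed = time.perf_counter() - start
            best = min(best, elapsed)  # keep the best (smallest) time seen
    return best


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Ratio of baseline to accelerated time; values above 1 mean faster."""
    return baseline_seconds / accelerated_seconds
```

For example, with scikit-learn installed, best_fit_seconds(lambda n_jobs: KNeighborsClassifier(n_jobs=n_jobs), X, y) would time the base estimator under both settings, and speedup() turns that and an accelerated timing into a Figure 1 style ratio. Taking the minimum over repeats is one common way to reduce noise from other processes, not the only reasonable choice.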

Running the accelerated scikit-learn sequentially shows that many algorithms in the base scikit-learn have room for performance improvement, notably training and inference of SVC as well as training of LinearRegression (Figure 2).

Figure 2. Sequential speedup of the accelerated scikit-learn over the base scikit-learn.

The pursuit of performance can sometimes sacrifice correctness if the developer isn’t careful. To ensure that the accelerated scikit-learn lives up to the high standards of the scikit-learn user community, the accelerated version is required to pass the scikit-learn test suite. A system to run these tests was developed in collaboration with scikit-learn core developers Olivier Grisel and Jérémie du Boisberranger.

Testing is done against both the current scikit-learn release and the current master sources. The status of these tests is displayed on the landing page of github.com/IntelPython/daal4py.

The tests run on CircleCI, so interested users have easy access to the testing logs for further inspection.

Special attention is paid to ensuring deterministic, mathematical equivalence between the accelerated scikit-learn and the base version. Mathematical equivalence means that solutions obtained by both versions agree within the solver’s specified tolerance. This cross-checking has also produced feedback that improved scikit-learn’s own test suite (e.g., scikit-learn#12738, #12263, and #13992).
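A tolerance-based agreement check of this kind can be sketched in a few lines. This is an illustrative assumption, not the test code used for daal4py: the hypothetical solutions_agree helper compares two solution vectors (for example, flattened coef_ arrays from the base and accelerated estimators) element by element, using the same rule as numpy.allclose, |candidate - reference| <= atol + rtol * |reference|, with numpy’s default tolerances as placeholders.

```python
from typing import Sequence


def solutions_agree(
    reference: Sequence[float],
    candidate: Sequence[float],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> bool:
    """Return True if two solutions match element-wise within rtol/atol.

    Uses the numpy.allclose rule, with `reference` playing the role of
    numpy's second argument: |c - r| <= atol + rtol * |r|.
    """
    if len(reference) != len(candidate):
        return False  # different shapes can never be equivalent
    return all(
        abs(c - r) <= atol + rtol * abs(r)
        for r, c in zip(reference, candidate)
    )
```

In a real test, rtol and atol would typically be chosen to match the solver’s own convergence setting (such as its tol parameter) rather than the placeholder defaults used here.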

This collaboration has also given scikit-learn developers a better understanding of the performance of their implementation. For instance, @jeremiedbb completely refactored the k-means implementation using Cython to improve multithreaded scalability (scikit-learn#11950). This work is now part of the 0.23 release of scikit-learn.

To accelerate your own scikit-learn installation, you need to install daal4py. This is easy to do with the conda package manager:

conda install -c intel daal4py

pip users can install daal4py as follows:

# Required Intel(R) DAAL is available for free:
#  - binary form: https://software.intel.com/en-us/daal
#  - sources: https://github.com/intel/daal
#  Intel(R) TBB can be obtained from https://software.intel.com/en-us/tbb
export NO_DIST=1
export DAALROOT=/path/to/daal_library
pip install -e git+https://github.com/IntelPython/daal4py.git#egg=daal4py
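Whichever installation route you use, it can help to confirm that Python can find daal4py and that the installed scikit-learn meets the 0.19 minimum that patching requires. The sketch below is a hypothetical helper (check_setup is not part of daal4py). It relies only on the standard library’s importlib.util.find_spec and importlib.metadata (Python 3.8 or later), and its version parser is deliberately simple, keeping only the leading numeric components.

```python
import importlib.util
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract leading numeric components, e.g. '0.23.2' -> (0, 23, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric parts such as 'dev0'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop after suffixed parts such as '0rc1'
    return tuple(parts)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution's version string, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_setup(min_sklearn: Tuple[int, ...] = (0, 19)) -> Dict[str, object]:
    """Report whether daal4py is importable and scikit-learn is new enough."""
    sklearn_version = installed_version("scikit-learn")
    return {
        # find_spec locates the package without importing it
        "daal4py_importable": importlib.util.find_spec("daal4py") is not None,
        "sklearn_version": sklearn_version,
        "sklearn_supported": (
            sklearn_version is not None
            and parse_version(sklearn_version) >= min_sklearn
        ),
    }
```

Running print(check_setup()) after installation gives a quick summary; if daal4py_importable is False, the installation did not land in the interpreter you are using.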

Once daal4py is installed, you can accelerate your scikit-learn installation (version >=0.19) in either of two ways:

python -m daal4py your_application.py

This is convenient for running tests and for quick experimentation. Alternatively, you can patch scikit-learn explicitly in your script:

import daal4py.sklearn
daal4py.sklearn.patch_sklearn()

Patching is accompanied by an informational message:

In [1]: import daal4py.sklearn

In [2]: daal4py.sklearn.patch_sklearn()
Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL)
solvers for sklearn enabled:
https://intelpython.github.io/daal4py/sklearn.html
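To keep one script usable on machines with and without daal4py, the explicit call can be wrapped in a guard. This is a minimal sketch under two assumptions: that falling back to stock scikit-learn is acceptable when daal4py is missing, and that, because patching swaps estimator classes inside scikit-learn’s modules, estimators should be imported only after patching (a name imported earlier could still refer to the stock class). The function name enable_daal4py_patching is hypothetical.

```python
def enable_daal4py_patching() -> bool:
    """Patch scikit-learn with daal4py if possible.

    Returns True if patch_sklearn() was called, and False if daal4py (or
    the scikit-learn it depends on) could not be imported. In the False
    case the script simply keeps using stock scikit-learn.
    """
    try:
        import daal4py.sklearn
    except ImportError:
        return False
    daal4py.sklearn.patch_sklearn()  # prints the informational message
    return True


# Hypothetical usage: patch first, then import estimators, so that names
# such as SVC are bound after the patched classes are in place.
# ACCELERATED = enable_daal4py_patching()
# from sklearn.svm import SVC
```

Recording the returned flag (ACCELERATED above) also makes it easy to log which implementation a given benchmark or experiment actually used.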

We invite you to try accelerating your scikit-learn workloads with daal4py and the Intel AI Analytics Toolkit to see the performance improvements for yourself.