Goodbye, Slow Inference Workloads. Hello, Improved Quantization Techniques.

Deploying deep learning on the edge for real-time inference can significantly reduce the cost of communicating with the cloud in terms of network bandwidth, network latency, and power consumption.

But there’s a flip side: edge devices have limited memory, compute, and power. As a result, traditional 32-bit floating-point precision is often too computationally heavy for embedded deep learning inference workloads.

The Intel® Distribution of OpenVINO™ toolkit offers a solution via INT8 quantization—deep learning inference with 8-bit multipliers.

Join deep learning expert Alex Kozlov for a deeper dive into achieving better performance with less overhead on Intel® CPUs, GPUs, and VPUs using OpenVINO™ toolkit’s latest INT8 Calibration Tool and Runtime. He’ll cover:

  • New features such as asymmetric quantization, bias correction, and weight equalization that improve the quality of inference workloads at lower precision
  • How to make best use of OpenVINO’s enhanced capabilities for your AI applications
  • Using INT8 to accelerate computation, save memory bandwidth and power, and provide better cache locality
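
As a rough illustration of why asymmetric quantization can help at 8 bits, the sketch below compares a scale-only (symmetric) mapping with a scale-plus-zero-point (asymmetric) mapping on a handful of non-negative, ReLU-style activation values. It is a generic, textbook-style sketch in plain Python and not the OpenVINO™ toolkit's implementation; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using a scale only (0.0 maps to 0)."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1             # 0..255 for 8 bits
    # The represented range is widened to include 0.0 so that zero is exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0      # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def max_abs_error(original, restored):
    """Largest absolute difference between original and dequantized values."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical, non-negative activations (for example, the output of a ReLU).
activations = [0.0, 0.1, 0.5, 1.2, 2.7, 6.0]

# Symmetric: half of the signed integer range is wasted on negative values.
q_sym, s_sym = quantize_symmetric(activations)
sym_restored = [q * s_sym for q in q_sym]
sym_err = max_abs_error(activations, sym_restored)

# Asymmetric: the full 0..255 range covers the observed [min, max] interval.
q_asym, s_asym, zp = quantize_asymmetric(activations)
asym_restored = [(q - zp) * s_asym for q in q_asym]
asym_err = max_abs_error(activations, asym_restored)

print("symmetric  max error:", sym_err)
print("asymmetric max error:", asym_err)
```

For one-sided data like this, the asymmetric mapping spends all 256 levels on the range that actually occurs, so its quantization step is roughly half that of the symmetric mapping. For data centered on zero, the two approaches behave much more alike.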

Get the software
Download the latest version of the Intel® Distribution of OpenVINO™ toolkit so you can follow along during the webinar.

Alexander Kozlov, Deep Learning R&D Engineer, Intel Corporation

Alexander is a Machine Learning/Deep Learning (ML/DL) Engineer at Intel with expertise in DL object detection architectures, human action recognition approaches, and neural network compression techniques. Before Intel, he was a senior software engineer and researcher at Itseez (since acquired by Intel), where he worked on computer vision algorithms for ADAS systems. Alexander now focuses on deep neural network (DNN) compression methods and tools that produce more lightweight and hardware-friendly models. He holds a Master’s Degree from University of Nizhni Novgorod.

Performance varies by use, configuration, and other factors. Learn more at