Improve Performance on Distributed Deep Learning Training Workloads

The exponential growth in the use of large deep neural networks (DNNs) has accelerated the need to train these networks in hours, or even minutes.

This kind of speed cannot be achieved on a single machine—a single node cannot satisfy the compute, memory, and I/O requirements of today’s state-of-the-art DNNs.

Reaching that speed takes scalable, efficient distributed training, which deep learning (DL) frameworks facilitate.

Join Intel Software Engineer and deep learning expert Mikhail Smorkalov for an overview of three Intel-optimized DL frameworks—Caffe*, Horovod* (for TensorFlow*), and nGraph—that boost communication performance on distributed workloads compared to existing approaches.

Additional Resources
Find out more about these optimized frameworks, including how to get them.

Mikhail Smorkalov, Software Engineer, Intel Corporation

Mikhail is a Senior Software Engineer specializing in deep learning (DL) technologies. His responsibilities include defining DL architecture, developing and deploying new features for the Intel® Machine Learning & Scaling Library (Intel® MLSL) and, last but not least, scaling DL workloads to some of the fastest supercomputers in the world.

Before joining Intel in 2014, Mikhail spent years developing software and middleware for the telecom industry. He holds a Master of Science in Computational Mathematics and Cybernetics from State University of Nizhni Novgorod.

Performance varies by use, configuration, and other factors. Learn more at