Programming of heterogeneous platforms requires deep understanding of system architecture on all levels, which help applications design to leveraging the best data and work decomposition between CPU and an accelerating hardware like GPUs. However, in many cases the applications are being converted form a conventional CPU programming language like C++, or from accelerator friendly but still low level languages like OpenCL, and the main problem is to determine which part of the application is leveraging from being offloaded to GPU. Another problem is to estimate, how much performance increase one might gain due to the accelerating in the particular GP GPU device. Each platform has its unique limitations that are affecting performance of offloaded computing tasks, e.g. data transfer tax, task initialization overhead, memory latency and bandwidth limitations. In order to take into account those constraints, software developers need tooling for collecting right information and producing recommendations to make the best design and optimization decisions.
In this presentation we introduce two new GPU performance analysis types in Intel® VTune™ Profiler, and a methodology of heterogeneous applications performance profiling supported by the analyses. VTune Profiler is a well-known tool for performance characterization on CPUs, now it includes GPU Offload Analysis and GPU Hotspots Analysis of applications written on most offloading models with OpenCL, SYCL/Data Parallel C++, and OpenMP Offload.