oneTBB Flow Graph and the OpenVINO™ Inference Engine
Expressing Dependencies across Deep Learning Models in C++
OpenVINO™ and the Need for a Coordination Framework
With the increasing availability of data in today’s world, traditional approaches to solving problems are being replaced by machine learning (ML) methods that learn from the data. The two main stages in an end-to-end ML pipeline are training and inferencing. Training is where data is used to create a model. Inferencing is where the model generates output from new data.
The Intel® Distribution of OpenVINO™ toolkit is a developer tool suite for high-performance deep learning inference on Intel® architectures. While we are starting to see the emergence of dedicated inference accelerators, the ubiquity of multicore CPUs means that better inference performance can deliver significant gains to a larger number of users than ever before. OpenVINO offers a multithreading model that is portable and free of low-level details. It’s not necessary for users to explicitly start and stop any threads, or even know how many processors or cores are being used. This results in optimized performance that is easy to deploy.
As ML becomes more complex, inferencing applications require multiple models, with some models depending on the output of other models. Such applications require coordination that enforces dependencies among the models during execution, while allowing models that can make progress independently to execute concurrently.
The OpenVINO Inference Engine itself does not provide a way to piece different models together. Its primary role is to provide mechanisms to tune and deploy high-performing models onto Intel architectures. The Intel Distribution of OpenVINO toolkit, however, does include DL Streamer, an extension of the widely used, open source GStreamer framework. GStreamer is a framework for creating complex media analytics pipelines that enforces dependencies between models. DL Streamer extends GStreamer to provide pipeline interoperability and optimized inferencing across Intel architectures. Both DL Streamer and GStreamer are excellent choices for building complex media pipelines, but not all developers want or need these larger frameworks.
In this article, we describe how the Intel® oneAPI Threading Building Blocks (oneTBB) library, which is included in the Intel Distribution of OpenVINO toolkit, can coordinate OpenVINO inferencing models as a lightweight C++ alternative to GStreamer or DL Streamer.
oneTBB Flow Graph
oneTBB is a generic C++ library for parallel programming on CPUs. It has a long history, being an evolution of the Threading Building Blocks (TBB) library that has been available since 2006. It provides generic parallel algorithms, a flow graph interface, concurrent containers, a task-based work scheduler, a scalable memory manager, and auxiliary features that make parallel programming easier. The flow graph feature provides functions and classes (in the tbb::flow namespace) for applications that can be expressed as graphs of computations. A flow graph is used when you want to express the execution dependencies in your code, or if you have a streaming application that requires more than just a simple linear pipeline.
The flow graph interface consists of three main types of components:
- A graph object that represents a whole graph of computations
- Nodes that execute user-supplied lambda expressions, join streams of data together, or split and broadcast data
- Edges that express the dependencies or communication channels between nodes
Here is a summary of some of the nodes that are useful in building a graph of inference engine models:
- source_node (or input_node): A source_node provides functionality to generate data that is fed into the rest of the graph. This node can be used for reading and making available frames from a video stream. Note that in oneTBB, source_node has been replaced with input_node, which has a different API. However, the latest version of OpenVINO at the time of writing is shipped with source_node.
- function_node: A function_node body executes user-provided code on each data item that flows into the node to generate the data that flows out of the node. This node type can be used to run the inference computations. Function nodes that do not directly or indirectly depend on each other (as expressed by the graph edges) are allowed to execute concurrently. If the edges in the graph express a dependence between nodes, the oneTBB runtime library will enforce the dependence and ensure that the nodes execute in the correct order. A function_node can be configured to allow or disallow concurrent execution of its user-provided body on different data items as they flow through the node, making it suitable for operating on different data items in parallel, as well as for executing operations that must be serialized, such as displaying processed video frames in the correct order.
- sequencer_node: In cases where the output is required in a certain order, the sequencer_node can be used to maintain that order. For example, it can be used to sequence video frames that have been computed out of order.
- join_node: The join node brings multiple streams of data together. If independently computed outputs need to be brought together as a unit, a join_node can be used.
To demonstrate how oneTBB is used to coordinate OpenVINO inferencing, we adapted a security barrier demo that is included in the Intel Distribution of OpenVINO toolkit. It implements inferencing on three trained models: a vehicle detector that detects vehicles in a video frame, a vehicle classifier that classifies the color and type of the detected vehicle, and a license plate recognition model that extracts the license plate text. The vehicle classifier and license plate recognition models depend on the vehicle detector model.
To set up the graph, we use a source_node to read the video frames, followed by a function_node to perform the vehicle detection. Two other function nodes are used to run the vehicle classifier and license plate recognition. A join_node is used to aggregate the results from each frame, followed by a sequencer_node to reorder the inferenced video frames. The final node is another function_node that shows or saves the results as desired. The graph is shown in Figure 1.
Reading the Input Video Frames
We use a source_node to generate images from the video for inferencing. It repeatedly executes its body and generates frames until a terminating condition is encountered. In this case, the terminating condition is an empty frame at the end of the video file. The template parameter of the source_node is the data type of its output. The node returns a std::tuple consisting of a frame and its frame number. The frame number is included because it is required downstream for aggregation and to reestablish the frame order. If neither were required, the frame number could be omitted.
We begin by including the header file with the flow graph nodes (tbb/flow_graph.h).
We declare several data types to hold the messages that flow through the graph. In the type names, f denotes that the message includes the frame itself, n that it includes the frame number, v that it includes the bounding rectangles for the vehicles, p that it includes the bounding rectangles for the license plates, and s that it includes a std::vector<std::string>.
We also initialize a frame counter and a capture object to read the frames.
We then create a flow graph object and the source_node that reads the frames.
The security barrier demo comes with three classes for detection, classification, and license plate recognition. The classes provide functionality to create inference requests, to submit inference computations, and to collect results. We instantiate the following classes: detector for detection, vclassifier for classification, and preader for license plate recognition.
A function_node is used for vehicle detection inference computations. There are two template parameters for the function node: the input data type that the function node receives and the data type that the function node sends out. The input of the vehicle detector is the pair of image and corresponding number received from the source_node. In the body of the node, we create an inference request, prepare the blob for inferencing, start the inference computation, and then wait for the results of the computation to be ready. The result from the detector is a confidence level, a label showing whether it is a vehicle or a license plate, and the coordinates of the detected object (x, y, width, and height). We populate a vector with a list of the detected objects on each frame and send it downstream for further processing. The output type, which we named fnvp_t, contains the frame, the frame number, and a list of locations of the detected vehicles.
Detected Vehicle Classification
The vehicle classifier receives the frame, the frame number, and a list of detected vehicle locations. For each vehicle, we create an inference request, prepare the blob, start the inferencing to classify the vehicle, and then collect the results when they are ready. The classification results are the color and type of the vehicle. These results are passed downstream for further processing.
The data type for the output of the classifier is a list of strings.
Detected Vehicle License Plate Recognition
Like the classifier, the license plate recognition node receives the frame, its number, and a list of detected vehicle locations. Likewise, an inference request is created and submitted. The result is a string containing the license plate number.
If only the raw inferencing results are required, the nodes described to this point should be enough. However, if postprocessing is required, the nodes that follow can be added.
Here we aggregate the computations for each frame. Using the frame number as the key, a tag-matching join_node is used to put together all the results for each frame. Each frame will therefore have its number, the classification result of any detected vehicles, and a string of any detected license plates.
The computation is done in parallel, so the inferencing results for the frames can complete out of order. With a sequencer_node, we can restore the original frame order if video playback is required.
Superimposing the Results
Lastly, we implement a function_node that superimposes the locations of the detected vehicles and license plate numbers onto the frame. The results can be played back or saved to an output video. In cases where the computations complete much faster than the playback rate, a throttling mechanism can be used to prevent a build-up of processed frames.
Building the Graph and Expressing Dependencies
The graph is built and dependencies are expressed with the oneTBB function tbb::flow::make_edge, which takes a predecessor node and a successor node. It can be used to build any desired graph topology. The only requirement is that the data types on each edge match. That is, the output type of the predecessor node must match the input type of the successor node.
We build the graph by making an edge from the frame generation to the vehicle detection.
The vehicle detector is connected in parallel to the classifier and license plate recognition.
The detector, classifier, and license plate recognition are connected to the aggregator. The detector provides the frame, while the classifier and the license plate recognition provide the inference results for that frame for aggregation.
The aggregator is connected to the sequencer for reordering of the frames.
The sequencer is connected to the output node to output the results.
Activating the Graph
The final step is to activate the graph so that it can run. This is done with the activate() member of the source node. A call to g.wait_for_all() ensures that all computations in the graph are completed before the graph exits.
To evaluate our implementation, we used pretrained models from the Open Model Zoo repository: vehicle-license-plate-detection-barrier-0106_fp16, vehicle-attributes-recognition-barrier-0039_fp16, and license-plate-recognition-barrier-0001. We ran our application on an Intel® Core™ i7-6770HQ processor with integrated Iris Pro Graphics 580 and measured the elapsed time for running all inferences on 32,030 frames from a video. We evaluated five different configurations that assigned models differently to the CPU and integrated GPU. The performance gain of the oneTBB flow graph over a corresponding manually threaded implementation is shown in Figure 2. Performance gains of up to 11% were obtained.
Where to Get oneTBB Flow Graph
The Intel Distribution of OpenVINO toolkit already ships with oneTBB, so using the flow graph does not add a new dependency. One simply needs to update the build infrastructure (CMakeLists.txt) to enable its use.
We have shown how to use a oneTBB flow graph to express dependencies across models used for inferencing. The flow graph makes it easy to express such dependencies, as well as parallelism across the models, directly in C++. For developers who don’t want to use larger frameworks like GStreamer or DL Streamer, we believe that a oneTBB flow graph is a good option to consider. Our implementation of the OpenVINO security barrier demo showed performance gains of up to 11% over a manually threaded implementation when using a oneTBB flow graph. The fact that the flow graph is a C++ framework, is shipped with the Intel Distribution of OpenVINO toolkit, and has support for threading makes it a natural choice for building graphs of ML models for inferencing in OpenVINO.