Notes on basic TensorRT concepts: Logger, Context, Engine, Builder, Network, Parser


First, let's look at the API overview from the official documentation:

TensorRT provides a C++ implementation on all supported platforms, and a Python implementation on Linux. Python is not currently supported on Windows or QNX.

The key interfaces in the TensorRT core library are:
Network Definition
The Network Definition interface provides methods for the application to define a network. Input and output tensors can be specified, and layers can be added and configured. As well as layer types, such as convolutional and recurrent layers, a Plugin layer type allows the application to implement functionality not natively supported by TensorRT.

Optimization Profile
An optimization profile specifies constraints on dynamic dimensions.

Builder Configuration
The Builder Configuration interface specifies details for creating an engine. It allows the application to specify optimization profiles, maximum workspace size, the minimum acceptable level of precision, timing iteration counts for autotuning, and an interface for quantizing networks to run in 8-bit precision.

Builder
The Builder interface allows the creation of an optimized engine from a network definition and a builder configuration.

Engine
The Engine interface allows the application to execute inference. It supports synchronous and asynchronous execution, profiling, and enumeration and querying of the bindings for the engine inputs and outputs. A single engine can have multiple execution contexts, allowing a single set of trained parameters to be used for the simultaneous execution of multiple inferences.

ONNX Parser
This parser can be used to parse an ONNX model.

C++ API vs Python API
In theory, the C++ API and the Python API should be close to identical in supporting your needs. The C++ API should be used in any performance-critical scenarios, as well as in situations where safety is important, for example, in automotive.
The main benefit of the Python API is that data preprocessing and postprocessing are easy to use because you’re able to use a variety of libraries like NumPy and SciPy.
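Taken together, these interfaces chain into one build-then-run flow. The sketch below ties them together under the TensorRT 7-era API used throughout this post; the file name "model.onnx", the tensor name "input", and the profile dimensions are illustrative assumptions, not values from the documentation:

IBuilder* builder = createInferBuilder(gLogger);                                   // Builder
INetworkDefinition* network = builder->createNetworkV2(
    1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));  // Network Definition
auto parser = nvonnxparser::createParser(*network, gLogger);                       // ONNX Parser
parser->parseFromFile("model.onnx", static_cast<int>(ILogger::Severity::kWARNING));

IBuilderConfig* config = builder->createBuilderConfig();                           // Builder Configuration
config->setMaxWorkspaceSize(1 << 28);                                              // 256 MiB of build-time scratch space

IOptimizationProfile* profile = builder->createOptimizationProfile();              // Optimization Profile
profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 224, 224});  // constraints on the dynamic batch dimension
profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{8, 3, 224, 224});
profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{16, 3, 224, 224});
config->addOptimizationProfile(profile);

ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);           // Engine
IExecutionContext* context = engine->createExecutionContext();                     // ready for inference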


https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#api

————————————————————————————————————————————————————————————————

What is a Logger?

As the name suggests, this is the logging component, used to manage log messages from the builder, engine, and runtime.

According to the comment on the Logger class in the TensorRT logging.h header:

This class provides a common interface for TensorRT tools and samples to log information to the console, and supports logging two types of messages:
- Debugging messages with an associated severity (info, warning, error, or internal error/fatal)
- Test pass/fail messages
The advantage of having all samples use this class for logging as opposed to emitting directly to stdout/stderr is that the logic for controlling the verbosity and formatting of sample output is centralized in one location.
In the future, this class could be extended to support dumping test results to a file in some standard format (for example, JUnit XML), and providing additional metadata (e.g. timing the duration of a test run).


Generally speaking, the logger is passed as a required argument to the factory functions that instantiate the builder, runtime, and parser:

IBuilder* builder = createInferBuilder(gLogger);
IRuntime* runtime = createInferRuntime(gLogger);
auto parser = nvonnxparser::createParser(*network, gLogger);

The logger is treated internally as a singleton, so multiple instances of IRuntime and/or IBuilder must all use the same logger.
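For reference, a minimal ILogger subclass might look like the sketch below. This is in the spirit of the samples' logging.h rather than its actual contents; note that newer TensorRT releases declare log() with noexcept and AsciiChar const*:

#include "NvInfer.h"
#include <iostream>

// Print warnings and errors to the console; drop info/verbose messages.
class SimpleLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)  // severities are ordered: internal error < error < warning < info
            std::cout << msg << std::endl;
    }
} gLogger;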

References:

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_logger.html#details

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#initialize_library

logging.h

————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————

Context (Execution Context)

The IExecutionContext class is defined in the NvInferRuntime.h header:

Context for executing inference using an engine, with functionally unsafe features.

Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously. If the engine supports dynamic shapes, each execution context in concurrent use must use a separate optimization profile.

Warning
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.


Typical usage:

In order to run inference, use the interface IExecutionContext. In order to create an object of type IExecutionContext, first create an object of type ICudaEngine (the engine).

The builder or runtime will be created with the GPU context associated with the creating thread. Even though it is possible to avoid creating the CUDA context (the default context will be created for you), it is not advisable. It is recommended to create and configure the CUDA context before creating a runtime or builder object.
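In practice this usually amounts to selecting the device with the CUDA runtime before creating any TensorRT object (a minimal sketch; device 0 is an illustrative choice):

cudaSetDevice(0);                                 // establish/select the CUDA context for this thread
IBuilder* builder = createInferBuilder(gLogger);  // the builder now attaches to that context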


Common usage:

IExecutionContext* context = engine->createExecutionContext();
const ICudaEngine& boundEngine = context->getEngine();  // a context can also hand back the engine it belongs to
context->enqueue(batchSize, buffers, stream, nullptr);  // implicit batch networks; use enqueueV2(buffers, stream, nullptr) for explicit batch
cudaStreamSynchronize(stream);                          // wait for the asynchronous work before reusing buffers
context->destroy();                                     // release the context when done
//TensorRT execution is typically asynchronous, so enqueue the kernels on a CUDA stream.
//It is common to enqueue asynchronous memcpy() before and after the kernels to move data from the GPU if it is not already there.
//The final argument to enqueueV2() is an optional CUDA event which will be signaled when the input buffers have been consumed and their memory may be safely reused.
//For more information, refer to enqueue() for implicit batch networks and enqueueV2() for explicit batch networks.
//In the event that asynchronous execution is not wanted, see execute() and executeV2().
//The IExecutionContext contains shared resources, therefore, calling enqueue() or enqueueV2() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior.
//To perform inference concurrently in multiple CUDA streams, use one IExecutionContext per CUDA stream.

Using an engine for inference requires an execution context:

IExecutionContext *context = engine->createExecutionContext();

Create some space to store intermediate activation values. Since the engine holds the network definition and trained parameters, additional space is necessary. These are held in an execution context.

An engine can have multiple execution contexts, allowing one set of weights to be used for multiple overlapping inference tasks. For example, you can process images in parallel CUDA streams using one engine and one context per stream. Each context will be created on the same GPU as the engine.
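A sketch of that pattern is below; it assumes engine has already been built, and kNumStreams plus the per-stream buffers arrays are illustrative names, not part of the quoted documentation:

constexpr int kNumStreams = 2;
cudaStream_t streams[kNumStreams];
IExecutionContext* contexts[kNumStreams];
for (int i = 0; i < kNumStreams; ++i)
{
    cudaStreamCreate(&streams[i]);
    contexts[i] = engine->createExecutionContext();  // one context per stream
}
// Each context holds its own per-inference state, so these enqueues may overlap.
for (int i = 0; i < kNumStreams; ++i)
    contexts[i]->enqueueV2(buffers[i], streams[i], nullptr);  // buffers[i]: preallocated device bindings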

 


References:

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#perform_inference_c

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#initialize_library

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#details

——————————————————————————————————————————————————————————————————

——————————————————————————————————————————————————————————————————

Engine 

The ICudaEngine class is defined in the NvInferRuntime.h header:

An engine for executing inference on a built network, with functionally unsafe features.

Warning
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

 


Creating an engine with the builder:

IBuilderConfig* config = builder->createBuilderConfig();
config->setMaxWorkspaceSize(1 << 20);  // upper bound on scratch GPU memory the builder may use (1 MiB here)
ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

Of course, the complete network must already have been built before this step (see the Network section below).
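Once built, an engine is usually serialized to disk and later reloaded through a runtime rather than rebuilt; a hedged sketch of that round trip (file I/O elided):

IHostMemory* serialized = engine->serialize();        // engine -> in-memory blob
// ... write serialized->data() / serialized->size() to a file ...
IRuntime* runtime = createInferRuntime(gLogger);
ICudaEngine* restored = runtime->deserializeCudaEngine(
    serialized->data(), serialized->size(), nullptr);  // blob -> engine
serialized->destroy();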

References:

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_cuda_engine.html#details

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#build_engine_c

——————————————————————————————————————————————————————————————————

——————————————————————————————————————————————————————————————————

Network

The INetworkDefinition class is defined in the NvInfer.h header:

A network definition for input to the builder.

A network definition defines the structure of the network, and combined with an IBuilderConfig, is built into an engine using an IBuilder. An INetworkDefinition can either have an implicit batch dimension, specified at runtime, or have all dimensions explicit (full-dims mode) in the network definition. When a network has been created using createNetwork(), only implicit batch size mode is supported. The function hasImplicitBatchDimension() is used to query the mode of the network.

A network with implicit batch dimensions returns the dimensions of a layer without the implicit dimension, and instead the batch is specified at execute/enqueue time. If the network has all dimensions specified, then the first dimension follows elementwise broadcast rules: if it is 1 for some inputs and is some value N for all other inputs, then the first dimension of each output is N, and the inputs with 1 for the first dimension are broadcast. Having divergent batch sizes across inputs to a layer is not supported.
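A quick illustration of the two creation modes and the query (a minimal sketch):

INetworkDefinition* net = builder->createNetworkV2(0U);  // 0U (no flags): implicit batch mode
bool implicitMode = net->hasImplicitBatchDimension();    // true here; false with kEXPLICIT_BATCH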

 


Network construction example (C++ API; in each pair of snippets below, the first line comes from a real-world project and the second from the official documentation sample):

Use method IBuilder::createNetworkV2 to create an object of type INetworkDefinition (the builder must be created first):

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetworkV2(0U);  // implicit batch mode
INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));  // explicit batch mode

Add the Input layer to the network, with the input dimensions, including dynamic batch.

ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{1, INPUT_H, INPUT_W});
auto data = network->addInput(INPUT_BLOB_NAME, dt, Dims4{-1, 1, INPUT_H, INPUT_W});  // Dims4: -1 marks the dynamic batch dimension

Add the Convolution layer

IConvolutionLayer* conv1 = network->addConvolutionNd(*data, 6, DimsHW{5, 5}, weightMap["conv1.weight"], weightMap["conv1.bias"]);
conv1->setStrideNd(DimsHW{1, 1});
auto conv1 = network->addConvolution(*data->getOutput(0), 20, DimsHW{5, 5}, weightMap["conv1filter"], weightMap["conv1bias"]);
conv1->setStride(DimsHW{1, 1});

Note: Weights passed to TensorRT layers are in host memory.
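The Weights struct is just a typed host pointer plus an element count; a small sketch (the kernel buffer and its contents are illustrative):

// The host buffer must stay valid until engine building has finished.
static float kernel[6 * 1 * 5 * 5] = { /* trained coefficients */ };
nvinfer1::Weights conv1Weights{nvinfer1::DataType::kFLOAT, kernel, 6 * 1 * 5 * 5};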

Add the Pooling layer

IPoolingLayer* pool1 = network->addPoolingNd(*relu1->getOutput(0), PoolingType::kAVERAGE, DimsHW{2, 2});
pool1->setStrideNd(DimsHW{2, 2});
auto pool1 = network->addPooling(*conv1->getOutput(0), PoolingType::kMAX, DimsHW{2, 2});
pool1->setStride(DimsHW{2, 2});

Add the activation layer using the ReLU algorithm

IActivationLayer* relu1 = network->addActivation(*conv1->getOutput(0), ActivationType::kRELU);
auto relu1 = network->addActivation(*ip1->getOutput(0), ActivationType::kRELU);

Add the fully connected layer

IFullyConnectedLayer* fc1 = network->addFullyConnected(*pool2->getOutput(0), 120, weightMap["fc1.weight"], weightMap["fc1.bias"]);
auto ip1 = network->addFullyConnected(*pool1->getOutput(0), 500, weightMap["ip1filter"], weightMap["ip1bias"]);

Add the SoftMax layer to calculate the final probabilities and set it as the output:

ISoftMaxLayer* prob = network->addSoftMax(*fc3->getOutput(0));
prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*prob->getOutput(0));
auto prob = network->addSoftMax(*relu1->getOutput(0));
prob->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*prob->getOutput(0));

References:

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#create_network_c

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html#details

——————————————————————————————————————————————————————————————————

——————————————————————————————————————————————————————————————————

Parser

The parser is mainly used to parse an ONNX model and convert it into a TensorRT network. The IParser class is defined in the NvOnnxParser.h header:

an object for parsing ONNX models into a TensorRT network definition

Create an ONNX parser using the INetwork definition as the input:

auto parser = nvonnxparser::createParser(*network, gLogger);
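To actually populate the network, parse a model file and check the collected errors; a sketch assuming the parseFromFile entry point (the path "model.onnx" is illustrative):

if (!parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
{
    // The parser accumulates errors rather than throwing; report each one.
    for (int i = 0; i < parser->getNbErrors(); ++i)
        std::cout << parser->getError(i)->desc() << std::endl;
}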

References:

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#initialize_library

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvonnxparser_1_1_i_parser.html#details

—————————————————————————————————————————————————————————————————

—————————————————————————————————————————————————————————————————

 

