Import From ONNX
ONNX versions change quickly. The ONNX parser in TensorRT 5.1.x supports ONNX IR (intermediate representation) version 0.0.3 and opset version 9. For ONNX version incompatibility issues, see the ONNX Model Opset Version Converter.
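If a model was exported with a newer opset, it can usually be down-converted with the ONNX version converter before handing it to TensorRT. A minimal sketch, assuming the onnx Python package is installed and the file names are placeholders:

import onnx
from onnx import version_converter

original_model = onnx.load("model.onnx")  # model exported with an unsupported opset
converted_model = version_converter.convert_version(original_model, 9)  # target opset 9 for TensorRT 5.1.x
onnx.save(converted_model, "model_opset9.onnx")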
Create the builder, network, and parser
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open(model_path, 'rb') as model:
        parser.parse(model.read())
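parser.parse() returns False when parsing fails, and the errors can then be inspected on the parser object. A minimal sketch of an error-checking variant of the parse call above, using the num_errors and get_error members of the ONNX parser:

with open(model_path, 'rb') as model:
    if not parser.parse(model.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))  # report each error found while parsing the ONNX file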
Building the engine
The builder object has many attributes that can be used to control things such as quantization precision and batch size (a sketch of precision control follows the list below).
The builder has two important attributes:
1. maximum batch size: the largest batch size that TensorRT will optimize for; at run time, the batch size actually used must be less than or equal to this value.
2. maximum workspace size: scratch space for layer algorithms. This value limits the maximum amount of temporary memory that any layer in the network can use; if it is set too small, TensorRT may not be able to find an implementation for a given layer.
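Besides these two attributes, precision can also be controlled through the builder. A minimal sketch of enabling FP16 mode, assuming the target GPU supports fast FP16 (platform_has_fast_fp16 and fp16_mode are builder members in the TensorRT 5.x Python API):

if builder.platform_has_fast_fp16:
    builder.fp16_mode = True  # allow TensorRT to choose FP16 kernels where they are faster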
Build the engine:
builder.max_batch_size = max_batch_size
builder.max_workspace_size = 1 << 20  # This determines the amount of memory available to the builder when building an optimized engine and should generally be set as high as possible.
with builder.build_cuda_engine(network) as engine:
    pass  # Do inference here.
While the engine is being built, TensorRT makes a copy of the weight data.
Serializing a model
Serializing means converting the engine into a format that can be stored and used later for inference.
At inference time, all you need to do is deserialize the stored engine.
The reason for doing this is that building an engine is fairly time-consuming; if an already-built engine can be saved and reloaded later, the preparation time for inference is greatly reduced.
Note: a saved engine cannot be used across platforms.
Serialize the model to a modelstream:
serialized_engine = engine.serialize()
Deserialize modelstream to perform inference. Deserializing requires creation of a runtime object:
with trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)
If you want to save the engine to a file instead:
Serialize the engine and write to a file:
with open("sample.engine", "wb") as f:
    f.write(engine.serialize())
Read the engine from the file and deserialize:
with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
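Putting the two snippets together, a common pattern is to deserialize a cached engine when the plan file already exists and only build (and save) it otherwise, since building is the slow step. A minimal sketch, where engine_path and build_engine() are hypothetical names:

import os

def get_engine(engine_path, runtime):
    # Reuse the cached engine if a plan file exists; otherwise build it and cache it for next time.
    if os.path.exists(engine_path):
        with open(engine_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())
    engine = build_engine()  # hypothetical helper wrapping the builder/parser steps shown earlier
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine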
Performing Inference
Allocate some host and device buffers for the inputs and outputs:
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context for the calls below

# Determine dimensions and create page-locked memory buffers (i.e. won't be swapped to disk) to hold host inputs/outputs.
h_input = cuda.pagelocked_empty(engine.get_binding_shape(0).volume(), dtype=np.float32)
h_output = cuda.pagelocked_empty(engine.get_binding_shape(1).volume(), dtype=np.float32)
# Allocate device memory for inputs and outputs.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
# Create a stream in which to copy inputs/outputs and run inference.
stream = cuda.Stream()
Some additional space is needed to hold the intermediate activation values. The engine contains the network definition and the trained weights, but this extra space is still required. These are held in an execution context:
with engine.create_execution_context() as context:
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Synchronize the stream.
    stream.synchronize()
    # Return the host output (this snippet is assumed to run inside a function).
    return h_output
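The code above assumes the page-locked input buffer h_input has already been filled with the preprocessed input. A minimal sketch of that step, where input_image is a hypothetical numpy.float32 array whose flattened size matches input binding 0:

# input_image is a hypothetical preprocessed array; its flattened size must equal h_input.size
np.copyto(h_input, input_image.ravel())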