TensorRT Model Conversion


 The deep learning frameworks currently supported are Caffe, TensorFlow, and PyTorch.

  1.  TensorFlow: the preferred delivery format is a pb file, i.e. a frozen GraphDef. Freezing turns all of the exported model's weights into constants, and the model parameters and network structure are stored in a single file that can be loaded from Python as well as Java.
  2. PyTorch: the preferred delivery format is ONNX. ONNX is an open file format designed for machine learning and used to store trained models; it lets different AI frameworks store and exchange model data in a common format. The ONNX specification and code are developed jointly by Microsoft, Amazon, Facebook, IBM, and other companies. https://pytorch.org/docs/stable/onnx.html

# Convert a PyTorch model to ONNX
import torch
torch_model = torch.load("save.pt")  # load the PyTorch model (saved as a full model, not just a state_dict)
batch_size = 1                       # batch size
input_shape = (3,244,244)            # input shape (C, H, W)
 
# set the model to inference mode
torch_model.eval()
 
x = torch.randn(batch_size,*input_shape)        # dummy input tensor
export_onnx_file = "test.onnx"                  # target ONNX file name
torch.onnx.export(torch_model,
                    x,
                    export_onnx_file,
                    opset_version=10,
                    do_constant_folding=True,   # whether to apply constant-folding optimization
                    input_names=["input"],      # input name
                    output_names=["output"],    # output name
                    dynamic_axes={"input":{0:"batch_size"},     # variable batch dimension
                                    "output":{0:"batch_size"}})
  1. Caffe: no conversion is needed; the caffemodel and deploy prototxt can be used with TensorRT directly.
  2.  TensorFlow

    • For a TensorFlow model there are two routes to a TensorRT engine: the first is pb → uff → engine; the other is pb → onnx → engine.

          First approach
    # TensorFlow pb to UFF

     

    First install convert-to-uff: apt install uff-converter-tf
    Run: python3 /usr/local/bin/convert-to-uff --help
    Output:
     
    Converts TensorFlow models to Unified Framework Format (UFF).
     
    positional arguments:
      input_file            path to input model (protobuf file of frozen GraphDef)
     
    optional arguments:
      -h, --help            show this help message and exit
      -l, --list-nodes      show list of nodes contained in input file
      -t, --text            write a text version of the output in addition to the
                            binary
      --write_preprocessed  write the preprocessed protobuf in addition to the
                            binary
      -q, --quiet           disable log messages
      -d, --debug           Enables debug mode to provide helpful debugging output
      -o OUTPUT, --output OUTPUT
                            name of output uff file
      -O OUTPUT_NODE, --output-node OUTPUT_NODE
                            name of output nodes of the model
      -I INPUT_NODE, --input-node INPUT_NODE
                            name of a node to replace with an input to the model.
                            Must be specified as:
                            "name,new_name,dtype,dim1,dim2,..."
      -p PREPROCESSOR, --preprocessor PREPROCESSOR
                            the preprocessing file to run before handling the
                            graph. This file must define a `preprocess` function
                            that accepts a GraphSurgeon DynamicGraph as it's
                            input. All transformations should happen in place on
                            the graph, as return values are discarded
    Conversion command:
    python3 /usr/local/bin/convert-to-uff model.pb -o model.uff -O softmax/Softmax -I input_1,input_1,float32,1,3,224,224
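
    The -O and -I arguments above need the exact node names from the graph (here softmax/Softmax and input_1). If they are not known, convert-to-uff -l model.pb lists the nodes (see the help output above); a small TensorFlow script such as the sketch below (an illustration, not part of the official toolchain) does the same and also shows each node's op type. The file name model.pb follows the example above.

    # List the node names of a frozen GraphDef to find input/output names for convert-to-uff
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with open("model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    for node in graph_def.node:
        print(node.op, node.name)   # e.g. "Placeholder input_1", "Softmax softmax/Softmax"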
    # TensorFlow UFF to engine

    Run: /usr/src/tensorrt/bin/trtexec --help
    Output:
    === Model Options ===
      --uff=<file>                UFF model
      --onnx=<file>               ONNX model
      --model=<file>              Caffe model (default = no model, random weights used)
      --deploy=<file>             Caffe prototxt file
      --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
      --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
      --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)
    === Build Options ===
      --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
      --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
      --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
      --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
      --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                                  Note: if any of min/max/opt is missing, the profile will be completed using the shapes provided and assuming that opt will be equal to max unless they are both specified; partially specified shapes are applied starting from the batch size; dynamic shapes imply explicit batch
                                  input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                             Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
      --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                                  IO Formats: spec ::= IOfmt[","spec]
                                              IOfmt ::= type:fmt
                                              type ::= "fp32"|"fp16"|"int32"|"int8"
                                              fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
      --workspace=N               Set workspace size in megabytes (default = 16)
      --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
      --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
      --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
      --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
      --calib=<file>              Read INT8 calibration cache file
      --safe                      Only test the functionality available in safety restricted flows
      --saveEngine=<file>         Save the serialized engine
      --loadEngine=<file>         Load a serialized engine
    === Inference Options ===
      --batch=N                   Set batch size for implicit batch engines (default = 1)
      --shapes=spec               Set input shapes for dynamic shapes inputs. Input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                             Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input values spec ::= Ival[","spec]
                                              Ival ::= name":"file
      --iterations=N              Run at least N inference iterations (default = 10)
      --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
      --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
      --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
      --streams=N                 Instantiate N engines to use concurrently (default = 1)
      --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
      --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
      --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
      --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
      --buildOnly                 Skip inference perf measurement (default = disabled)
    === Build and Inference Batch Options ===
                                  When using implicit batch, the max batch size of the engine, if not given, is set to the inference batch size;
                                  when using explicit batch, if shapes are specified only for inference, they will be used also as min/opt/max in the build profile;
                                  if shapes are specified only for the build, the opt shapes will be used also for inference;
                                  if both are specified, they must be compatible; and if explicit batch is enabled but neither is specified,
                                  the model must provide complete static dimensions, including batch size, for all inputs
    === Reporting Options ===
      --verbose                   Use verbose logging (default = false)
      --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
      --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf; (default = 99%)
      --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
      --dumpProfile               Print profile information per layer (default = disabled)
      --exportTimes=<file>        Write the timing results in a json file (default = disabled)
      --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
      --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)
    === System Options ===
      --device=N                  Select cuda device N (default = 0)
      --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
      --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
      --plugins                   Plugin library (.so) to load (can be specified multiple times)
    === Help ===
      --help                      Print this message

    Conversion command:
    /usr/src/tensorrt/bin/trtexec --uff=/home/model/model.uff --uffInput=input_1,1,3,224,224 --output=softmax/Softmax --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly --useCudaGraph
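
    Once model.engine has been written, a quick way to confirm that it deserializes correctly and that the bindings look right is a few lines of the TensorRT Python API. The snippet below is a minimal sketch assuming a TensorRT 7.x-style API (the binding accessors changed in later releases) and the engine path used above.

    # Minimal sanity check of the serialized engine (TensorRT 7.x-style Python API)
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    with open("/home/model/model.engine", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_bindings):
        print(engine.get_binding_name(i),                           # e.g. input_1, softmax/Softmax
              engine.get_binding_shape(i),
              "input" if engine.binding_is_input(i) else "output")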
          Second approach
  3. # TensorFlow pb to ONNX, then ONNX to engine

    Step 1: install tf2onnx: pip install -U tf2onnx
    Run python3 -m tf2onnx.convert --help to see how it is used.
    Output:
    usage: convert.py [-h] [--input INPUT] [--graphdef GRAPHDEF]
                      [--saved-model SAVED_MODEL] [--tag TAG]
                      [--signature_def SIGNATURE_DEF]
                      [--concrete_function CONCRETE_FUNCTION]
                      [--checkpoint CHECKPOINT] [--keras KERAS] [--large_model]
                      [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
                      [--opset OPSET] [--custom-ops CUSTOM_OPS]
                      [--extra_opset EXTRA_OPSET] [--target {rs4,rs5,rs6,caffe2}]
                      [--continue_on_error] [--verbose] [--debug]
                      [--output_frozen_graph OUTPUT_FROZEN_GRAPH] [--fold_const]
                      [--inputs-as-nchw INPUTS_AS_NCHW]

    Convert tensorflow graphs to ONNX.

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         input from graphdef
      --graphdef GRAPHDEF   input from graphdef
      --saved-model SAVED_MODEL
                            input from saved model
      --tag TAG             tag to use for saved_model
      --signature_def SIGNATURE_DEF
                            signature_def from saved_model to use
      --concrete_function CONCRETE_FUNCTION
                            For TF2.x saved_model, index of func signature in
                            __call__ (--signature_def is ignored)
      --checkpoint CHECKPOINT
                            input from checkpoint
      --keras KERAS         input from keras model
      --large_model         use the large model format (for models > 2GB)
      --output OUTPUT       output model file
      --inputs INPUTS       model input_names
      --outputs OUTPUTS     model output_names
      --opset OPSET         opset version to use for onnx domain
      --custom-ops CUSTOM_OPS
                            list of custom ops
      --extra_opset EXTRA_OPSET
                            extra opset with format like domain:version, e.g.
                            com.microsoft:1
      --target {rs4,rs5,rs6,caffe2}
                            target platform
      --continue_on_error   continue_on_error
      --verbose, -v         verbose output, option is additive
      --debug               debug mode
      --output_frozen_graph OUTPUT_FROZEN_GRAPH
                            output frozen tf graph to file
      --fold_const          Deprecated. Constant folding is always enabled.
      --inputs-as-nchw INPUTS_AS_NCHW
                            transpose inputs as from nhwc to nchw

    Usage Examples:
    python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx
    python -m tf2onnx.convert --input frozen_graph.pb --inputs X:0 --outputs output:0 --output model.onnx
    python -m tf2onnx.convert --checkpoint checkpoint.meta --inputs X:0 --outputs output:0 --output model.onnx

    For help and additional information see: https://github.com/onnx/tensorflow-onnx
    If you run into issues, open an issue here: https://github.com/onnx/tensorflow-onnx/issues

    Convert to ONNX: first run conda activate tensorflow, then:
    python3 -m tf2onnx.convert --input model.pb --inputs input_1:0 --outputs softmax/Softmax:0 --inputs-as-nchw input_1:0 --output model.onnx --opset 13

    ONNX to engine (this is the command for dynamic input shapes; exit the tensorflow conda environment at this point):
    /usr/src/tensorrt/bin/trtexec --onnx=/home/model/model.onnx --explicitBatch --minShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --optShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --maxShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --shapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph

    If the input shape is fixed, drop --explicitBatch and the --minShapes/--optShapes/--maxShapes/--shapes options above, and add --batch batch_size instead, where batch_size is the TensorFlow model's batch size.
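
    As an alternative to trtexec, the same ONNX-to-engine build can be done from Python with the TensorRT builder. The sketch below assumes a TensorRT 7.x-style API and mirrors the command above (explicit batch, an optimization profile pinning input_1:0 to 1x3x224x224, and the same file paths); it is an illustration rather than a drop-in replacement.

    # Build an engine from model.onnx with an explicit-batch network and an optimization profile
    # (sketch of the TensorRT 7.x Python API; paths and shapes follow the trtexec command above)
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("/home/model/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28                   # 256 MiB workspace

    profile = builder.create_optimization_profile()       # corresponds to --minShapes/--optShapes/--maxShapes
    profile.set_shape("input_1:0", (1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())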

    PyTorch model to engine

    Step 1: convert the PyTorch model to ONNX, following the "Convert a PyTorch model to ONNX" snippet above.
    Step 2: convert the resulting ONNX model to an engine, following the TensorFlow ONNX-to-engine steps above (a possible trtexec invocation for the exported test.onnx is sketched below).
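
    For the test.onnx exported in the earlier PyTorch snippet (input name input, output name output, dynamic batch dimension), a possible trtexec invocation might look like the following; the 3x244x244 shape simply follows that example and should be adjusted to the model's real input size.

    /usr/src/tensorrt/bin/trtexec --onnx=test.onnx --explicitBatch --minShapes=\'input\':1x3x244x244 --optShapes=\'input\':1x3x244x244 --maxShapes=\'input\':1x3x244x244 --shapes=\'input\':1x3x244x244 --saveEngine=test.engine --buildOnly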

    Caffe to engine

    To convert a Caffe model to an engine, follow step two of the first TensorFlow approach (UFF to engine), but replace --uff with the --model / --deploy pair; a sketch of such a command is given below.
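
    A possible invocation (deploy.prototxt, model.caffemodel, and the output blob name prob are placeholders for the actual Caffe files and output layer):

    /usr/src/tensorrt/bin/trtexec --deploy=/home/model/deploy.prototxt --model=/home/model/model.caffemodel --output=prob --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph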

    A converted engine is tied to the hardware environment used during conversion: running the service depends on that environment. For example, if the model was converted on a Tesla V100S with Driver Version 440.118.02 and CUDA Version 10.2, the machine that runs the service needs to match (the driver version and CUDA version may be higher).

