TensorRT Model Conversion


 The deep learning frameworks currently supported are Caffe, TensorFlow, and PyTorch.

  1.  TensorFlow: the preferred delivery format is a pb file, i.e. a frozen GraphDef. Freezing turns all of the exported model's weights into constants, and the model parameters and network structure are stored in a single file that can be loaded from Python as well as Java.
  2. PyTorch: the preferred delivery format is ONNX. ONNX is an open file format designed for machine learning and used to store trained models; it lets different AI frameworks store and exchange model data in a common format. The ONNX specification and code are developed jointly by Microsoft, Amazon, Facebook, IBM, and other companies. https://pytorch.org/docs/stable/onnx.html

# Convert a PyTorch model to ONNX
import torch
torch_model = torch.load("save.pt")  # load the PyTorch model (saved as a full model, not just a state_dict)
batch_size = 1                       # batch size
input_shape = (3,244,244)            # input shape (C, H, W)
 
# set the model to inference mode
torch_model.eval()
 
x = torch.randn(batch_size,*input_shape)        # dummy input tensor
export_onnx_file = "test.onnx"                  # target ONNX file name
torch.onnx.export(torch_model,
                    x,
                    export_onnx_file,
                    opset_version=10,
                    do_constant_folding=True,   # whether to apply constant-folding optimization
                    input_names=["input"],      # input name
                    output_names=["output"],    # output name
                    dynamic_axes={"input":{0:"batch_size"},     # variable batch dimension
                                    "output":{0:"batch_size"}})
  1. Caffe: no conversion is needed; the caffemodel and deploy prototxt can be used with TensorRT directly.
  2.  TensorFlow

    • For a TensorFlow model there are two routes to a TensorRT engine: the first is pb → uff → engine; the other is pb → onnx → engine.

          First approach
    # TensorFlow pb to UFF

     

    First install convert-to-uff: apt install uff-converter-tf
    Run: python3 /usr/local/bin/convert-to-uff --help
    Output:
     
    Converts TensorFlow models to Unified Framework Format (UFF).
     
    positional arguments:
      input_file            path to input model (protobuf file of frozen GraphDef)
     
    optional arguments:
      -h, --help            show this help message and exit
      -l, --list-nodes      show list of nodes contained in input file
      -t, --text            write a text version of the output in addition to the
                            binary
      --write_preprocessed  write the preprocessed protobuf in addition to the
                            binary
      -q, --quiet           disable log messages
      -d, --debug           Enables debug mode to provide helpful debugging output
      -o OUTPUT, --output OUTPUT
                            name of output uff file
      -O OUTPUT_NODE, --output-node OUTPUT_NODE
                            name of output nodes of the model
      -I INPUT_NODE, --input-node INPUT_NODE
                            name of a node to replace with an input to the model.
                            Must be specified as:
                            "name,new_name,dtype,dim1,dim2,..."
      -p PREPROCESSOR, --preprocessor PREPROCESSOR
                            the preprocessing file to run before handling the
                            graph. This file must define a `preprocess` function
                            that accepts a GraphSurgeon DynamicGraph as it's
                            input. All transformations should happen in place on
                            the graph, as return values are discarded
    Conversion command:
    python3 /usr/local/bin/convert-to-uff model.pb -o model.uff -O softmax/Softmax -I input_1,input_1,float32,1,3,224,224
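
    The -O and -I arguments above need the exact node names from the graph (here softmax/Softmax and input_1). If they are not known, convert-to-uff -l model.pb lists the nodes (see the help output above); a small TensorFlow script such as the sketch below (an illustration, not part of the official toolchain) does the same and also shows each node's op type. The file name model.pb follows the example above.

    # List the node names of a frozen GraphDef to find input/output names for convert-to-uff
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with open("model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    for node in graph_def.node:
        print(node.op, node.name)   # e.g. "Placeholder input_1", "Softmax softmax/Softmax"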
    # TensorFlow UFF to engine

    Run: /usr/src/tensorrt/bin/trtexec --help
    Output:
    === Model Options ===
      --uff=<file>                UFF model
      --onnx=<file>               ONNX model
      --model=<file>              Caffe model (default = no model, random weights used)
      --deploy=<file>             Caffe prototxt file
      --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
      --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
      --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)
    === Build Options ===
      --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
      --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
      --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
      --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
      --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                                  Note: if any of min/max/opt is missing, the profile will be completed using the shapes provided and assuming that opt will be equal to max unless they are both specified; partially specified shapes are applied starting from the batch size; dynamic shapes imply explicit batch
                                  input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                             Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
      --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                                  IO Formats: spec ::= IOfmt[","spec]
                                              IOfmt ::= type:fmt
                                              type ::= "fp32"|"fp16"|"int32"|"int8"
                                              fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
      --workspace=N               Set workspace size in megabytes (default = 16)
      --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
      --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
      --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
      --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
      --calib=<file>              Read INT8 calibration cache file
      --safe                      Only test the functionality available in safety restricted flows
      --saveEngine=<file>         Save the serialized engine
      --loadEngine=<file>         Load a serialized engine
    === Inference Options ===
      --batch=N                   Set batch size for implicit batch engines (default = 1)
      --shapes=spec               Set input shapes for dynamic shapes inputs. Input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input shapes spec ::= Ishp[","spec]
                                             Ishp ::= name":"shape
                                             shape ::= N[["x"N]*"*"]
      --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                                  Input values spec ::= Ival[","spec]
                                              Ival ::= name":"file
      --iterations=N              Run at least N inference iterations (default = 10)
      --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
      --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
      --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
      --streams=N                 Instantiate N engines to use concurrently (default = 1)
      --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
      --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
      --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
      --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
      --buildOnly                 Skip inference perf measurement (default = disabled)
    === Build and Inference Batch Options ===
                                  When using implicit batch, the max batch size of the engine, if not given, is set to the inference batch size;
                                  when using explicit batch, if shapes are specified only for inference, they will be used also as min/opt/max in the build profile;
                                  if shapes are specified only for the build, the opt shapes will be used also for inference;
                                  if both are specified, they must be compatible; and if explicit batch is enabled but neither is specified,
                                  the model must provide complete static dimensions, including batch size, for all inputs
    === Reporting Options ===
      --verbose                   Use verbose logging (default = false)
      --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
      --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf; (default = 99%)
      --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
      --dumpProfile               Print profile information per layer (default = disabled)
      --exportTimes=<file>        Write the timing results in a json file (default = disabled)
      --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
      --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)
    === System Options ===
      --device=N                  Select cuda device N (default = 0)
      --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
      --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
      --plugins                   Plugin library (.so) to load (can be specified multiple times)
    === Help ===
      --help                      Print this message

    Conversion command:
    /usr/src/tensorrt/bin/trtexec --uff=/home/model/model.uff --uffInput=input_1,1,3,224,224 --output=softmax/Softmax --saveEngine=/home/model/model.engine --outputIOFormats=fp32:chw --buildOnly --useCudaGraph
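
    Once model.engine has been written, a quick way to confirm that it deserializes correctly and that the bindings look right is a few lines of the TensorRT Python API. The snippet below is a minimal sketch assuming a TensorRT 7.x-style API (the binding accessors changed in later releases) and the engine path used above.

    # Minimal sanity check of the serialized engine (TensorRT 7.x-style Python API)
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    with open("/home/model/model.engine", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_bindings):
        print(engine.get_binding_name(i),                           # e.g. input_1, softmax/Softmax
              engine.get_binding_shape(i),
              "input" if engine.binding_is_input(i) else "output")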
          Second approach
  3. # TensorFlow pb to ONNX, then ONNX to engine

    Step 1: install tf2onnx: pip install -U tf2onnx
    Run python3 -m tf2onnx.convert --help to see how it is used.
    Output:
    usage: convert.py [-h] [--input INPUT] [--graphdef GRAPHDEF]
                      [--saved-model SAVED_MODEL] [--tag TAG]
                      [--signature_def SIGNATURE_DEF]
                      [--concrete_function CONCRETE_FUNCTION]
                      [--checkpoint CHECKPOINT] [--keras KERAS] [--large_model]
                      [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
                      [--opset OPSET] [--custom-ops CUSTOM_OPS]
                      [--extra_opset EXTRA_OPSET] [--target {rs4,rs5,rs6,caffe2}]
                      [--continue_on_error] [--verbose] [--debug]
                      [--output_frozen_graph OUTPUT_FROZEN_GRAPH] [--fold_const]
                      [--inputs-as-nchw INPUTS_AS_NCHW]

    Convert tensorflow graphs to ONNX.

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         input from graphdef
      --graphdef GRAPHDEF   input from graphdef
      --saved-model SAVED_MODEL
                            input from saved model
      --tag TAG             tag to use for saved_model
      --signature_def SIGNATURE_DEF
                            signature_def from saved_model to use
      --concrete_function CONCRETE_FUNCTION
                            For TF2.x saved_model, index of func signature in
                            __call__ (--signature_def is ignored)
      --checkpoint CHECKPOINT
                            input from checkpoint
      --keras KERAS         input from keras model
      --large_model         use the large model format (for models > 2GB)
      --output OUTPUT       output model file
      --inputs INPUTS       model input_names
      --outputs OUTPUTS     model output_names
      --opset OPSET         opset version to use for onnx domain
      --custom-ops CUSTOM_OPS
                            list of custom ops
      --extra_opset EXTRA_OPSET
                            extra opset with format like domain:version, e.g.
                            com.microsoft:1
      --target {rs4,rs5,rs6,caffe2}
                            target platform
      --continue_on_error   continue_on_error
      --verbose, -v         verbose output, option is additive
      --debug               debug mode
      --output_frozen_graph OUTPUT_FROZEN_GRAPH
                            output frozen tf graph to file
      --fold_const          Deprecated. Constant folding is always enabled.
      --inputs-as-nchw INPUTS_AS_NCHW
                            transpose inputs as from nhwc to nchw

    Usage Examples:
    python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx
    python -m tf2onnx.convert --input frozen_graph.pb --inputs X:0 --outputs output:0 --output model.onnx
    python -m tf2onnx.convert --checkpoint checkpoint.meta --inputs X:0 --outputs output:0 --output model.onnx

    For help and additional information see: https://github.com/onnx/tensorflow-onnx
    If you run into issues, open an issue here: https://github.com/onnx/tensorflow-onnx/issues

    Convert to ONNX: first run conda activate tensorflow, then:
    python3 -m tf2onnx.convert --input model.pb --inputs input_1:0 --outputs softmax/Softmax:0 --inputs-as-nchw input_1:0 --output model.onnx --opset 13

    ONNX to engine (this is the command for dynamic input shapes; exit the tensorflow conda environment at this point):
    /usr/src/tensorrt/bin/trtexec --onnx=/home/model/model.onnx --explicitBatch --minShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --optShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --maxShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --shapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph

    If the input shape is fixed, drop --explicitBatch and the --minShapes/--optShapes/--maxShapes/--shapes options above, and add --batch batch_size instead, where batch_size is the TensorFlow model's batch size.
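
    As an alternative to trtexec, the same ONNX-to-engine build can be done from Python with the TensorRT builder. The sketch below assumes a TensorRT 7.x-style API and mirrors the command above (explicit batch, an optimization profile pinning input_1:0 to 1x3x224x224, and the same file paths); it is an illustration rather than a drop-in replacement.

    # Build an engine from model.onnx with an explicit-batch network and an optimization profile
    # (sketch of the TensorRT 7.x Python API; paths and shapes follow the trtexec command above)
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("/home/model/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28                   # 256 MiB workspace

    profile = builder.create_optimization_profile()       # corresponds to --minShapes/--optShapes/--maxShapes
    profile.set_shape("input_1:0", (1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open("/home/model/model.engine", "wb") as f:
        f.write(engine.serialize())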

    PyTorch model to engine

    Step 1: convert the PyTorch model to ONNX, following the "Convert a PyTorch model to ONNX" snippet above.
    Step 2: convert the resulting ONNX model to an engine, following the TensorFlow ONNX-to-engine steps above (a possible trtexec invocation for the exported test.onnx is sketched below).
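
    For the test.onnx exported in the earlier PyTorch snippet (input name input, output name output, dynamic batch dimension), a possible trtexec invocation might look like the following; the 3x244x244 shape simply follows that example and should be adjusted to the model's real input size.

    /usr/src/tensorrt/bin/trtexec --onnx=test.onnx --explicitBatch --minShapes=\'input\':1x3x244x244 --optShapes=\'input\':1x3x244x244 --maxShapes=\'input\':1x3x244x244 --shapes=\'input\':1x3x244x244 --saveEngine=test.engine --buildOnly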

    Caffe to engine

    To convert a Caffe model to an engine, follow step two of the first TensorFlow approach (UFF to engine), but replace --uff with the --model / --deploy pair; a sketch of such a command is given below.
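
    A possible invocation (deploy.prototxt, model.caffemodel, and the output blob name prob are placeholders for the actual Caffe files and output layer):

    /usr/src/tensorrt/bin/trtexec --deploy=/home/model/deploy.prototxt --model=/home/model/model.caffemodel --output=prob --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph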

    A converted engine is tied to the hardware environment used during conversion: running the service depends on that environment. For example, if the model was converted on a Tesla V100S with Driver Version 440.118.02 and CUDA Version 10.2, the machine that runs the service needs to match (the driver version and CUDA version may be higher).

