I. Basic usage flow

Paddle Inference is fairly simple to use. The basic code looks like this:

// Create the predictor
std::shared_ptr<Predictor> InitPredictor() {
  Config config;
  if (FLAGS_model_dir != "") {
    // Load the model from a directory.
    config.SetModel(FLAGS_model_dir);
  } else {
    // Load the model from a separate model file and parameters file.
    config.SetModel(FLAGS_model_file, FLAGS_params_file);
  }
  if (FLAGS_use_gpu) {
    // 100 MB initial GPU memory pool on GPU device 0.
    config.EnableUseGpu(100, 0);
  } else {
    config.EnableMKLDNN();
  }

  // Enable memory optimization.
  config.EnableMemoryOptim();
  return CreatePredictor(config);
}

// Run inference
void run(Predictor *predictor, const std::vector<float> &input,
         const std::vector<int> &input_shape, std::vector<float> *out_data) {
  int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
                                  std::multiplies<int>());

  auto input_names = predictor->GetInputNames();
  auto output_names = predictor->GetOutputNames();
  auto input_t = predictor->GetInputHandle(input_names[0]);
  input_t->Reshape(input_shape);
  input_t->CopyFromCpu(input.data());

  for (size_t i = 0; i < FLAGS_warmup; ++i)
    CHECK(predictor->Run());

  // time() and time_diff() are timing helpers defined outside this excerpt.
  auto st = time();
  for (size_t i = 0; i < FLAGS_repeats; ++i) {
    CHECK(predictor->Run());
    auto output_t = predictor->GetOutputHandle(output_names[0]);
    std::vector<int> output_shape = output_t->shape();
    int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1,
                                  std::multiplies<int>());
    out_data->resize(out_num);
    output_t->CopyToCpu(out_data->data());
  }
  LOG(INFO) << "run avg time is " << time_diff(st, time()) / FLAGS_repeats
            << " ms";
}
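
The run function above calls time() and time_diff(), which the excerpt does not define, and it computes element counts from tensor shapes with std::accumulate. Below is a minimal sketch of what such helpers could look like, assuming std::chrono is used for timing. The names Now, ElapsedMs, and NumElements are hypothetical and are not part of the Paddle Inference API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helpers for illustration; not part of the Paddle Inference API.
using TimePoint = std::chrono::steady_clock::time_point;

// Returns the current time from a monotonic clock.
inline TimePoint Now() { return std::chrono::steady_clock::now(); }

// Returns the elapsed time between two time points, in milliseconds.
inline double ElapsedMs(TimePoint start, TimePoint end) {
  assert(end >= start);
  return std::chrono::duration<double, std::milli>(end - start).count();
}

// Returns the number of elements in a tensor of the given shape,
// that is, the product of all dimensions (1 for an empty shape).
inline int NumElements(const std::vector<int>& shape) {
  return std::accumulate(shape.begin(), shape.end(), 1,
                         std::multiplies<int>());
}
```

A monotonic clock is used here so that wall-clock adjustments during the benchmark loop cannot produce negative intervals.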

II. Source tree layout

Repository: https://github.com/PaddlePaddle/Paddle

The directory layout is as follows:

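--cmake # CMake build scripts and the third-party libraries linked at build time
--doc
--paddle # C++ code
    -fluid
        -distributed # distributed code, mainly used for training, including in-model all_reduce for cross-card and cross-machine communication
        -extension
        -framework # core framework components
        -imperative # imperative (dynamic graph) mode code; also holds distributed communication code such as nccl, all_reduce, bkcl
        -inference # inference code and API definitions
        -memory
        -operators # operators
        -platform # platform-specific code
        -pybind # pybind interface definitions
        -string
    -scripts
    -testing
    -utils
--patches
--python # Python code
--r
--tools
--CMakeLists.txt # build script, including most build options and third-party dependency logic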

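III. Build output

The build output directory is laid out as follows:

build
    -python # whl installation package
    -paddle_install_dir # all generated headers and libraries
    -paddle_inference_install_dir # libraries required for C++ inference
    -paddle_inference_c_install_dir # libraries required for C inference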

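IV. Components overview

1. predictor

A predictor instance holds all resources and is the main entry point on the inference side. After the first predictor has been created, new instances can be created with the clone interface. A predictor created through clone shares the fixed parameter resources (the scope) with its parent instance.

Host memory and GPU memory are allocated in advance.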
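
The sharing relationship between a predictor and its clones can be illustrated with a toy model. This is a sketch under stated assumptions, not Paddle's implementation: ToyWeightScope and ToyPredictor are hypothetical types, and shared ownership is modeled with std::shared_ptr purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scope: variable name -> values.
struct ToyWeightScope {
  std::map<std::string, std::vector<float>> vars;
};

// Hypothetical predictor: weights are shared, temporaries are per instance.
class ToyPredictor {
 public:
  explicit ToyPredictor(std::shared_ptr<ToyWeightScope> weights)
      : weights_(std::move(weights)) {
    assert(weights_ != nullptr);
  }

  // The clone points at the same weight scope but starts with its own
  // empty set of temporaries.
  std::unique_ptr<ToyPredictor> Clone() const {
    return std::make_unique<ToyPredictor>(weights_);
  }

  const ToyWeightScope* weights() const { return weights_.get(); }
  ToyWeightScope* temporaries() { return &temporaries_; }

 private:
  std::shared_ptr<ToyWeightScope> weights_;  // shared with clones
  ToyWeightScope temporaries_;               // private to this instance
};
```

In this model, cloning copies only a pointer, so the cost of an extra predictor is independent of the size of the weights.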

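2. scope

The structure used to store parameter variables. Its internal layout is scope -> variable -> LoDTensor: a scope keeps a map of variables. The predictor holds a scope that stores the model weights. In addition, each predictor instance keeps a sub_scope (a child scope created from the scope) for temporary parameters, such as values that change during computation and the inputs and outputs.

Once the memory has been created during the first run, later runs reuse the cached memory obtained from the scope.

In inference the scope is lock-free, because there is only a single writer thread and the part read by multiple threads consists of persistent parameters. In the training framework, however, the scope uses a lock.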
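
The scope -> variable -> tensor layout and the parent/child lookup described above can be sketched as follows. This is a simplified, single-threaded toy model: MiniScope and MiniVariable are hypothetical types, the method names are only loosely modeled on a scope-style interface, and a flat float buffer stands in for a LoDTensor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variable holding a flat float buffer.
struct MiniVariable {
  std::vector<float> data;
};

class MiniScope {
 public:
  explicit MiniScope(const MiniScope* parent = nullptr) : parent_(parent) {}

  // Returns the variable with this name in this scope, creating it if absent.
  // Later calls return the same object, so its buffer acts as a cache.
  MiniVariable* Var(const std::string& name) {
    assert(!name.empty());
    auto& slot = vars_[name];
    if (!slot) slot = std::make_unique<MiniVariable>();
    return slot.get();
  }

  // Looks the name up locally, then in ancestors; nullptr if not found.
  const MiniVariable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

  // Creates a child scope that is owned by this scope.
  MiniScope* NewScope() {
    kids_.push_back(std::make_unique<MiniScope>(this));
    return kids_.back().get();
  }

 private:
  const MiniScope* parent_;
  std::unordered_map<std::string, std::unique_ptr<MiniVariable>> vars_;
  std::vector<std::unique_ptr<MiniScope>> kids_;
};
```

A child scope can read the weights stored in its parent, while temporaries created in the child stay invisible to the parent, which is the property that lets several predictors share one weight scope.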

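3. place

A variable that describes the environment in which an operation runs, such as CPUPlace or GPUPlace. The program uses the place to select the matching op kernel, or to indicate whether a piece of memory lives on the CPU, the GPU, or another device.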

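4. DeviceContext and DeviceContextPool

A DeviceContext stores all computing resources of the current environment. Each kind of hardware has its own context, including CPU, GPU, and NPU. The predictor obtains the DeviceContext for a given place from the global singleton DeviceContextPool.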
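
The idea of a global pool that maps a place to its context can be sketched as below. This is a toy model under stated assumptions: DeviceKind, ToyPlace, ToyDeviceContext, and ToyDeviceContextPool are hypothetical types, contexts are created lazily on first use, and no thread safety is attempted.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical place: only distinguishes device kind and device id.
enum class DeviceKind { kCPU, kGPU };

struct ToyPlace {
  DeviceKind kind;
  int device_id;
  // Ordering so that a place can be used as a std::map key.
  bool operator<(const ToyPlace& other) const {
    if (kind != other.kind) return kind < other.kind;
    return device_id < other.device_id;
  }
};

// Hypothetical context: a real one would own streams, handles, and so on.
struct ToyDeviceContext {
  explicit ToyDeviceContext(ToyPlace p) : place(p) {}
  ToyPlace place;
};

class ToyDeviceContextPool {
 public:
  // Process-wide single instance (function-local static).
  static ToyDeviceContextPool& Instance() {
    static ToyDeviceContextPool pool;
    return pool;
  }

  // Returns the context for `place`, creating it on first use.
  // Not thread-safe in this sketch.
  ToyDeviceContext* Get(const ToyPlace& place) {
    assert(place.device_id >= 0);
    auto& slot = contexts_[place];
    if (!slot) slot = std::make_unique<ToyDeviceContext>(place);
    return slot.get();
  }

 private:
  ToyDeviceContextPool() = default;
  std::map<ToyPlace, std::unique_ptr<ToyDeviceContext>> contexts_;
};
```

Every caller that asks for the same place receives the same context object, so per-device resources exist once per process in this model.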

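5. Operator and Kernel

An operator (Op) is a unit of computation. The model file stores the directed graph to run; each node in the graph is an Op, and the predictor executes the Ops in the graph in order. There are two kinds of Op. Functional Ops implement their behavior directly in the Op itself, for example printing error messages or collecting performance data. Compute Ops are built on OpWithKernel: when such an Op is invoked, the kernel for the matching device is selected according to the place and the properties of the Op, and that kernel performs the computation. An Op typically registers kernels for CPU, GPU, NPU, and other devices.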
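
Kernel selection by place can be illustrated with a small registry keyed by (op type, device kind). This is a sketch of the general dispatch pattern only, not Paddle's registration mechanism: ToyKernelRegistry, ToyKernel, DeviceKind, and RunOps are hypothetical names, and a kernel is modeled as a function that mutates a float buffer in place.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device kind used as the "place" in this sketch.
enum class DeviceKind { kCPU, kGPU };
using ToyKernel = std::function<void(std::vector<float>*)>;

class ToyKernelRegistry {
 public:
  // Registers the kernel that implements `op` on `place`.
  void Register(const std::string& op, DeviceKind place, ToyKernel kernel) {
    kernels_[std::make_pair(op, place)] = std::move(kernel);
  }

  // Picks the kernel registered for (op, place); throws if none exists.
  const ToyKernel& Select(const std::string& op, DeviceKind place) const {
    auto it = kernels_.find(std::make_pair(op, place));
    if (it == kernels_.end()) {
      throw std::runtime_error("no kernel registered for op " + op);
    }
    return it->second;
  }

 private:
  std::map<std::pair<std::string, DeviceKind>, ToyKernel> kernels_;
};

// Runs the ops in order on `data`, dispatching each one by place.
inline void RunOps(const ToyKernelRegistry& registry,
                   const std::vector<std::string>& ops, DeviceKind place,
                   std::vector<float>* data) {
  assert(data != nullptr);
  for (const auto& op : ops) registry.Select(op, place)(data);
}
```

Registering a CPU kernel for an op and then asking for its GPU kernel fails in Select, which mirrors the point that each device needs its own registered kernel.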

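6. IR Pass

Graph optimization. After the model file is loaded and the predictor is created, graph optimization runs if ir_optim is enabled. In practice, the applicable passes are executed one after another on the original op graph of the model. Each pass represents one optimization rule, for example node fusion, optimization of special structures, or cutting subgraphs out to run in TensorRT.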
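
Running passes one after another over an op graph can be sketched with a deliberately simplified model in which the graph is a linear list of op type names. Everything here is hypothetical: ToyGraph, ToyPass, the two passes, and the op names "identity" and "conv2d_relu_fused" are made up for illustration and do not correspond to Paddle's pass implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph: op type names in execution order.
using ToyGraph = std::vector<std::string>;
using ToyPass = std::function<void(ToyGraph*)>;

// Toy fusion pass: rewrites each adjacent "conv2d", "relu" pair into one op.
inline void FuseConvReluPass(ToyGraph* graph) {
  assert(graph != nullptr);
  ToyGraph fused;
  for (std::size_t i = 0; i < graph->size(); ++i) {
    if ((*graph)[i] == "conv2d" && i + 1 < graph->size() &&
        (*graph)[i + 1] == "relu") {
      fused.push_back("conv2d_relu_fused");  // hypothetical fused op name
      ++i;  // skip the relu that was absorbed
    } else {
      fused.push_back((*graph)[i]);
    }
  }
  *graph = std::move(fused);
}

// Toy cleanup pass: removes "identity" ops, assumed to be no-ops here.
inline void RemoveIdentityPass(ToyGraph* graph) {
  assert(graph != nullptr);
  ToyGraph kept;
  for (const auto& op : *graph) {
    if (op != "identity") kept.push_back(op);
  }
  *graph = std::move(kept);
}

// Applies the passes one after another, in the given order.
inline void RunPasses(const std::vector<ToyPass>& passes, ToyGraph* graph) {
  for (const auto& pass : passes) pass(graph);
}
```

Pass order matters in this model: with the graph conv2d, identity, relu, the fusion pass finds a match only if the cleanup pass has already run.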
