LibTorch in Practice 6: Deploying YOLOv5 v5.4 in C++ (Part 1)


Contents

  • 1. Environment Setup
  • 2. Exporting the .torchscript.pt Model
  • 3. C++ Implementation of YOLOv5 v5.4
  • 4. Troubleshooting Notes

 

1. Environment Setup

  • Windows 10
  • Visual Studio 2017
  • libtorch-win-shared-with-deps-debug-1.8.1+cpu
  • OpenCV 3.4.9

  Because the author is still updating the yolov5 code (v5.4 is the latest as I write this), the model structure may change, so the LibTorch we use must satisfy its requirements, ideally the exact same version. Here is the Python source for the yolov5 version this post uses.

Baidu Netdisk share:

Link: https://pan.baidu.com/s/1VVns4hzJdDN0hFNtSnUZ2w
Extraction code: 6c1p

  The requirements.txt in the source tree pins the dependency versions listed below. On the C++ side we use libtorch 1.8.1 (today I also tested libtorch-win-shared-with-deps-1.7.1+cu110; it detects correctly and matches this post's final results). Since we do the image processing with OpenCV in C++, the C++ build of torchvision is not needed:

# pip install -r requirements.txt

# base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow
PyYAML>=5.3.1
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
# remaining lines omitted

  To make debugging easier, I downloaded the debug, CPU-only build of LibTorch; once the code works, switching to GPU is straightforward. The OpenCV version barely matters, anything from OpenCV 3 up is fine.
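Since the switch to GPU comes up again later, here is a minimal sketch of what it looks like, assuming a CUDA-enabled libtorch build (the model filename is illustrative); the device_type variable plays the same role as in the full program in section 3:

#include <torch/script.h>
#include <torch/torch.h>

int main()
{
    // pick CUDA when a usable GPU is present, otherwise fall back to CPU
    torch::DeviceType device_type =
        torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
    torch::jit::script::Module module = torch::jit::load("yolov5s.torchscript.pt");
    module.to(device_type);  // move the weights to the selected device
    module.eval();
    // every input tensor must be moved to the same device before forward()
    return 0;
}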

2. Exporting the .torchscript.pt Model

  Open the models folder in the yolov5 v5.4 source tree and edit export.py as shown below: comment out line 58 and add line 59. (The GPU version needs further changes and will be covered in a follow-up post; this one covers the CPU version only.)

 1 """Exports a YOLOv5 *.pt model to ONNX and TorchScript formats
 2 
 3 Usage:
 4     $ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
 5 """
 6 
 7 import argparse
 8 import sys
 9 import time
10 
11 sys.path.append('./')  # to run '$ python *.py' files in subdirectories
12 
13 import torch
14 import torch.nn as nn
15 
16 import models
17 from models.experimental import attempt_load
18 from utils.activations import Hardswish, SiLU
19 from utils.general import set_logging, check_img_size
20 from utils.torch_utils import select_device
21 
22 if __name__ == '__main__':
23     parser = argparse.ArgumentParser()
24     parser.add_argument('--weights', type=str, default='./yolov5s.pt', help='weights path')  # from yolov5/models/
25     parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size')  # height, width
26     parser.add_argument('--batch-size', type=int, default=1, help='batch size')
27     parser.add_argument('--dynamic', action='store_true', help='dynamic ONNX axes')
28     parser.add_argument('--grid', action='store_true', help='export Detect() layer grid')
29     parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
30     opt = parser.parse_args()
31     opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand
32     print(opt)
33     set_logging()
34     t = time.time()
35 
36     # Load PyTorch model
37     device = select_device(opt.device)
38     model = attempt_load(opt.weights, map_location=device)  # load FP32 model
39     labels = model.names
40 
41     # Checks
42     gs = int(max(model.stride))  # grid size (max stride)
43     opt.img_size = [check_img_size(x, gs) for x in opt.img_size]  # verify img_size are gs-multiples
44 
45     # Input
46     img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device)  # image size(1,3,320,192) iDetection
47 
48     # Update model
49     for k, m in model.named_modules():
50         m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
51         if isinstance(m, models.common.Conv):  # assign export-friendly activations
52             if isinstance(m.act, nn.Hardswish):
53                 m.act = Hardswish()
54             elif isinstance(m.act, nn.SiLU):
55                 m.act = SiLU()
56         # elif isinstance(m, models.yolo.Detect):
57         #     m.forward = m.forward_export  # assign forward (optional)
58     #model.model[-1].export = not opt.grid  # set Detect() layer grid export
59     model.model[-1].export = False
60     y = model(img)  # dry run
61 
62     # TorchScript export
63     try:
64         print('\nStarting TorchScript export with torch %s...' % torch.__version__)
65         f = opt.weights.replace('.pt', '.torchscript.pt')  # filename
66         ts = torch.jit.trace(model, img)
67         ts.save(f)
68         print('TorchScript export success, saved as %s' % f)
69     except Exception as e:
70         print('TorchScript export failure: %s' % e)
71 # remaining code omitted; no changes needed
72 ......

  Next, activate the yolov5 v5.4 virtual environment in conda and run the script below:

(Tip: for setting up the yolov5 v5.4 environment, see my earlier post "Quick YOLOv5 setup and testing on Win10": https://www.cnblogs.com/winslam/p/13474330.html)

python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1

  Troubleshooting: 1. The bash window may report "No module named 'utils'"; this happens because the repo root has not been added to the module search path. On Linux, just run:

export PYTHONPATH="$PWD"
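On Windows, the equivalent in a cmd shell (assuming you are at the repo root) would be:

set PYTHONPATH=%cd%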

  On Windows, I suggest simply opening the yolov5 v5.4 project in PyCharm and running export.py from the IDE. If you haven't downloaded yolov5s.pt yet, it is fetched automatically and the download URL is printed to the console, as below; if the download stalls, try pasting the link into a download manager such as Thunder.

Downloading https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.pt to yolov5s.pt...

Running export.py produces the following warnings:

D:\yolov5-0327\models\yolo.py:50: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
D:\Program Files\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py:934: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  module._c._create_method_from_trace(
TorchScript export success, saved as ./yolov5s.torchscript.pt
ONNX export failure: No module named 'onnx'
CoreML export failure: No module named 'coremltools'

Export complete (10.94s). Visualize with https://github.com/lutzroeder/netron.

I'll analyze these warnings another time; they don't affect deployment. (The first one simply reflects how tracing works: the grid-shape check in yolo.py is baked into the graph as a constant, which is one more reason to keep the inference input at the 640x640 size used during export.)

3. C++ Implementation of YOLOv5 v5.4

Configuring LibTorch in Visual Studio (set the items below in the project properties):

Include directories:

D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\include

D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\include\torch\csrc\api\include

Library directories:

D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\lib

Additional dependencies (you may be on a newer LibTorch, so paste the names of all the .lib files under your own lib directory into Linker - Input - Additional Dependencies):

 

asmjit.lib
c10.lib
c10d.lib
c10_cuda.lib
caffe2_detectron_ops_gpu.lib
caffe2_module_test_dynamic.lib
caffe2_nvrtc.lib
clog.lib
cpuinfo.lib
dnnl.lib
fbgemm.lib
fbjni.lib
gloo.lib
gloo_cuda.lib
libprotobuf-lite.lib
libprotobuf.lib
libprotoc.lib
mkldnn.lib
pthreadpool.lib
pytorch_jni.lib
torch.lib
torch_cpu.lib
torch_cuda.lib
XNNPACK.lib

Environment variable (add to PATH; a restart is required):

D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\lib

Once everything is configured, set vs2017 to Debug x64. The C++ code for yolov5 v5.4 follows.

The inputs are:

  • the .torchscript.pt model converted above
  • coco.names
  • one image
#include <torch/script.h>
#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> LoadNames(const std::string& path)
{
    // load class names, one per line
    std::vector<std::string> class_names;
    std::ifstream infile(path);
    if (infile.is_open()) {
        std::string line;
        while (std::getline(infile, line)) {
            class_names.emplace_back(line);
        }
        infile.close();
    }
    else {
        std::cerr << "Error loading the class names!\n";
    }

    return class_names;
}

std::vector<float> LetterboxImage(const cv::Mat& src, cv::Mat& dst, const cv::Size& out_size)
{
    auto in_h = static_cast<float>(src.rows);
    auto in_w = static_cast<float>(src.cols);
    float out_h = out_size.height;
    float out_w = out_size.width;

    float scale = std::min(out_w / in_w, out_h / in_h);

    int mid_h = static_cast<int>(in_h * scale);
    int mid_w = static_cast<int>(in_w * scale);

    cv::resize(src, dst, cv::Size(mid_w, mid_h));

    int top = (static_cast<int>(out_h) - mid_h) / 2;
    int down = (static_cast<int>(out_h) - mid_h + 1) / 2;
    int left = (static_cast<int>(out_w) - mid_w) / 2;
    int right = (static_cast<int>(out_w) - mid_w + 1) / 2;

    cv::copyMakeBorder(dst, dst, top, down, left, right, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));

    std::vector<float> pad_info{ static_cast<float>(left), static_cast<float>(top), scale };
    return pad_info;
}

enum Det
{
    tl_x = 0,
    tl_y = 1,
    br_x = 2,
    br_y = 3,
    score = 4,
    class_idx = 5
};

struct Detection
{
    cv::Rect bbox;
    float score;
    int class_idx;
};

void Tensor2Detection(const at::TensorAccessor<float, 2>& offset_boxes,
    const at::TensorAccessor<float, 2>& det,
    std::vector<cv::Rect>& offset_box_vec,
    std::vector<float>& score_vec)
{
    for (int i = 0; i < offset_boxes.size(0); i++) {
        offset_box_vec.emplace_back(
            cv::Rect(cv::Point(offset_boxes[i][Det::tl_x], offset_boxes[i][Det::tl_y]),
                cv::Point(offset_boxes[i][Det::br_x], offset_boxes[i][Det::br_y]))
        );
        score_vec.emplace_back(det[i][Det::score]);
    }
}

void ScaleCoordinates(std::vector<Detection>& data, float pad_w, float pad_h,
    float scale, const cv::Size& img_shape)
{
    auto clip = [](float n, float lower, float upper)
    {
        return std::max(lower, std::min(n, upper));
    };

    for (auto& i : data) {
        float x1 = (i.bbox.tl().x - pad_w) / scale;  // undo x padding
        float y1 = (i.bbox.tl().y - pad_h) / scale;  // undo y padding
        float x2 = (i.bbox.br().x - pad_w) / scale;  // undo x padding
        float y2 = (i.bbox.br().y - pad_h) / scale;  // undo y padding

        x1 = clip(x1, 0, img_shape.width);
        y1 = clip(y1, 0, img_shape.height);
        x2 = clip(x2, 0, img_shape.width);
        y2 = clip(y2, 0, img_shape.height);

        i.bbox = cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2));
    }
}

torch::Tensor xywh2xyxy(const torch::Tensor& x)
{
    auto y = torch::zeros_like(x);
    // convert bounding box format from (center x, center y, width, height) to (x1, y1, x2, y2)
    y.select(1, Det::tl_x) = x.select(1, 0) - x.select(1, 2).div(2);
    y.select(1, Det::tl_y) = x.select(1, 1) - x.select(1, 3).div(2);
    y.select(1, Det::br_x) = x.select(1, 0) + x.select(1, 2).div(2);
    y.select(1, Det::br_y) = x.select(1, 1) + x.select(1, 3).div(2);
    return y;
}

std::vector<std::vector<Detection>> PostProcessing(const torch::Tensor& detections,
    float pad_w, float pad_h, float scale, const cv::Size& img_shape,
    float conf_thres, float iou_thres)
{
    /***
     * detections has shape (batch, num_boxes, 85):
     * center x/y (0,1), width/height (2,3), objectness (4), 80 class scores (5..84)
     */
    constexpr int item_attr_size = 5;
    int batch_size = detections.size(0);
    // number of classes, e.g. 80 for the coco dataset
    auto num_classes = detections.size(2) - item_attr_size;

    // keep candidates whose object confidence > threshold
    auto conf_mask = detections.select(2, 4).ge(conf_thres).unsqueeze(2);

    std::vector<std::vector<Detection>> output;
    output.reserve(batch_size);

    // iterate over all images in the batch
    for (int batch_i = 0; batch_i < batch_size; batch_i++) {
        // apply constraints to get the filtered detections for the current image
        auto det = torch::masked_select(detections[batch_i], conf_mask[batch_i]).view({ -1, num_classes + item_attr_size });

        // if no detections remain, skip to the next image
        if (0 == det.size(0)) {
            continue;
        }

        // compute overall score = obj_conf * cls_conf, similar to x[:, 5:] *= x[:, 4:5]
        det.slice(1, item_attr_size, item_attr_size + num_classes) *= det.select(1, 4).unsqueeze(1);

        // box (center x, center y, width, height) to (x1, y1, x2, y2)
        torch::Tensor box = xywh2xyxy(det.slice(1, 0, 4));

        // [best class only] get the max class score of each result (e.g. elements 5-84)
        std::tuple<torch::Tensor, torch::Tensor> max_classes = torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);

        // class score
        auto max_conf_score = std::get<0>(max_classes);
        // index
        auto max_conf_index = std::get<1>(max_classes);

        max_conf_score = max_conf_score.to(torch::kFloat).unsqueeze(1);
        max_conf_index = max_conf_index.to(torch::kFloat).unsqueeze(1);

        // shape: n * 6, top-left x/y (0,1), bottom-right x/y (2,3), score(4), class index(5)
        det = torch::cat({ box.slice(1, 0, 4), max_conf_score, max_conf_index }, 1);

        // for batched NMS: offset boxes by class index so boxes of different classes never overlap
        constexpr int max_wh = 4096;
        auto c = det.slice(1, item_attr_size, item_attr_size + 1) * max_wh;
        auto offset_box = det.slice(1, 0, 4) + c;

        std::vector<cv::Rect> offset_box_vec;
        std::vector<float> score_vec;

        // copy data back to the cpu
        auto offset_boxes_cpu = offset_box.cpu();
        auto det_cpu = det.cpu();
        const auto& det_cpu_array = det_cpu.accessor<float, 2>();

        // use accessors to read tensor elements efficiently
        Tensor2Detection(offset_boxes_cpu.accessor<float, 2>(), det_cpu_array, offset_box_vec, score_vec);

        // run NMS
        std::vector<int> nms_indices;
        cv::dnn::NMSBoxes(offset_box_vec, score_vec, conf_thres, iou_thres, nms_indices);

        std::vector<Detection> det_vec;
        for (int index : nms_indices) {
            Detection t;
            const auto& b = det_cpu_array[index];
            t.bbox =
                cv::Rect(cv::Point(b[Det::tl_x], b[Det::tl_y]),
                    cv::Point(b[Det::br_x], b[Det::br_y]));
            t.score = det_cpu_array[index][Det::score];
            t.class_idx = det_cpu_array[index][Det::class_idx];
            det_vec.emplace_back(t);
        }

        ScaleCoordinates(det_vec, pad_w, pad_h, scale, img_shape);

        // save the final detections for the current image
        output.emplace_back(det_vec);
    } // end of batch loop

    return output;
}

void Demo(cv::Mat& img,
    const std::vector<std::vector<Detection>>& detections,
    const std::vector<std::string>& class_names,
    bool label = true)
{
    if (!detections.empty()) {
        for (const auto& detection : detections[0]) {
            const auto& box = detection.bbox;
            float score = detection.score;
            int class_idx = detection.class_idx;

            cv::rectangle(img, box, cv::Scalar(0, 0, 255), 2);

            if (label) {
                std::stringstream ss;
                ss << std::fixed << std::setprecision(2) << score;
                std::string s = class_names[class_idx] + " " + ss.str();

                auto font_face = cv::FONT_HERSHEY_DUPLEX;
                auto font_scale = 1.0;
                int thickness = 1;
                int baseline = 0;
                auto s_size = cv::getTextSize(s, font_face, font_scale, thickness, &baseline);
                cv::rectangle(img,
                    cv::Point(box.tl().x, box.tl().y - s_size.height - 5),
                    cv::Point(box.tl().x + s_size.width, box.tl().y),
                    cv::Scalar(0, 0, 255), -1);
                cv::putText(img, s, cv::Point(box.tl().x, box.tl().y - 5),
                    font_face, font_scale, cv::Scalar(255, 255, 255), thickness);
            }
        }
    }

    cv::namedWindow("Result", cv::WINDOW_NORMAL);
    cv::imshow("Result", img);
}

int main()
{
    // a yolov5Ns.torchscript.pt failed to load here, so only the yolov5 v5.4 model can be read
    torch::jit::script::Module module = torch::jit::load("yolov5sxxx.torchscript.pt");
    torch::DeviceType device_type = torch::kCPU;
    module.to(device_type);
    /*module.to(torch::kHalf);*/
    module.eval();

    // the input image must have exactly 3 channels, so force a BGR read
    cv::Mat img = cv::imread("zidane.jpg", cv::IMREAD_COLOR);
    // load class names
    std::vector<std::string> class_names = LoadNames("coco.names");
    if (class_names.empty()) {
        return -1;
    }

    // set up thresholds
    float conf_thres = 0.4;
    float iou_thres = 0.5;

    // inference
    torch::NoGradGuard no_grad;
    cv::Mat img_input = img.clone();
    std::vector<float> pad_info = LetterboxImage(img_input, img_input, cv::Size(640, 640));
    const float pad_w = pad_info[0];
    const float pad_h = pad_info[1];
    const float scale = pad_info[2];
    cv::cvtColor(img_input, img_input, cv::COLOR_BGR2RGB);  // BGR -> RGB
    // normalization requires a floating-point type
    img_input.convertTo(img_input, CV_32FC3, 1.0f / 255.0f);  // normalize to [0,1]
    // wrap the image data in a tensor and move it to the device
    auto tensor_img = torch::from_blob(img_input.data, { 1, img_input.rows, img_input.cols, img_input.channels() }).to(device_type);
    tensor_img = tensor_img.permute({ 0, 3, 1, 2 }).contiguous();  // BHWC -> BCHW (Batch, Channel, Height, Width)

    std::vector<torch::jit::IValue> inputs;
    // construct the element in place at the end of the container (no copy or move)
    inputs.emplace_back(tensor_img);

    torch::jit::IValue output = module.forward(inputs);

    // parse the output: element 0 of the returned tuple is the detections tensor
    auto detections = output.toTuple()->elements()[0].toTensor();
    auto result = PostProcessing(detections, pad_w, pad_h, scale, img.size(), conf_thres, iou_thres);
    // visualize detections
    if (true) {
        Demo(img, result, class_names);
        cv::waitKey(0);
    }
    return 0;
}
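As a quick sanity check on the letterbox math above, assume the stock 1280x720 zidane.jpg: LetterboxImage computes scale = min(640/1280, 640/720) = 0.5, resizes the image to 640x360, then pads 140 gray rows (114,114,114) on both top and bottom, so pad_info comes back as {0, 140, 0.5}. ScaleCoordinates later inverts exactly this mapping: subtract the padding, divide by the scale, and clip to the original image bounds.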

4. Troubleshooting Notes

I started from the code in references [1] and [2], which was a minefield: [1] and [2] are identical (who copied whom, I don't know), neither says which yolov5 version it targets, and both contain many bugs. Still, I'm grateful for the starting point.

Original code:

Link: https://pan.baidu.com/s/1KFJZV3KxAoXUcN2UKiT2gg
Extraction code: r5c9

Cleaned-up code:

Link: https://pan.baidu.com/s/1SvN6cEniUwKJ8_MH-EwAPw
Extraction code: br7i

After tidying up the original code, I turned it into the cpp shown in section 3. Compared with the original libtorch implementation, I made the following changes (fixing a number of bugs), drawing on reference [3]:

1. In detector.h, comment out the following headers:
//#include <c10/cuda/CUDAStream.h>
//#include <ATen/cuda/CUDAEvent.h>

2. Error: "std": ambiguous symbol

Fix 1: Project -> Properties -> C/C++ -> Language -> Conformance mode -> set to No

(Double-check whether the vs property page you are editing applies to Debug or Release; a lesson learned the hard way!)

Fix 2: someone else suggested changing the offending std to "::std" wherever it errors; not recommended!

3. Keep a debug build of LibTorch on hand

Execution reached the model-loading line and jumped into an assert inside the LibTorch library: AT_ASSERT(isTuple(), "Expected Tuple but got ", tagKind()); (since we are on a debug build of libtorch, we can at least jump to that line; with a release build you wouldn't even know where it failed, so keeping a debug build around is well worth it).

The likely cause is that the model was exported from a yolov5 version other than 5.4 (e.g. 5.1, 5.3, or 5.3.1), or that export.py was not modified as described above.
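If you do hit that assert, a small sketch like the following (model filename illustrative) turns the failure into a readable message instead of a debugger break, which makes it easier to confirm a version mismatch:

#include <torch/script.h>
#include <iostream>

int main()
{
    try {
        torch::jit::script::Module module = torch::jit::load("yolov5s.torchscript.pt");
        module.eval();
        std::cout << "model loaded OK" << std::endl;
    }
    catch (const c10::Error& e) {
        // an export from the wrong yolov5 version, or an unmodified export.py,
        // typically fails right here
        std::cerr << "failed to load the model: " << e.what() << std::endl;
        return -1;
    }
    return 0;
}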

Reference: https://blog.csdn.net/weixin_42398658/article/details/111954760

4. Problem: the build succeeds, but at runtime torch::cuda::is_available() returns false

Fix: a. When setting up the environment, paste the names of all ".lib" files under the lib folder into Project Properties (Release) - Linker - Input - Additional Dependencies.

b. In Project Properties (Release) - Linker - Command Line - Additional Options, add the following:

/INCLUDE:?warp_size@cuda@at@@YAHXZ

That solves it completely!
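To verify the fix took effect, a minimal check like this (assuming a CUDA build of libtorch) should now print true:

#include <torch/torch.h>
#include <iostream>

int main()
{
    std::cout << std::boolalpha
              << "CUDA available: " << torch::cuda::is_available() << std::endl;
    std::cout << "device count: " << torch::cuda::device_count() << std::endl;
    return 0;
}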

5. Warnings when exporting the model

The warnings printed to the console during export are still unresolved, but after deployment the detection results differ only slightly from the Python version (practically identical), as shown below:

Side by side (figure omitted): the left image is the official result, the right is the libtorch deployment; the confidence scores are neck and neck. Great!

 

Notice that the tie on the person in the right image was missed. That's because we are using the 5s model: in recent yolov5 releases the author has focused on the accuracy of the 5x model, and 5s performance has indeed slipped a little.

  

References:

[1] libtorch code reference: https://zhuanlan.zhihu.com/p/338167520

[2] libtorch code reference: https://gitee.com/goodtn/libtorch-yolov5-gpu/tree/master

[3] Summary of common libtorch errors (very helpful!): https://blog.csdn.net/qq_18305555/article/details/114013236

