I've recently been implementing RefineDet with the TensorRT API, and I'm recording the process here. The main reference repository is:
https://github.com/wang-xinyu/tensorrtx
TensorRT API documentation:
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/index.html
The VGG backbone I implemented with precision matching PyTorch, but then came a custom L2Norm layer and I was stumped: I searched the entire repository and there is nothing like it, and I'm not familiar with TensorRT. Tough.
The PyTorch L2Norm layer code:
import torch
import torch.nn as nn
from torch.autograd import Function
#from torch.autograd import Variable
import torch.nn.init as init

class L2Norm(nn.Module):
    def __init__(self, n_channels, scale):
        super(L2Norm, self).__init__()
        self.n_channels = n_channels
        self.gamma = scale or None
        self.eps = 1e-10
        self.weight = nn.Parameter(torch.Tensor(self.n_channels))
        self.reset_parameters()

    def reset_parameters(self):
        init.constant_(self.weight, self.gamma)

    def forward(self, x):
        aa = x.pow(2)                                                # [1,512,40,40]
        bb = x.pow(2).sum(dim=1, keepdim=True)                       # [1,1,40,40]
        bb1 = x.pow(2).sum(dim=1)                                    # [1,40,40]
        norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() + self.eps   # [1,1,40,40]
        # x /= norm
        x = torch.div(x, norm)                                       # [1,512,40,40]
        ccc = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x)  # [1,512,40,40]
        out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
        return out
When using it:
self.conv4_3_L2Norm = L2Norm(512, 10)
self.conv5_3_L2Norm = L2Norm(512, 8)
After studying this code carefully, this line:
self.weight = nn.Parameter(torch.Tensor(self.n_channels))
shows that weight is a learnable parameter. I didn't realize it at first, because it is initialized to the constants 10 and 8. But when I loaded the trained weights, the values had changed to roughly 9-something and 7-something, and that's when I understood these parameters are learned during training.
But how to do this in TensorRT? I asked in the chat group, and an expert with the screen name "昝" said that add/subtract/multiply/divide operations can be done with addScale, addElementWise and the like, and he also told me the official plugin API has an implementation:
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/_nv_infer_plugin_8h.html#a23fc3c32fb290af2b0f6f3922b1e7527
I went and looked, but had no idea how to use it. There isn't a single example, so I was completely lost. When I asked a bit more, the group experts just said to read the docs myself. Fair enough, you have to rely on yourself for everything!
Then I saw that unsupported layers in the project are implemented by writing custom plugins, so I went to study plugins. The official TensorRT GitHub does have a normalize plugin:
https://github.com/NVIDIA/TensorRT/tree/master/plugin/normalizePlugin
That folder implements a lot of plugins, including nmsPlugin, priorBoxPlugin, and batchedNMSPlugin, which all look like things used in object-detection post-processing. I stared at them for ages and still couldn't figure out how to use them!
It seems this requires building the TensorRT library from source with its CMakeLists. I tried, and it failed with errors.
I was really struggling.
Then I looked at implementing it directly in CUDA, but there everything is operations on flat 1-D arrays, and even summing along one dimension takes quite a lot of code... So I spent another day studying CUDA programming and finished chapter 5, which gets up to shared memory.
On the third day, first thing in the morning, I was staring at the L2Norm implementation:
aa = x.pow(2) ## [1,512,40,40]
bb = x.pow(2).sum(dim=1, keepdim=True) ## [1,1,40,40]
# bb1 = x.pow(2).sum(dim=1)#[1,40,40]
norm = x.pow(2).sum(dim=1, keepdim=True).sqrt()+self.eps # [1,1,40,40]
To make the process easier to follow, I had broken the task into steps from the start, so the plan was to implement the squaring first. I already knew that this function:
virtual IScaleLayer* addScale(ITensor& input, ScaleMode mode, Weights shift, Weights scale, Weights power) TRTNOEXCEPT = 0;
performs the math operation
f(x) = (shift + scale * x) ^ power
To square, just set power = 2, scale = 1, shift = 0. Yep! I tried it, and it does work: the result matches the PyTorch code exactly.
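A minimal sketch of that squaring step, assuming network is the INetworkDefinition* being built and input is the ITensor& of the feature map (those names are just placeholders):

// square every element: f(x) = (0 + 1 * x) ^ 2
const static float sq[3]{0.0, 1.0, 2.0};   // shift, scale, power
Weights sqShift{DataType::kFLOAT, sq, 1};
Weights sqScale{DataType::kFLOAT, sq + 1, 1};
Weights sqPower{DataType::kFLOAT, sq + 2, 1};
IScaleLayer* square = network->addScale(input, ScaleMode::kUNIFORM, sqShift, sqScale, sqPower);
assert(square);
// square->getOutput(0) holds x.pow(2), same shape as input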
The next step was:
x.pow(2).sum(dim=1, keepdim=True)
This one I had no idea how to handle, so I asked in the group how to sum over a specified dimension. Someone with the screen name "sky" said addReduce. Really grateful to that guy, because that was exactly it:
virtual IReduceLayer* addReduce(ITensor& input, ReduceOperation operation, uint32_t reduceAxes, bool keepDimensions) TRTNOEXCEPT = 0;
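The reduceAxes argument is a bitmask over the tensor's dimensions. With an implicit batch dimension (as in this project), a [512,40,40] feature map has the channel axis at bit 0 and H, W at bits 1 and 2, so summing over channels uses mask 1, while averaging over H and W (as in the instance-norm example further down) uses mask 6. A small sketch, reusing the hypothetical square layer from the snippet above:

// x.pow(2).sum(dim=1, keepdim=True): sum over channels only
IReduceLayer* sumC = network->addReduce(*square->getOutput(0),
                                        ReduceOperation::kSUM,
                                        1,      // bitmask: bit 0 = channel axis
                                        true);  // keepDimensions
assert(sumC);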
Then I searched the repository for addReduce, and there is indeed a project that uses this API.
Looking closely at the code, it does pretty much what I need:
IScaleLayer* addInstanceNorm2d(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, const std::string lname, const float eps) {
    int len = weightMap[lname + ".weight"].count;

    // mean over H and W (axis bitmask 6 = bits 1 and 2)
    IReduceLayer* reduce1 = network->addReduce(input,
                                               ReduceOperation::kAVG,
                                               6,
                                               true);
    assert(reduce1);

    // x - mean
    IElementWiseLayer* ew1 = network->addElementWise(input,
                                                     *reduce1->getOutput(0),
                                                     ElementWiseOperation::kSUB);
    assert(ew1);

    // (x - mean)^2
    const static float pval1[3]{0.0, 1.0, 2.0};
    Weights wshift1{DataType::kFLOAT, pval1, 1};
    Weights wscale1{DataType::kFLOAT, pval1 + 1, 1};
    Weights wpower1{DataType::kFLOAT, pval1 + 2, 1};
    IScaleLayer* scale1 = network->addScale(*ew1->getOutput(0),
                                            ScaleMode::kUNIFORM,
                                            wshift1, wscale1, wpower1);
    assert(scale1);

    // var = mean((x - mean)^2) over H and W
    IReduceLayer* reduce2 = network->addReduce(*scale1->getOutput(0),
                                               ReduceOperation::kAVG,
                                               6,
                                               true);
    assert(reduce2);

    // sqrt(var + eps)
    const static float pval2[3]{eps, 1.0, 0.5};
    Weights wshift2{DataType::kFLOAT, pval2, 1};
    Weights wscale2{DataType::kFLOAT, pval2 + 1, 1};
    Weights wpower2{DataType::kFLOAT, pval2 + 2, 1};
    IScaleLayer* scale2 = network->addScale(*reduce2->getOutput(0),
                                            ScaleMode::kUNIFORM,
                                            wshift2, wscale2, wpower2);
    assert(scale2);

    // (x - mean) / sqrt(var + eps)
    IElementWiseLayer* ew2 = network->addElementWise(*ew1->getOutput(0),
                                                     *scale2->getOutput(0),
                                                     ElementWiseOperation::kDIV);
    assert(ew2);

    // per-channel affine: weight * x + bias, with power fixed to 1
    float* pval3 = reinterpret_cast<float*>(malloc(sizeof(float) * len));
    std::fill_n(pval3, len, 1.0);
    Weights wpower3{DataType::kFLOAT, pval3, len};
    weightMap[lname + ".power3"] = wpower3;
    IScaleLayer* scale3 = network->addScale(*ew2->getOutput(0),
                                            ScaleMode::kCHANNEL,
                                            weightMap[lname + ".bias"],
                                            weightMap[lname + ".weight"],
                                            wpower3);
    assert(scale3);
    return scale3;
}
Yep! What I'm best at is following an existing example. This function computes (x - mean) / sqrt(var + eps) * weight + bias per channel, which is the same chain of reduce / scale / element-wise layers that L2Norm needs. Very quickly I had L2Norm implemented too, with precision matching PyTorch.
IScaleLayer* L2norm(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, const std::string pre_name = "conv4_3_L2Norm.weight")
{
    // aa = x.pow(2)  ## [1,512,40,40]
    const static float pval1[3]{0.0, 1.0, 2.0};
    Weights wshift1{DataType::kFLOAT, pval1, 1};
    Weights wscale1{DataType::kFLOAT, pval1 + 1, 1};
    Weights wpower1{DataType::kFLOAT, pval1 + 2, 1};
    IScaleLayer* scale1 = network->addScale(input,
                                            ScaleMode::kUNIFORM,
                                            wshift1, wscale1, wpower1);
    assert(scale1);

    // bb = x.pow(2).sum(dim=1, keepdim=True)  ## [1,1,40,40]
    IReduceLayer* reduce1 = network->addReduce(*scale1->getOutput(0),
                                               ReduceOperation::kSUM,
                                               1,
                                               true);
    assert(reduce1);

    // norm = x.pow(2).sum(dim=1, keepdim=True).sqrt()  # [1,1,40,40] (eps is dropped here)
    const static float pval2[3]{0.0, 1.0, 0.5};
    Weights wshift2{DataType::kFLOAT, pval2, 1};
    Weights wscale2{DataType::kFLOAT, pval2 + 1, 1};
    Weights wpower2{DataType::kFLOAT, pval2 + 2, 1};
    IScaleLayer* scale2 = network->addScale(*reduce1->getOutput(0),
                                            ScaleMode::kUNIFORM,
                                            wshift2, wscale2, wpower2);
    assert(scale2);

    // x = torch.div(x, norm)
    IElementWiseLayer* ew2 = network->addElementWise(input,
                                                     *scale2->getOutput(0),
                                                     ElementWiseOperation::kDIV);
    assert(ew2);

    // out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
    // per-channel scale by the learned weight: shift = 0, power = 1 for every channel
    int len = weightMap[pre_name].count;
    float* pval3 = reinterpret_cast<float*>(malloc(sizeof(float) * len));
    std::fill_n(pval3, len, 1.0);   // power = 1
    Weights wpower3{DataType::kFLOAT, pval3, len};
    weightMap[pre_name + ".power3"] = wpower3;
    float* pval4 = reinterpret_cast<float*>(malloc(sizeof(float) * len));
    std::fill_n(pval4, len, 0.0);   // shift = 0
    Weights wpower4{DataType::kFLOAT, pval4, len};
    weightMap[pre_name + ".power4"] = wpower4;
    IScaleLayer* scale3 = network->addScale(*ew2->getOutput(0),
                                            ScaleMode::kCHANNEL,
                                            wpower4,
                                            weightMap[pre_name],
                                            wpower3);
    assert(scale3);
    return scale3;
}
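When wiring up the rest of the network, calling it looks roughly like this (a sketch only: conv4_3 and conv5_3 stand for whatever layers produce those feature maps in the build code, and the weight keys have to match the names stored in the .wts file):

// hypothetical usage inside the network-building function
IScaleLayer* conv4_3_l2 = L2norm(network, weightMap, *conv4_3->getOutput(0), "conv4_3_L2Norm.weight");
IScaleLayer* conv5_3_l2 = L2norm(network, weightMap, *conv5_3->getOutput(0), "conv5_3_L2Norm.weight");
// conv4_3_l2->getOutput(0) and conv5_3_l2->getOutput(0) then feed the later detection heads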
Then I carried on building the rest of the network.
P.S. My funds are tanking.