Neural network compression has been a very active research topic over the past three years. I found two related blog posts whose authors generously shared their source code, but I ran into some trouble when training a mask-enabled network on the GPU, so this post documents the details.
This article builds on 基於Caffe的CNN剪枝 (CNN pruning with Caffe) [1] and Deep Compression閱讀理解及Caffe源碼修改 (Deep Compression: reading notes and Caffe source modifications) [2].
How is the mask stored?
[1] stores the mask in the blob. A blob is a block of data, and at initialization a separate chunk of memory must also be allocated for the mask on the GPU, hence the Addmask() function. Addmask() is a member method of Blob, declared in blob.hpp and implemented in blob.cpp. To use it, call Addmask() in inner_product_layer.cpp and base_conv_layer.cpp so that, during layer setup, the fc and conv layers allocate an extra SyncedMemory to hold the mask. Blob exposes a family of accessors such as cpu_data()/mutable_cpu_data(); when changing mask values during initialization, take care to go through the appropriate accessor.
inner_product_layer.cpp:
template <typename Dtype>
void InnerProductLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  ...
  // allocate the weight blob, then attach a mask SyncedMemory to it
  this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
  this->blobs_[0]->Addmask();
  ...
}
base_conv_layer.cpp:
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  ...
  this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
  this->blobs_[0]->Addmask();
  ...
}
Modify blob.hpp and blob.cpp to add the mask_ member and its related methods; the author of [1] has posted the source code in the comments under that article.
[2] instead defines the mask inside a layer; a layer is essentially a set of operations applied to data, or in other words a way of combining blobs.
However, to run on the GPU the mask data itself needs GPU-side storage and operations, so this post follows the approach in [1] and adds mask_ as a member of the Blob class.
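Since the full code is available in the comment section of [1], the following is only a minimal sketch of what those Blob additions might look like, written against the accessor names used by the snippets later in this post (Addmask(), cpu_mask(), mutable_cpu_mask(), gpu_mask()); the actual code from [1] may differ in detail.

// blob.hpp -- inside class Blob<Dtype> (sketch only)
shared_ptr<SyncedMemory> mask_;   // one mask element per weight, stored like data_/diff_
void Addmask();                   // allocate the mask SyncedMemory
const Dtype* cpu_mask() const;
const Dtype* gpu_mask() const;
Dtype* mutable_cpu_mask();
Dtype* mutable_gpu_mask();

// blob.cpp -- sketch only
template <typename Dtype>
void Blob<Dtype>::Addmask() {
  mask_.reset(new SyncedMemory(count_ * sizeof(Dtype)));
}

template <typename Dtype>
const Dtype* Blob<Dtype>::cpu_mask() const {
  CHECK(mask_);
  return static_cast<const Dtype*>(mask_->cpu_data());
}

template <typename Dtype>
Dtype* Blob<Dtype>::mutable_cpu_mask() {
  CHECK(mask_);
  return static_cast<Dtype*>(mask_->mutable_cpu_data());
}

template <typename Dtype>
const Dtype* Blob<Dtype>::gpu_mask() const {
  CHECK(mask_);
  return static_cast<const Dtype*>(mask_->gpu_data());
}

template <typename Dtype>
Dtype* Blob<Dtype>::mutable_gpu_mask() {
  CHECK(mask_);
  return static_cast<Dtype*>(mask_->mutable_gpu_data());
}

With this in place, a parameter blob's data, diff, and mask each live in their own SyncedMemory and are available on either the CPU or the GPU.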
How is the mask initialized?
In the Caffe framework, a network's parameters can be initialized in two ways: by calling a filler, which initializes them according to the scheme specified in the model definition, or by reading the corresponding parameter matrices from an existing caffemodel or snapshot [1].
1. The filler approach
At startup the network is initialized by Init() in net.cpp, which walks from input to output and calls each layer's LayerSetUp to build the network structure. The snippet below shows how Caffe fills a blob using the xavier method.
virtual void Fill(Blob<Dtype>* blob) {
  CHECK(blob->count());
  int fan_in = blob->count() / blob->num();
  int fan_out = blob->count() / blob->channels();
  Dtype n = fan_in;  // default to fan_in
  if (this->filler_param_.variance_norm() ==
      FillerParameter_VarianceNorm_AVERAGE) {
    n = (fan_in + fan_out) / Dtype(2);
  } else if (this->filler_param_.variance_norm() ==
      FillerParameter_VarianceNorm_FAN_OUT) {
    n = fan_out;
  }
  Dtype scale = sqrt(Dtype(3) / n);
  caffe_rng_uniform<Dtype>(blob->count(), -scale, scale,
      blob->mutable_cpu_data());
  //Filler<Dtype>::FillMask(blob);
  CHECK_EQ(this->filler_param_.sparse(), -1)
      << "Sparsity not supported by this Filler.";
}
The filler's job is to generate random initial values for the freshly built network structure.
This random fill is executed even when the parameters are subsequently loaded from a snapshot or caffemodel.
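The Filler<Dtype>::FillMask(blob) call left commented out in the snippet above hints at a third possibility: initializing the mask inside the filler itself. A hypothetical FillMask that simply switches every connection on could look like the sketch below (it assumes the mutable_cpu_mask() accessor from the Blob sketch earlier); it is not needed in this workflow, because the mask is instead derived from the loaded weights in FromProto, described next.

// Hypothetical helper, e.g. as a static member of Filler<Dtype>: start from an
// all-ones mask, i.e. every connection enabled. Not used here, where FromProto
// builds the mask from the loaded weights instead.
static void FillMask(Blob<Dtype>* blob) {
  CHECK(blob->count());
  caffe_set(blob->count(), Dtype(1), blob->mutable_cpu_mask());
}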
2. Loading parameters from a snapshot or caffemodel
In tools/caffe.cpp, the train phase can load parameters from a snapshot or caffemodel for fine-tuning, while the test phase builds the network from those stored parameters and runs prediction.
Here my network was sparsified in pycaffe, so the proto being read back describes a network with an unchanged number of connections in which some of the weights are zero. mask_ has to be initialized while these parameters are read in, so modify the FromProto function in blob.cpp:
template <typename Dtype>
void Blob<Dtype>::FromProto(const BlobProto& proto, bool reshape) {
  if (reshape) {
    vector<int> shape;
    if (proto.has_num() || proto.has_channels() ||
        proto.has_height() || proto.has_width()) {
      // Using deprecated 4D Blob dimensions --
      // shape is (num, channels, height, width).
      shape.resize(4);
      shape[0] = proto.num();
      shape[1] = proto.channels();
      shape[2] = proto.height();
      shape[3] = proto.width();
    } else {
      shape.resize(proto.shape().dim_size());
      for (int i = 0; i < proto.shape().dim_size(); ++i) {
        shape[i] = proto.shape().dim(i);
      }
    }
    Reshape(shape);
  } else {
    CHECK(ShapeEquals(proto)) << "shape mismatch (reshape not set)";
  }
  // copy data
  Dtype* data_vec = mutable_cpu_data();
  if (proto.double_data_size() > 0) {
    CHECK_EQ(count_, proto.double_data_size());
    for (int i = 0; i < count_; ++i) {
      data_vec[i] = proto.double_data(i);
    }
  } else {
    CHECK_EQ(count_, proto.data_size());
    for (int i = 0; i < count_; ++i) {
      data_vec[i] = proto.data(i);
    }
  }
  if (proto.double_diff_size() > 0) {
    CHECK_EQ(count_, proto.double_diff_size());
    Dtype* diff_vec = mutable_cpu_diff();
    for (int i = 0; i < count_; ++i) {
      diff_vec[i] = proto.double_diff(i);
    }
  } else if (proto.diff_size() > 0) {
    CHECK_EQ(count_, proto.diff_size());
    Dtype* diff_vec = mutable_cpu_diff();
    for (int i = 0; i < count_; ++i) {
      diff_vec[i] = proto.diff(i);
    }
  }
  // build the mask for 4-D (conv) and 2-D (fc) parameter blobs:
  // a nonzero weight keeps its connection, a zero weight is pruned
  if (shape_.size() == 4 || shape_.size() == 2) {
    Dtype* mask_vec = mutable_cpu_mask();
    CHECK(count_);
    for (int i = 0; i < count_; ++i)
      mask_vec[i] = data_vec[i] ? 1 : 0;
  }
}
While the proto is being read, if the blob is 4-D (a conv layer) or 2-D (an fc layer), mask_ is initialized to data_vec[i] ? 1 : 0. For 1-D blobs (e.g. pool or relu layers), no mask is initialized.
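To sanity-check this step, one can load the pruned caffemodel through the C++ API and confirm that the mask matches the zero pattern of the weights. The sketch below is only an illustration: the prototxt/caffemodel paths and the layer index are placeholders, and cpu_mask() is the accessor assumed in the Blob sketch above.

#include "caffe/caffe.hpp"

int main() {
  // placeholder paths -- substitute your own model definition and pruned weights
  caffe::Net<float> net("train_val.prototxt", caffe::TEST);
  net.CopyTrainedLayersFrom("pruned.caffemodel");   // each parameter blob goes through FromProto()

  // index 1 assumes the first learnable layer sits right after a data layer
  caffe::Blob<float>* weights = net.layers()[1]->blobs()[0].get();
  const float* w = weights->cpu_data();
  const float* m = weights->cpu_mask();             // accessor added in this post
  int pruned = 0;
  for (int i = 0; i < weights->count(); ++i) {
    if (m[i] == 0) { ++pruned; CHECK_EQ(w[i], 0); } // masked-out weights must be zero
  }
  LOG(INFO) << "pruned weights: " << pruned << " / " << weights->count();
  return 0;
}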
Changes to back-propagation?
1. Modify how a blob updates itself (Blob::Update()) and include the math_functions.hpp header.
template <typename Dtype>
void Blob<Dtype>::Update() {
  // We will perform update based on where the data is located.
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
    // perform computation on CPU
    caffe_axpy<Dtype>(count_, Dtype(-1),
        static_cast<const Dtype*>(diff_->cpu_data()),
        static_cast<Dtype*>(data_->mutable_cpu_data()));
    // re-apply the mask so pruned weights stay at zero after the update
    // (only blobs that own a mask are touched; e.g. bias blobs have none)
    if (mask_) {
      caffe_mul<Dtype>(count_,
          static_cast<const Dtype*>(mask_->cpu_data()),
          static_cast<const Dtype*>(data_->cpu_data()),
          static_cast<Dtype*>(data_->mutable_cpu_data()));
    }
    break;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
    // perform computation on GPU
    caffe_gpu_axpy<Dtype>(count_, Dtype(-1),
        static_cast<const Dtype*>(diff_->gpu_data()),
        static_cast<Dtype*>(data_->mutable_gpu_data()));
    if (mask_) {
      caffe_gpu_mul<Dtype>(count_,
          static_cast<const Dtype*>(mask_->gpu_data()),
          static_cast<const Dtype*>(data_->gpu_data()),
          static_cast<Dtype*>(data_->mutable_gpu_data()));
    }
#else
    NO_GPU;
#endif
    break;
  default:
    LOG(FATAL) << "Syncedmem not initialized.";
  }
}
2. In both the CPU and the GPU backward passes, add a masking step of the form weight_diff[j] *= mask[j] to the weight gradients.
inner_product_layer.cpp:
template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (this->param_propagate_down_[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    const Dtype* bottom_data = bottom[0]->cpu_data();
    // Gradient with respect to weight
    Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
    vector<int> weight_shape(2);
    if (transpose_) {
      weight_shape[0] = K_;
      weight_shape[1] = N_;
    } else {
      weight_shape[0] = N_;
      weight_shape[1] = K_;
    }
    int count = weight_shape[0] * weight_shape[1];
    // apply the mask to the accumulated weight gradients
    const Dtype* mask = this->blobs_[0]->cpu_mask();
    for (int j = 0; j < count; j++)
      weight_diff[j] *= mask[j];

    if (transpose_) {
      caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          K_, N_, M_,
          (Dtype)1., bottom_data, top_diff,
          (Dtype)1., weight_diff);
    } else {
      caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          N_, K_, M_,
          (Dtype)1., top_diff, bottom_data,
          (Dtype)1., weight_diff);
    }
  }
  if (bias_term_ && this->param_propagate_down_[1]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bias
    caffe_cpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
        bias_multiplier_.cpu_data(), (Dtype)1.,
        this->blobs_[1]->mutable_cpu_diff());
  }
  if (propagate_down[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bottom data
    if (transpose_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
          (Dtype)0., bottom[0]->mutable_cpu_diff());
    } else {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
          (Dtype)0., bottom[0]->mutable_cpu_diff());
    }
  }
}
inner_product_layer.cu:
template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (this->param_propagate_down_[0]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    const Dtype* bottom_data = bottom[0]->gpu_data();
    vector<int> weight_shape(2);
    if (transpose_) {
      weight_shape[0] = K_;
      weight_shape[1] = N_;
    } else {
      weight_shape[0] = N_;
      weight_shape[1] = K_;
    }
    int count = weight_shape[0] * weight_shape[1];
    // GPU memory cannot be indexed from the host with a plain loop
    // (weight_diff[j] *= mask[j]), so use the element-wise caffe_gpu_mul instead
    caffe_gpu_mul<Dtype>(count,
        static_cast<const Dtype*>(this->blobs_[0]->gpu_diff()),
        static_cast<const Dtype*>(this->blobs_[0]->gpu_mask()),
        static_cast<Dtype*>(this->blobs_[0]->mutable_gpu_diff()));
    Dtype* weight_diff = this->blobs_[0]->mutable_gpu_diff();
    // Gradient with respect to weight
    if (transpose_) {
      caffe_gpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          K_, N_, M_,
          (Dtype)1., bottom_data, top_diff,
          (Dtype)1., weight_diff);
    } else {
      caffe_gpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          N_, K_, M_,
          (Dtype)1., top_diff, bottom_data,
          (Dtype)1., weight_diff);
    }
  }
  if (bias_term_ && this->param_propagate_down_[1]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    // Gradient with respect to bias
    caffe_gpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
        bias_multiplier_.gpu_data(), (Dtype)1.,
        this->blobs_[1]->mutable_gpu_diff());
  }
  if (propagate_down[0]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    // Gradient with respect to bottom data
    if (transpose_) {
      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->gpu_data(),
          (Dtype)0., bottom[0]->mutable_gpu_diff());
    } else {
      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->gpu_data(),
          (Dtype)0., bottom[0]->mutable_gpu_diff());
    }
  }
}
This completes the modifications.
Incidentally, newer versions of Caffe have added a sparse_ parameter; see the related pull requests: https://github.com/BVLC/caffe/pulls?utf8=%E2%9C%93&q=sparse