Studying the Faster R-CNN code: roi_pooling (Part 3)

Reposted article · 2019-08-14 22:04 · Faster-RCNN

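This part looks specifically at the C code in roi_pooling/src/roi_pooling.c.

Notes
I searched around but found little useful information. Even searching for #include <TH/TH.h> turned up almost nothing, and I could not find out how THFloatTensor_data and THFloatTensor_size are meant to be used. What follows is my own reading of the code, and I cannot guarantee it is correct.

1. The header TH/TH.h
#include <TH/TH.h> pulls in the declarations of the C data structures and functions of TH, the low-level C tensor library underneath PyTorch.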

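2. Parameters of roi_pooling_forward

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)

pooled_height: height of the output after pooling;
pooled_width: width of the output after pooling;
spatial_scale: the ratio of the feature map size to the input image size (for example 1/16 when the backbone has a total stride of 16), where the feature map is the input to the ROI pooling layer;
features: the feature map produced by the convolutional backbone;
rois: all regions of interest;
output: the tensor that receives the pooled result.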

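3. Variables inside the function

    // Grab the input tensor
    float * data_flat = THFloatTensor_data(features);
    float * rois_flat = THFloatTensor_data(rois);

    float * output_flat = THFloatTensor_data(output);

These lines obtain raw pointers to the data of the three tensors. THFloatTensor_data does not copy anything or allocate new memory: it returns a float pointer to the tensor's underlying buffer, which the rest of the function indexes as a flat array (this assumes the tensors are contiguous).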

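    // Number of ROIs
    int num_rois = THFloatTensor_size(rois, 0);
    int size_rois = THFloatTensor_size(rois, 1);

rois has shape (num_rois, size_rois): num_rois is the number of regions of interest, and size_rois is the number of float values stored per ROI, not its spatial size. Judging from the comment further down, each ROI is a row [batch_index x1 y1 x2 y2], so size_rois should be 5. It is used later as the stride between consecutive ROIs in rois_flat.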
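To make the row layout concrete, here is a minimal sketch of reading one field of one ROI from the flat buffer. The helper name roi_field is made up for illustration; only the stride size_rois and the [batch_index x1 y1 x2 y2] order come from the source.

```c
#include <stddef.h>

/* Hypothetical helper (not part of roi_pooling.c): return field k of ROI n
 * from a flat, row-major ROI buffer in which every ROI occupies size_rois
 * consecutive floats, e.g. [batch_index, x1, y1, x2, y2]. */
static float roi_field(const float *rois_flat, int size_rois, int n, int k)
{
    return rois_flat[(size_t)n * (size_t)size_rois + (size_t)k];
}
```

With size_rois equal to 5, roi_field(rois_flat, 5, n, 3) would be x2 of the n-th ROI, which is what rois_flat[index_roi + 3] reads in the original loop.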

    // batch size
    int batch_size = THFloatTensor_size(features, 0);
    if(batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THFloatTensor_size(features, 1);
    // data width
    int data_width = THFloatTensor_size(features, 2);
    // Number of channels
    int num_channels = THFloatTensor_size(features, 3);

features supplies batch_size, data_height, data_width and num_channels: the batch size, the height and width of the feature map, and its number of channels. Note the dimension order: the code reads the tensor as (batch, height, width, channels), and it returns 0 immediately unless batch_size is 1.
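The (batch, height, width, channels) order determines how a 4-D position turns into an offset in data_flat. A minimal sketch, assuming a contiguous buffer; the helper name nhwc_offset is hypothetical:

```c
/* Hypothetical helper (not part of roi_pooling.c): flat offset of element
 * (n, h, w, c) in a contiguous tensor laid out as
 * (batch, height, width, channels). */
static int nhwc_offset(int n, int h, int w, int c,
                       int height, int width, int channels)
{
    return ((n * height + h) * width + w) * channels + c;
}
```

With h = w = c = 0 this reduces to n * height * width * channels, the quantity the function later calls index_data; with n = 0 it is the per-image index used in the innermost loop.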

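    // Set all element of the output tensor to -inf.
    THFloatStorage_fill(THFloatTensor_storage(output), -1);

The comment says that every element of the output tensor is set to negative infinity, but the code actually fills it with -1. As the starting value of a running maximum this still gives the right answer as long as the feature values are not below -1, for example non-negative activations coming out of a ReLU.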
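The difference between the two starting values only shows up when every value in a bin is below -1. A small sketch of a running maximum with a configurable start; running_max is a hypothetical helper written for this note, not a function from the source:

```c
#include <math.h>

/* Hypothetical helper: running maximum of v[0..n-1], starting from init.
 * Mirrors the "if (value > current) current = value" update used by the
 * pooling loop. */
static float running_max(const float *v, int n, float init)
{
    float best = init;
    for (int i = 0; i < n; ++i) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}
```

For the values {-3, -2}, starting from -1 leaves the result at -1, while starting from -INFINITY (from math.h) yields -2. For non-negative inputs both starting values give the same result.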

Next, max pooling is applied to each ROI.

    // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
    int index_roi = 0;
    int index_output = 0;
    int n;
    for (n = 0; n < num_rois; ++n)

index_roi and index_output both start at 0. The loop then visits every region of interest in turn.

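        int roi_batch_ind = rois_flat[index_roi + 0];
        int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
        int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
        int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
        int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);

These lines read the fields of the ROI: roi_batch_ind is the index of the image within the batch, and (roi_start_w, roi_start_h) and (roi_end_w, roi_end_h) are the top-left and bottom-right corners of the ROI.

The coordinates stored in rois_flat are in input-image pixels. Multiplying them by spatial_scale, the ratio of the feature map size to the input image size, maps them onto the feature map (not back onto the original image). The scaled coordinates generally do not land exactly on a feature map cell, so they are rounded to the nearest integer.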

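        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

This gives the height and width of the ROI on the feature map (each at least 1) and the height and width of one bin. The bin size is a float and is not necessarily an integer. (A bin is one cell of the pooled output grid, the same thing as the "sections" in the summary below. Bins are what allow ROIs of different sizes to be reduced to feature maps of one fixed size, which the later classification layers need.)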
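The same arithmetic for one axis, written out as a sketch. The helper names roi_extent and bin_size are made up for this note; the formulas follow the four lines above.

```c
/* Hypothetical helper: extent of an ROI along one axis. Both end points
 * are inclusive, and the result is forced to be at least 1 so that a
 * degenerate ROI still covers one cell. */
static int roi_extent(int start, int end)
{
    int extent = end - start + 1;
    return extent > 1 ? extent : 1;
}

/* Hypothetical helper: size of one bin along that axis. */
static float bin_size(int extent, int pooled)
{
    return (float)extent / (float)pooled;
}
```

For example, an ROI running from row 3 to row 9 has an extent of 7; pooled to 2 rows, each bin is 3.5 rows tall.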

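        int index_data = roi_batch_ind * data_height * data_width * num_channels;
        const int output_area = pooled_width * pooled_height;

index_data is the offset in data_flat at which the feature map of image roi_batch_ind begins: the batch index times the feature map height times its width times its number of channels.
output_area is the number of cells in one pooled output channel. Because the pooled size is fixed, this value is the same for every ROI.

        int c, ph, pw;
        for (ph = 0; ph < pooled_height; ++ph)
        {
            for (pw = 0; pw < pooled_width; ++pw)
            {

These two loops visit every bin of the pooled output.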

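                int hstart = (floor((float)(ph) * bin_size_h));
                int wstart = (floor((float)(pw) * bin_size_w));
                int hend = (ceil((float)(ph + 1) * bin_size_h));
                int wend = (ceil((float)(pw + 1) * bin_size_w));

hstart and wstart are the top-left corner of the bin, relative to the ROI. ceil returns the smallest integer not less than its argument, and hend and wend are the (exclusive) bottom-right corner of the bin, again relative to the ROI. Because the start is rounded down and the end is rounded up, every bin covers at least one cell, and neighbouring bins can overlap by one row or column when the bin size is fractional.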

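                hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
                hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
                wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
                wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

Adding roi_start_h and roi_start_w converts the bin edges from positions relative to the ROI into positions on the feature map (not the original image). The fminf/fmaxf pair then clamps them to [0, data_height] and [0, data_width], so a bin never reaches outside the feature map.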
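The shift-then-clamp step can be written with integer comparisons instead of fminf/fmaxf. This is an equivalent sketch for one edge, not the author's code; the helper name clamp_bin_edge is hypothetical.

```c
/* Hypothetical helper: shift a ROI-relative bin edge by the ROI start and
 * clamp the result to [0, limit], where limit is data_height or
 * data_width. */
static int clamp_bin_edge(int edge, int roi_start, int limit)
{
    int v = edge + roi_start;
    if (v < 0) {
        v = 0;
    }
    if (v > limit) {
        v = limit;
    }
    return v;
}
```

For an ROI starting at row 3 on a feature map of height 8, a relative edge of 4 becomes 7, while a relative edge of 7 would be 10 and is clamped to 8.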

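                    int h, w, c;
                    for (h = hstart; h < hend; ++h)
                    {
                        for (w = wstart; w < wend; ++w)
                        {
                            for (c = 0; c < num_channels; ++c)
                            {
                                const int index = (h * data_width + w) * num_channels + c;
                                if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
                                {
                                    output_flat[pool_index + c * output_area] = data_flat[index_data + index];
                                }
                            }
                        }
                    }

The loops run over the rows of the bin, then its columns, then the channels, and keep for each channel the largest value seen in the bin. pool_index, which is defined in the full listing below, is the position of this bin in the output, and adding c * output_area selects the channel. The full listing also handles the case where clamping leaves a bin empty, by writing 0 for every channel.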
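For a single channel, the same bin maximum can be sketched on a plain float array. The function name bin_max is hypothetical, and unlike the source this version starts from the first element of the bin instead of a pre-filled output, so it does not depend on the -1 initial value.

```c
/* Hypothetical helper: maximum of channel c over the half-open bin
 * [hstart, hend) x [wstart, wend) of a (height, width, channels) feature
 * map stored as a flat array. Returns 0 for an empty bin, as the full
 * listing does. */
static float bin_max(const float *data, int data_width, int num_channels,
                     int hstart, int hend, int wstart, int wend, int c)
{
    if (hend <= hstart || wend <= wstart) {
        return 0.0f;
    }
    float best = data[(hstart * data_width + wstart) * num_channels + c];
    for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
            float v = data[(h * data_width + w) * num_channels + c];
            if (v > best) {
                best = v;
            }
        }
    }
    return best;
}
```

The source interleaves the channel loop inside the row and column loops and writes straight into output_flat; taking one channel at a time here only keeps the sketch short.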

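        // Increment ROI index
        index_roi += size_rois;
        index_output += pooled_height * pooled_width * num_channels;

After one ROI has been processed, index_roi and index_output are advanced. Both buffers are contiguous in memory, so the ROI index moves forward by size_rois (the number of values per ROI) and the output index moves forward by the number of output values per ROI, pooled_height * pooled_width * num_channels.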
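Putting index_output, pool_index and c * output_area together shows that the output is laid out as (num_rois, channels, pooled_height, pooled_width), even though the input features are read channels-last. A sketch of the combined offset; the helper name output_offset is hypothetical:

```c
/* Hypothetical helper: flat offset in output_flat of the value for ROI n,
 * channel c, bin (ph, pw), built from the same three terms the source
 * uses. */
static int output_offset(int n, int c, int ph, int pw,
                         int num_channels, int pooled_height, int pooled_width)
{
    int index_output = n * pooled_height * pooled_width * num_channels;
    int pool_index = index_output + ph * pooled_width + pw;
    int output_area = pooled_width * pooled_height;
    return pool_index + c * output_area;
}
```

For 3 channels and a 2×2 pooled size, each ROI occupies 12 consecutive floats, so the second ROI starts at offset 12.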

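A brief summary:

What ROI pooling does:

(1) Map the ROI, given in input-image coordinates, to the corresponding location on the feature map;

(2) Divide the mapped region into sections of roughly equal size (the number of sections equals the output dimensions);

(3) Apply max pooling to each section.

In this way, boxes of different sizes all produce feature maps of the same fixed size. Note that the size of the output feature maps depends on neither the size of the ROI nor the size of the convolutional feature map. The main benefit of ROI pooling is speed: the convolutional feature map is computed once per image and reused for every ROI.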

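ROI pooling example

Consider an 8×8 feature map, a single ROI, and an output size of 2×2.

(1) The input is a feature map of fixed size. [Figure: the 8×8 input feature map]

(2) After projection, the region proposal has top-left and bottom-right coordinates (0, 3) and (7, 8).

(3) Divide it into 2×2 sections (because the output size is 2×2). [Figure: the region split into four sections]

(4) Apply max pooling to each section. [Figure: the resulting 2×2 output]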

The complete code follows.
roi_pooling/src/roi_pooling.c

#include <TH/TH.h>
#include <math.h>

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)
{
    // Grab the input tensor
    float * data_flat = THFloatTensor_data(features);
    float * rois_flat = THFloatTensor_data(rois);

    float * output_flat = THFloatTensor_data(output);

    // Number of ROIs
    int num_rois = THFloatTensor_size(rois, 0);
    int size_rois = THFloatTensor_size(rois, 1);
    // batch size
    int batch_size = THFloatTensor_size(features, 0);
    if(batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THFloatTensor_size(features, 1);
    // data width
    int data_width = THFloatTensor_size(features, 2);
    // Number of channels
    int num_channels = THFloatTensor_size(features, 3);

    // Set all element of the output tensor to -inf.
    THFloatStorage_fill(THFloatTensor_storage(output), -1);

    // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
    int index_roi = 0;
    int index_output = 0;
    int n;
    for (n = 0; n < num_rois; ++n)
    {
        int roi_batch_ind = rois_flat[index_roi + 0];
        int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
        int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
        int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
        int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);
        //      CHECK_GE(roi_batch_ind, 0);
        //      CHECK_LT(roi_batch_ind, batch_size);

        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

        int index_data = roi_batch_ind * data_height * data_width * num_channels;
        const int output_area = pooled_width * pooled_height;

        int c, ph, pw;
        for (ph = 0; ph < pooled_height; ++ph)
        {
            for (pw = 0; pw < pooled_width; ++pw)
            {
                int hstart = (floor((float)(ph) * bin_size_h));
                int wstart = (floor((float)(pw) * bin_size_w));
                int hend = (ceil((float)(ph + 1) * bin_size_h));
                int wend = (ceil((float)(pw + 1) * bin_size_w));

                hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
                hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
                wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
                wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

                const int pool_index = index_output + (ph * pooled_width + pw);
                int is_empty = (hend <= hstart) || (wend <= wstart);
                if (is_empty)
                {
                    for (c = 0; c < num_channels * output_area; c += output_area)
                    {
                        output_flat[pool_index + c] = 0;
                    }
                }
                else
                {
                    int h, w, c;
                    for (h = hstart; h < hend; ++h)
                    {
                        for (w = wstart; w < wend; ++w)
                        {
                            for (c = 0; c < num_channels; ++c)
                            {
                                const int index = (h * data_width + w) * num_channels + c;
                                if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
                                {
                                    output_flat[pool_index + c * output_area] = data_flat[index_data + index];
                                }
                            }
                        }
                    }
                }
            }
        }

        // Increment ROI index
        index_roi += size_rois;
        index_output += pooled_height * pooled_width * num_channels;
    }
    return 1;
}

ref：https://blog.csdn.net/auto1993/article/details/78514071

https://blog.csdn.net/weixin_43872578/article/details/86628515
