HOG Feature Extraction
1 Background
HOG stands for Histogram of Oriented Gradients. It is a feature descriptor used for object detection in computer vision and image processing, and it can be combined with OpenCV's SVM classifier (among others) for image recognition.
2 Principle of HOG Features
2.1 Overview
HOG computes image features from histograms of gradient orientations. It organizes the image at three levels: detection windows, blocks, and cells. Their relationship is image -> detection window (win) -> block -> cell, illustrated below:
In the figure, black lines mark the windows, blue lines mark the blocks, and yellow lines mark the cells. The image is divided into windows according to the window size, each window is divided into blocks according to the block size, and each block is divided into cells according to the cell size.
The overall HOG pipeline can be divided into six steps: detection windows, image normalization, gradient computation, gradient-histogram construction, block normalization, and assembly of the HOG feature vector. Each step is described below.
2.2 Detection windows
HOG first partitions the image into windows of a configured size, then partitions each window into blocks and each block into cells. The hierarchy is:
Window: the image is split into multiple windows of the same size; the window slides.
Block: each window is split into multiple blocks of the same size; the block slides.
Cell: each block is split into multiple cells of the same size; the cell is the basic unit of feature extraction and does not slide.
"Slides" means that the window (or block) is shifted by a fixed step and the histograms are computed again, as shown in the figure below.
After window 1 has been processed, it is shifted by the stride to the position of window 2, and window 2 is processed.
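To make the hierarchy concrete, the following small counting sketch (illustration only, not library code; the sizes are the common 64x128 pedestrian-detection defaults plus a hypothetical 320x240 image) computes how many blocks fit in one window and how many windows fit in the image:

#include <cstdio>

int main()
{
    const int winW = 64,   winH = 128;   // detection window size
    const int blockW = 16, blockH = 16;  // block size
    const int blockStride = 8;           // block sliding step inside a window
    const int winStride = 8;             // window sliding step inside the image
    const int imgW = 320,  imgH = 240;   // hypothetical image size

    int blocksX = (winW - blockW) / blockStride + 1;   // 7
    int blocksY = (winH - blockH) / blockStride + 1;   // 15
    int winsX   = (imgW - winW) / winStride + 1;
    int winsY   = (imgH - winH) / winStride + 1;

    std::printf("blocks per window: %d x %d = %d\n", blocksX, blocksY, blocksX * blocksY);
    std::printf("windows per image: %d x %d = %d\n", winsX, winsY, winsX * winsY);
    return 0;
}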
2.3 Image normalization
Image normalization consists of gamma-space normalization and color-space normalization. Color normalization is pixel-value normalization and helps reduce the influence of illumination. The formula is:
y = (x - MinValue) / (MaxValue - MinValue)
Gamma-space normalization prevents local surface exposure from contributing too heavily to the texture strength of the image. The gamma compression formula is:
I(x, y) = I(x, y)^gamma
gamma can be chosen as appropriate, for example 1/2.
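As an illustration, here is a minimal OpenCV sketch of the two normalization steps described above; converting to grayscale, the small epsilon term, and gamma = 1/2 are assumptions for the example, not requirements:

#include <opencv2/opencv.hpp>

// Min-max (color) normalization followed by gamma compression with gamma = 1/2.
cv::Mat normalizeForHog(const cv::Mat& bgr)
{
    cv::Mat gray, f;
    cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);
    gray.convertTo(f, CV_32F);

    // Color (pixel-value) normalization: y = (x - MinValue) / (MaxValue - MinValue)
    double minV, maxV;
    cv::minMaxLoc(f, &minV, &maxV);
    f = (f - minV) / (maxV - minV + 1e-6);

    // Gamma compression: I(x,y) = I(x,y)^gamma with gamma = 0.5
    cv::Mat gammaImg;
    cv::pow(f, 0.5, gammaImg);
    return gammaImg;
}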
2.4 Gradient computation
After normalization, the gradients along the horizontal and vertical directions are computed at every pixel, and the gradient orientation of the pixel is derived from them:
Gx(x, y) = H(x+1, y) - H(x-1, y)
Gy(x, y) = H(x, y+1) - H(x, y-1)
G(x, y) = sqrt(Gx(x, y)^2 + Gy(x, y)^2)
α(x, y) = arctan(Gy(x, y) / Gx(x, y))
Here Gx and Gy are the horizontal and vertical gradients, H is the normalized pixel value, and α is the gradient orientation at the pixel. In code this is usually implemented by convolving with [-1, 0, 1] in the x direction and [-1, 0, 1]^T in the y direction.
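A possible sketch of this step with OpenCV, assuming a single-channel float image such as the output of the normalization above; filter2D applies the [-1, 0, 1] kernels and cartToPolar returns magnitude and orientation:

#include <opencv2/opencv.hpp>

// Compute per-pixel gradient magnitude and orientation (in degrees) with [-1,0,1] kernels.
void computeGradient(const cv::Mat& img, cv::Mat& mag, cv::Mat& angle)
{
    cv::Mat kx = (cv::Mat_<float>(1, 3) << -1, 0, 1);  // horizontal kernel
    cv::Mat ky = kx.t();                                // vertical kernel [-1, 0, 1]^T

    cv::Mat gx, gy;
    cv::filter2D(img, gx, CV_32F, kx);
    cv::filter2D(img, gy, CV_32F, ky);

    // magnitude = sqrt(gx^2 + gy^2), angle = atan2(gy, gx); last argument selects degrees
    cv::cartToPolar(gx, gy, mag, angle, true);
}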
2.5 Building the gradient histograms
Once the gradient orientation of every pixel has been computed, the gradient histograms can be built. Histograms are accumulated per cell: within each cell, the gradient votes are accumulated into a fixed number of orientation bins. Specifically:
The 180° range is divided into a number of bins, and the gradients whose orientations fall into each bin's range are accumulated. Illustrated below:
In practice 9 bins are usually used, splitting 180° into 0~20°, 20~40°, ..., 160~180°; the 180°~360° range shown in the figure maps back onto the 0~180° bins (unsigned gradients). Moreover, the vote is not a simple count per bin: a weighting function distributes each pixel's contribution between the two neighbouring bins. Specifically:
For example, a pixel whose gradient orientation is 25° lies between the 0~20° bin (center 10°) and the 20~40° bin (center 30°). With linear weighting, the 20~40° bin receives (25-10)/20 = 0.75 of the pixel's magnitude, and the 0~20° bin receives the remaining (30-25)/20 = 0.25.
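The sketch below illustrates this per-cell voting with linear interpolation between the two nearest bins. It assumes unsigned gradient orientations in [0, 180) and 9 bins of 20°, and wraps around at 0°/180° in the same circular way:

#include <vector>
#include <cmath>

// Distribute one pixel's magnitude between the two orientation bins nearest to its angle.
void voteIntoCell(float angleDeg, float magnitude, std::vector<float>& hist /* size 9 */)
{
    const int   nbins    = 9;
    const float binWidth = 180.f / nbins;                  // 20 degrees per bin

    float pos  = angleDeg / binWidth - 0.5f;               // position relative to bin centers
    int   bin0 = (int)std::floor(pos);                     // lower neighbouring bin
    float frac = pos - bin0;                                // fractional distance past its center

    int binLo = (bin0 + nbins) % nbins;                     // wrap around 0/180
    int binHi = (bin0 + 1) % nbins;

    hist[binLo] += magnitude * (1.f - frac);                // e.g. 25 deg: 0.25 to the 0~20 bin
    hist[binHi] += magnitude * frac;                        //              0.75 to the 20~40 bin
}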
2.6 Normalizing the cell histograms within each block
Pixel-value normalization already weakens the influence of illumination to some extent. To further suppress local variations in illumination and contrast, a normalization function is applied once more, this time to the cell histograms within each block.
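For illustration, here is a sketch of the L2-Hys scheme used by OpenCV for this block-level normalization: L2-normalize the concatenated block vector, clip every component at a threshold (0.2 by default), then L2-normalize again. The small stabilizing constants are a simplification of the library code quoted later in this article:

#include <vector>
#include <cmath>
#include <algorithm>

// L2-Hys normalization of one block histogram (typically 2*2 cells * 9 bins = 36 values).
void l2HysNormalize(std::vector<float>& blockHist, float clip = 0.2f)
{
    float sum = 0.f;
    for (float v : blockHist) sum += v * v;
    float scale = 1.f / (std::sqrt(sum) + 1e-3f);

    for (float& v : blockHist) v = std::min(v * scale, clip);   // normalize, then clip

    sum = 0.f;
    for (float v : blockHist) sum += v * v;
    scale = 1.f / (std::sqrt(sum) + 1e-3f);
    for (float& v : blockHist) v *= scale;                      // renormalize
}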
2.7 Generating the HOG feature vector
The block-level normalization above yields the normalized cell histograms of every block. Concatenating the histograms of all the blocks that make up a window produces the HOG feature vector. For example:
For a 64*128 window with 8*8-pixel cells and 2*2-cell blocks, each block contributes 9*4 features. With a block stride of 8 pixels there are 7 block positions horizontally and 15 vertically, so a 64*128 window yields 9*4*7*15 = 3780 features in total.
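The 3780 figure can be cross-checked against OpenCV's own bookkeeping; this short sketch builds a HOGDescriptor with exactly these parameters and prints getDescriptorSize(), which equals 9*4*7*15 = 3780:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // window 64x128, block 16x16, block stride 8x8, cell 8x8, 9 bins
    cv::HOGDescriptor hog(cv::Size(64, 128), cv::Size(16, 16),
                          cv::Size(8, 8),    cv::Size(8, 8), 9);
    std::cout << "descriptor size per window: " << hog.getDescriptorSize() << std::endl;  // 3780
    return 0;
}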
2.8 HOG-PLUS
The HOG pipeline uses weighting functions in two places. One of them, the interpolation between neighbouring orientation bins, was described in the histogram-construction step above. The other arises when votes are distributed spatially over neighbouring cells while the block histograms are built, and is explained below.
Building the histograms as described so far relies on an implicit assumption: a pixel contributes only to the histogram of the cell it belongs to, and not to those of the surrounding cells. For pixels near cell boundaries, and because blocks overlap as they slide, this assumption breaks down.
As shown in the figure below, the marked pixel lies in cell C0 of the block. If pixels only voted into their own cell, this pixel would affect only C0, and its contribution to C1, C2 and C3 would be lost. To compensate, a linear-interpolation style weighting is applied at every pixel position: weights are computed from the distances between the pixel and the centers of the four cells (the four dots in the figure), and the pixel's gradient magnitude is added, with these weights, to the corresponding histograms of C0, C1, C2 and C3.
Putting this together, the interpolation is performed over the two spatial coordinates (x, y) and one orientation coordinate (θ), i.e. trilinear interpolation: the gradient magnitude at a pixel is distributed, with weights, over the 4 cells and, within each cell, over the 2 orientation bins closest to the pixel's gradient direction. In the formulas below, the x and y axes are the spatial position of the pixel and the z axis is its gradient orientation (θ). For the pixel at (x, y), let ω be its gradient magnitude and z its orientation; z1 and z2 are the center positions of the two nearest bins (these can be read as angular coordinates), and (x1, y1), (x2, y2) index the centers of the neighbouring cells. The histogram h has band-widths b = [bx, by, bz] along the x, y and z dimensions, with bx = by = 8 and bz = 180°/9 = 20°.
h(x1, y1, z1) ← h(x1, y1, z1) + ω·(1 - (x-x1)/bx)·(1 - (y-y1)/by)·(1 - (z-z1)/bz)
h(x1, y1, z2) ← h(x1, y1, z2) + ω·(1 - (x-x1)/bx)·(1 - (y-y1)/by)·((z-z1)/bz)
h(x1, y2, z1) ← h(x1, y2, z1) + ω·(1 - (x-x1)/bx)·((y-y1)/by)·(1 - (z-z1)/bz)
h(x2, y1, z1) ← h(x2, y1, z1) + ω·((x-x1)/bx)·(1 - (y-y1)/by)·(1 - (z-z1)/bz)
h(x1, y2, z2) ← h(x1, y2, z2) + ω·(1 - (x-x1)/bx)·((y-y1)/by)·((z-z1)/bz)
h(x2, y1, z2) ← h(x2, y1, z2) + ω·((x-x1)/bx)·(1 - (y-y1)/by)·((z-z1)/bz)
h(x2, y2, z1) ← h(x2, y2, z1) + ω·((x-x1)/bx)·((y-y1)/by)·(1 - (z-z1)/bz)
h(x2, y2, z2) ← h(x2, y2, z2) + ω·((x-x1)/bx)·((y-y1)/by)·((z-z1)/bz)
The figure illustrates trilinear interpolation of the orientation histogram: for the marked pixel, the 2 bins closest to its gradient direction are found in each cell of the block, giving 8 histogram bins in total; the pixel's gradient magnitude is accumulated into these 8 bins with the corresponding weights, and this forms the block's orientation histograms.
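The eight update formulas can be written compactly as three nested loops over the two neighbouring cells in x, the two in y, and the two neighbouring bins in z. The sketch below is an illustration only; the 2x2x2 array stands in for the real cell/bin addressing inside a block:

// Trilinear vote of one pixel: x, y are its position, z its orientation, w its magnitude.
// (x1, y1, z1) is the lower neighbouring cell center / bin center; bx, by, bz are the band-widths.
void trilinearVote(float x, float y, float z, float w,
                   float h[2][2][2],
                   float x1, float y1, float z1,
                   float bx, float by, float bz)
{
    float fx = (x - x1) / bx;   // fractional offset toward the next cell in x
    float fy = (y - y1) / by;   // fractional offset toward the next cell in y
    float fz = (z - z1) / bz;   // fractional offset toward the next orientation bin

    for (int xi = 0; xi < 2; ++xi)
        for (int yi = 0; yi < 2; ++yi)
            for (int zi = 0; zi < 2; ++zi)
            {
                float wx = xi ? fx : 1.f - fx;
                float wy = yi ? fy : 1.f - fy;
                float wz = zi ? fz : 1.f - fz;
                h[xi][yi][zi] += w * wx * wy * wz;   // one of the eight update formulas
            }
}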
3 Code
3.1 API and demo:
HOGDescriptor hog(Size(64,128), Size(16,16), Size(8,8), Size(8,8), 9); // create the HOG descriptor: window size (64,128), block size (16,16), block stride (8,8), cell size (8,8), 9 histogram bins
std::vector<float> descriptors;
hog.compute(trainImg, descriptors, Size(64, 48), Size(0, 0)); // parameters: input image, output HOG descriptor, window stride, padding; the window stride and padding may be omitted
The demo section below is still a work in progress and will be revisited the next time HOG is used.
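In the meantime, here is a minimal self-contained sketch that wires the two calls above together; the file name person.png is hypothetical and the image is resized to the 64x128 window size:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("person.png");
    if (img.empty()) return -1;
    cv::resize(img, img, cv::Size(64, 128));   // match the detection window size

    cv::HOGDescriptor hog(cv::Size(64, 128), cv::Size(16, 16),
                          cv::Size(8, 8),    cv::Size(8, 8), 9);
    std::vector<float> descriptors;
    hog.compute(img, descriptors, cv::Size(8, 8), cv::Size(0, 0));
    std::cout << "descriptor length: " << descriptors.size() << std::endl;   // 3780 for one window
    return 0;
}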
HOG + SVM pedestrian detection demo:
#include<iostream> #include <fstream> #include <opencv2/core/core.hpp> #include <opencv2/highgui/highgui.hpp> #include <opencv2/imgproc/imgproc.hpp> #include <opencv2/objdetect/objdetect.hpp> #include <opencv2/ml/ml.hpp> using namespace cv; using namespace std; #define PosSamNO 1126 //正樣本個數 #define NegSamNO 1210 //負樣本個數 //生成setSVMDetector()中用到的檢測子參數時要用到的SVM的decision_func參數時protected類型,只能繼承之后通過函數訪問 class MySVM : public CvSVM { public: //獲得SVM的決策函數中的alpha數組 double * get_alpha_vector() { return this->decision_func->alpha; } //獲得SVM的決策函數中的rho參數,即偏移量 float get_rho() { return this->decision_func->rho; } }; int main() { HOGDescriptor hog(Size(64,128),Size(16,16),Size(8,8),Size(8,8),9);//窗口大小(64,128),塊尺寸(16,16),塊步長(8,8),cell尺寸(8,8),直方圖bin個數9 int DescriptorDim;//HOG描述子的維數,由圖片大小、檢測窗口大小、塊大小、細胞單元中直方圖bin個數決定 MySVM svm; string ImgName;//圖片名 ifstream finPos("pos.txt");//正樣本圖片的文件名列表 ifstream finNeg("neg.txt");//負樣本圖片的文件名列表 Mat sampleFeatureMat;//所有訓練樣本的特征向量組成的矩陣,行數等於所有樣本的個數,列數等於HOG描述子維數 Mat sampleLabelMat;//訓練樣本的類別向量,行數等於所有樣本的個數,列數等於1;1表示有人,-1表示無人 //依次讀取正樣本圖片,生成HOG描述子 for(int num=0; num<PosSamNO && getline(finPos,ImgName); num++) { ImgName = "E:\\INRIAPerson\\Posjpg64_128\\" + ImgName;//加上正樣本的路徑名 Mat src = imread(ImgName);//讀取圖片 vector<float> descriptors;//HOG描述子向量 hog.compute(src,descriptors,Size(8,8));//計算HOG描述子,檢測窗口移動步長(8,8) //處理第一個樣本時初始化特征向量矩陣和類別矩陣,因為只有知道了特征向量的維數才能初始化特征向量矩陣 if( 0 == num ) { DescriptorDim = descriptors.size();//HOG描述子的維數 //初始化所有訓練樣本的特征向量組成的矩陣sampleFeatureMat,行數等於所有樣本的個數,列數等於HOG描述子維數 sampleFeatureMat = Mat::zeros(PosSamNO+NegSamNO, DescriptorDim, CV_32FC1); //初始化訓練樣本的類別向量,行數等於所有樣本的個數,列數等於1;1表示有人,-1表示無人 sampleLabelMat = Mat::zeros(PosSamNO+NegSamNO+HardExampleNO, 1, CV_32FC1); } //將計算好的HOG描述子復制到樣本特征矩陣sampleFeatureMat for(int i=0; i<DescriptorDim; i++) sampleFeatureMat.at<float>(num,i) = descriptors[i];//第num個樣本的特征向量中的第i個元素 sampleLabelMat.at<float>(num,0) = 1;//正樣本類別為1,有人 } //處理負樣本的流程和正樣本大同小異 for(int num=0; num<NegSamNO && getline(finNeg,ImgName); num++) { ImgName = "E:\\INRIAPerson\\Negjpg_undesign\\" + ImgName;//加上負樣本的路徑名 Mat src = imread(ImgName);//讀取圖片 vector<float> descriptors;//HOG描述子向量 hog.compute(src,descriptors,Size(8,8));//計算HOG描述子,檢測窗口移動步長(8,8) //將計算好的HOG描述子復制到樣本特征矩陣sampleFeatureMat for(int i=0; i<DescriptorDim; i++) sampleFeatureMat.at<float>(num+PosSamNO,i) = descriptors[i];//第PosSamNO+num個樣本的特征向量中的第i個元素 sampleLabelMat.at<float>(num+PosSamNO,0) = -1;//負樣本類別為-1,無人 } //輸出樣本的HOG特征向量矩陣到文件 ofstream fout("SampleFeatureMat.txt"); for(int i=0; i<PosSamNO+NegSamNO; i++) { fout<<i<<endl; for(int j=0; j<DescriptorDim; j++) fout<<sampleFeatureMat.at<float>(i,j)<<" "; fout<<endl; } //訓練SVM分類器,迭代終止條件,當迭代滿1000次或誤差小於FLT_EPSILON時停止迭代 CvTermCriteria criteria = cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, FLT_EPSILON); //SVM參數:SVM類型為C_SVC;線性核函數;松弛因子C=0.01 CvSVMParams param(CvSVM::C_SVC, CvSVM::LINEAR, 0, 1, 0, 0.01, 0, 0, 0, criteria); cout<<"開始訓練SVM分類器"<<endl; svm.train(sampleFeatureMat, sampleLabelMat, Mat(), Mat(), param); cout<<"訓練完成"<<endl; svm.save("SVM_HOG.xml");//將訓練好的SVM模型保存為xml文件 DescriptorDim = svm.get_var_count();//特征向量的維數,即HOG描述子的維數 cout<<"描述子維數:"<<DescriptorDim<<endl; int supportVectorNum = svm.get_support_vector_count();//支持向量的個數 cout<<"支持向量個數:"<<supportVectorNum<<endl; Mat alphaMat = Mat::zeros(1, supportVectorNum, CV_32FC1);//alpha向量,長度等於支持向量個數 Mat supportVectorMat = Mat::zeros(supportVectorNum, DescriptorDim, CV_32FC1);//支持向量矩陣 Mat resultMat = Mat::zeros(1, DescriptorDim, CV_32FC1);//alpha向量乘以支持向量矩陣的結果 
//將支持向量的數據復制到supportVectorMat矩陣中,共有supportVectorNum個支持向量,每個支持向量的數據有DescriptorDim維(種) for(int i=0; i<supportVectorNum; i++) { const float * pSVData = svm.get_support_vector(i);//返回第i個支持向量的數據指針 for(int j=0; j<DescriptorDim; j++) supportVectorMat.at<float>(i,j) = pSVData[j];//第i個向量的第j維數據 } //將alpha向量的數據復制到alphaMat中 //double * pAlphaData = svm.get_alpha_vector();//返回SVM的決策函數中的alpha向量 double * pAlphaData = svm.get_alpha_vector(); for(int i=0; i<supportVectorNum; i++) { alphaMat.at<float>(0,i) = pAlphaData[i];//alpha向量,長度等於支持向量個數 } resultMat = -1 * alphaMat * supportVectorMat;//計算-(alphaMat * supportVectorMat),結果放到resultMat中, //注意因為svm.predict使用的是alpha*sv*another-rho,如果為負的話則認為是正樣本,在HOG的檢測函數中, //使用rho-alpha*sv*another如果為正的話是正樣本,所以需要將后者變為負數之后保存起來 //得到最終的setSVMDetector(const vector<float>& detector)參數中可用的檢測子 vector<float> myDetector; //將resultMat中的數據復制到數組myDetector中 for(int i=0; i<DescriptorDim; i++) { myDetector.push_back(resultMat.at<float>(0,i)); } myDetector.push_back(svm.get_rho());//最后添加偏移量rho,得到檢測子 cout<<"檢測子維數:"<<myDetector.size()<<endl; //設置HOGDescriptor的檢測子,用我們訓練的檢測器代替默認的檢測器 HOGDescriptor myHOG; myHOG.setSVMDetector(myDetector); //保存檢測子參數到文件 ofstream fout("HOGDetectorParagram.txt"); for(int i=0; i<myDetector.size(); i++) fout<<myDetector[i]<<endl; //讀入圖片進行人體檢測 Mat src = imread("test1.png"); vector<Rect> found, found_filtered;//矩形框數組 cout<<"進行多尺度HOG人體檢測"<<endl; myHOG.detectMultiScale(src, found, 0, Size(8,8), Size(32,32), 1.05, 2);//對圖片進行多尺度行人檢測 cout<<"找到的矩形框個數:"<<found.size()<<endl; //找出所有沒有嵌套的矩形框r,並放入found_filtered中,如果有嵌套的話,則取外面最大的那個矩形框放入found_filtered中 for(int i=0; i < found.size(); i++) { Rect r = found[i]; int j=0; for(; j < found.size(); j++) { if(j != i && (r & found[j]) == r)//說明r是被嵌套在found[j]里面的,舍棄當前的r break; } if( j == found.size())//r沒有被嵌套在第0,1,2...found.size()-1號的矩形框內,則r是符合條件的 found_filtered.push_back(r); } //對畫出來的矩形框做一些大小調整 for(int i=0; i<found_filtered.size(); i++) { Rect r = found_filtered[i]; r.x += cvRound(r.width*0.1); r.width = cvRound(r.width*0.8); r.y += cvRound(r.height*0.07); r.height = cvRound(r.height*0.8); rectangle(src, r.tl(), r.br(), Scalar(255,0,0), 2); } imwrite("ImgProcessed.jpg",src); namedWindow("src",0); imshow("src",src); waitKey(); }
HOG source code:
1 /*M/////////////////////////////////////////////////////////////////////////////////////// 2 // 3 // IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. 4 // 5 // By downloading, copying, installing or using the software you agree to this license. 6 // If you do not agree to this license, do not download, install, 7 // copy or use the software. 8 // 9 // 10 // License Agreement 11 // For Open Source Computer Vision Library 12 // 13 // Copyright (C) 2000-2008, Intel Corporation, all rights reserved. 14 // Copyright (C) 2009, Willow Garage Inc., all rights reserved. 15 // Third party copyrights are property of their respective owners. 16 // 17 // Redistribution and use in source and binary forms, with or without modification, 18 // are permitted provided that the following conditions are met: 19 // 20 // * Redistribution's of source code must retain the above copyright notice, 21 // this list of conditions and the following disclaimer. 22 // 23 // * Redistribution's in binary form must reproduce the above copyright notice, 24 // this list of conditions and the following disclaimer in the documentation 25 // and/or other materials provided with the distribution. 26 // 27 // * The name of the copyright holders may not be used to endorse or promote products 28 // derived from this software without specific prior written permission. 29 // 30 // This software is provided by the copyright holders and contributors "as is" and 31 // any express or implied warranties, including, but not limited to, the implied 32 // warranties of merchantability and fitness for a particular purpose are disclaimed. 33 // In no event shall the Intel Corporation or contributors be liable for any direct, 34 // indirect, incidental, special, exemplary, or consequential damages 35 // (including, but not limited to, procurement of substitute goods or services; 36 // loss of use, data, or profits; or business interruption) however caused 37 // and on any theory of liability, whether in contract, strict liability, 38 // or tort (including negligence or otherwise) arising in any way out of 39 // the use of this software, even if advised of the possibility of such damage. 40 // 41 //M*/ 42 43 #include "precomp.hpp" 44 #include <iterator> 45 #ifdef HAVE_IPP 46 #include "ipp.h" 47 #endif 48 /****************************************************************************************\ 49 The code below is implementation of HOG (Histogram-of-Oriented Gradients) 50 descriptor and object detection, introduced by Navneet Dalal and Bill Triggs. 51 52 The computed feature vectors are compatible with the 53 INRIA Object Detection and Localization Toolkit 54 (http://pascal.inrialpes.fr/soft/olt/) 55 \****************************************************************************************/ 56 57 namespace cv 58 { 59 60 size_t HOGDescriptor::getDescriptorSize() const 61 { 62 //下面2個語句是保證block中有整數個cell;保證block在窗口中能移動整數次 63 CV_Assert(blockSize.width % cellSize.width == 0 && 64 blockSize.height % cellSize.height == 0); 65 CV_Assert((winSize.width - blockSize.width) % blockStride.width == 0 && 66 (winSize.height - blockSize.height) % blockStride.height == 0 ); 67 //返回的nbins是每個窗口中檢測到的hog向量的維數 68 return (size_t)nbins* 69 (blockSize.width/cellSize.width)* 70 (blockSize.height/cellSize.height)* 71 ((winSize.width - blockSize.width)/blockStride.width + 1)* 72 ((winSize.height - blockSize.height)/blockStride.height + 1); 73 } 74 75 //winSigma到底是什么作用呢? 76 double HOGDescriptor::getWinSigma() const 77 { 78 return winSigma >= 0 ? 
winSigma : (blockSize.width + blockSize.height)/8.; 79 } 80 81 //svmDetector是HOGDescriptor內的一個成員變量,數據類型為向量vector。 82 //用來保存hog特征用於svm分類時的系數的. 83 //該函數返回為真的實際含義是什么呢?保證與hog特征長度相同,或者相差1,但為什么 84 //相差1也可以呢? 85 bool HOGDescriptor::checkDetectorSize() const 86 { 87 size_t detectorSize = svmDetector.size(), descriptorSize = getDescriptorSize(); 88 return detectorSize == 0 || 89 detectorSize == descriptorSize || 90 detectorSize == descriptorSize + 1; 91 } 92 93 void HOGDescriptor::setSVMDetector(InputArray _svmDetector) 94 { 95 //這里的convertTo函數只是將圖像Mat屬性更改,比如說通道數,矩陣深度等。 96 //這里是將輸入的svm系數矩陣全部轉換成浮點型。 97 _svmDetector.getMat().convertTo(svmDetector, CV_32F); 98 CV_Assert( checkDetectorSize() ); 99 } 100 101 #define CV_TYPE_NAME_HOG_DESCRIPTOR "opencv-object-detector-hog" 102 103 //FileNode是opencv的core中的一個文件存儲節點類,這個節點用來存儲讀取到的每一個文件元素。 104 //一般是讀取XML和YAML格式的文件 105 //又因為該函數是把文件節點中的內容讀取到其類的成員變量中,所以函數后面不能有關鍵字const 106 bool HOGDescriptor::read(FileNode& obj) 107 { 108 //isMap()是用來判斷這個節點是不是一個映射類型,如果是映射類型,則每個節點都與 109 //一個名字對應起來。因此這里的if語句的作用就是需讀取的文件node是一個映射類型 110 if( !obj.isMap() ) 111 return false; 112 //中括號中的"winSize"是指返回名為winSize的一個節點,因為已經知道這些節點是mapping類型 113 //也就是說都有一個對應的名字。 114 FileNodeIterator it = obj["winSize"].begin(); 115 //操作符>>為從節點中讀入數據,這里是將it指向的節點數據依次讀入winSize.width,winSize.height 116 //下面的幾條語句功能類似 117 it >> winSize.width >> winSize.height; 118 it = obj["blockSize"].begin(); 119 it >> blockSize.width >> blockSize.height; 120 it = obj["blockStride"].begin(); 121 it >> blockStride.width >> blockStride.height; 122 it = obj["cellSize"].begin(); 123 it >> cellSize.width >> cellSize.height; 124 obj["nbins"] >> nbins; 125 obj["derivAperture"] >> derivAperture; 126 obj["winSigma"] >> winSigma; 127 obj["histogramNormType"] >> histogramNormType; 128 obj["L2HysThreshold"] >> L2HysThreshold; 129 obj["gammaCorrection"] >> gammaCorrection; 130 obj["nlevels"] >> nlevels; 131 132 //isSeq()是判斷該節點內容是不是一個序列 133 FileNode vecNode = obj["SVMDetector"]; 134 if( vecNode.isSeq() ) 135 { 136 vecNode >> svmDetector; 137 CV_Assert(checkDetectorSize()); 138 } 139 //上面的都讀取完了后就返回讀取成功標志 140 return true; 141 } 142 143 void HOGDescriptor::write(FileStorage& fs, const String& objName) const 144 { 145 //將objName名字輸入到文件fs中 146 if( !objName.empty() ) 147 fs << objName; 148 149 fs << "{" CV_TYPE_NAME_HOG_DESCRIPTOR 150 //下面幾句依次將hog描述子內的變量輸入到文件fs中,且每次輸入前都輸入 151 //一個名字與其對應,因此這些節點是mapping類型。 152 << "winSize" << winSize 153 << "blockSize" << blockSize 154 << "blockStride" << blockStride 155 << "cellSize" << cellSize 156 << "nbins" << nbins 157 << "derivAperture" << derivAperture 158 << "winSigma" << getWinSigma() 159 << "histogramNormType" << histogramNormType 160 << "L2HysThreshold" << L2HysThreshold 161 << "gammaCorrection" << gammaCorrection 162 << "nlevels" << nlevels; 163 if( !svmDetector.empty() ) 164 //svmDetector則是直接輸入序列,也有對應的名字。 165 fs << "SVMDetector" << "[:" << svmDetector << "]"; 166 fs << "}"; 167 } 168 169 //從給定的文件中讀取參數 170 bool HOGDescriptor::load(const String& filename, const String& objname) 171 { 172 FileStorage fs(filename, FileStorage::READ); 173 //一個文件節點有很多葉子,所以一個文件節點包含了很多內容,這里當然是包含的 174 //HOGDescriptor需要的各種參數了。 175 FileNode obj = !objname.empty() ? fs[objname] : fs.getFirstTopLevelNode(); 176 return read(obj); 177 } 178 179 //將類中的參數以文件節點的形式寫入文件中。 180 void HOGDescriptor::save(const String& filename, const String& objName) const 181 { 182 FileStorage fs(filename, FileStorage::WRITE); 183 write(fs, !objName.empty() ? 
objName : FileStorage::getDefaultObjectName(filename)); 184 } 185 186 //復制HOG描述子到c中 187 void HOGDescriptor::copyTo(HOGDescriptor& c) const 188 { 189 c.winSize = winSize; 190 c.blockSize = blockSize; 191 c.blockStride = blockStride; 192 c.cellSize = cellSize; 193 c.nbins = nbins; 194 c.derivAperture = derivAperture; 195 c.winSigma = winSigma; 196 c.histogramNormType = histogramNormType; 197 c.L2HysThreshold = L2HysThreshold; 198 c.gammaCorrection = gammaCorrection; 199 //vector類型也可以用等號賦值 200 c.svmDetector = svmDetector; c.nlevels = nlevels; } 201 202 //計算圖像img的梯度幅度圖像grad和梯度方向圖像qangle. 203 //paddingTL為需要在原圖像img左上角擴增的尺寸,同理paddingBR 204 //為需要在img圖像右下角擴增的尺寸。 205 void HOGDescriptor::computeGradient(const Mat& img, Mat& grad, Mat& qangle, 206 Size paddingTL, Size paddingBR) const 207 { 208 //該函數只能計算8位整型深度的單通道或者3通道圖像. 209 CV_Assert( img.type() == CV_8U || img.type() == CV_8UC3 ); 210 211 //將圖像按照輸入參數進行擴充,這里不是為了計算邊緣梯度而做的擴充,因為 212 //為了邊緣梯度而擴充是在后面的代碼完成的,所以這里為什么擴充暫時還不明白。 213 Size gradsize(img.cols + paddingTL.width + paddingBR.width, 214 img.rows + paddingTL.height + paddingBR.height); 215 grad.create(gradsize, CV_32FC2); // <magnitude*(1-alpha), magnitude*alpha> 216 qangle.create(gradsize, CV_8UC2); // [0..nbins-1] - quantized gradient orientation 217 Size wholeSize; 218 Point roiofs; 219 //locateROI在此處是如果img圖像是從其它父圖像中某一部分得來的,那么其父圖像 220 //的大小尺寸就為wholeSize了,img圖像左上角相對於父圖像的位置點就為roiofs了。 221 //對於正樣本,其父圖像就是img了,所以這里的wholeSize就和img.size()是一樣的, 222 //對應負樣本,這2者不同;因為里面的關系比較不好懂,這里權且將wholesSize理解為 223 //img的size,所以roiofs就應當理解為Point(0, 0)了。 224 img.locateROI(wholeSize, roiofs); 225 226 int i, x, y; 227 int cn = img.channels(); 228 229 //_lut為行向量,用來作為浮點像素值的存儲查找表 230 Mat_<float> _lut(1, 256); 231 const float* lut = &_lut(0,0); 232 233 //gamma校正指的是將0~256的像素值全部開根號,即范圍縮小了,且變換范圍都不成線性了, 234 if( gammaCorrection ) 235 for( i = 0; i < 256; i++ ) 236 _lut(0,i) = std::sqrt((float)i); 237 else 238 for( i = 0; i < 256; i++ ) 239 _lut(0,i) = (float)i; 240 241 //創建長度為gradsize.width+gradsize.height+4的整型buffer 242 AutoBuffer<int> mapbuf(gradsize.width + gradsize.height + 4); 243 int* xmap = (int*)mapbuf + 1; 244 int* ymap = xmap + gradsize.width + 2; 245 246 //言外之意思borderType就等於4了,因為opencv的源碼中是如下定義的。 247 //#define IPL_BORDER_REFLECT_101 4 248 //enum{...,BORDER_REFLECT_101=IPL_BORDER_REFLECT_101,...} 249 //borderType為邊界擴充后所填充像素點的方式。 250 /* 251 Various border types, image boundaries are denoted with '|' 252 253 * BORDER_REPLICATE: aaaaaa|abcdefgh|hhhhhhh 254 * BORDER_REFLECT: fedcba|abcdefgh|hgfedcb 255 * BORDER_REFLECT_101: gfedcb|abcdefgh|gfedcba 256 * BORDER_WRAP: cdefgh|abcdefgh|abcdefg 257 * BORDER_CONSTANT: iiiiii|abcdefgh|iiiiiii with some specified 'i' 258 */ 259 const int borderType = (int)BORDER_REFLECT_101; 260 261 for( x = -1; x < gradsize.width + 1; x++ ) 262 /*int borderInterpolate(int p, int len, int borderType) 263 其中參數p表示的是擴充后圖像的一個坐標,相對於對應的坐標軸而言; 264 len參數表示對應源圖像的一個坐標軸的長度;borderType為擴充類型, 265 在上面已經有過介紹. 
266 所以這個函數的作用是從擴充后的像素點坐標推斷出源圖像中對應該點 267 的坐標值。 268 */ 269 //這里的xmap和ymap實際含義是什么呢?其實xmap向量里面存的就是 270 //擴充后圖像第一行像素點對應與原圖像img中的像素橫坐標,可以看 271 //出,xmap向量中有些元素的值是相同的,因為擴充圖像肯定會對應 272 //到原圖像img中的某一位置,而img本身尺寸內的像素也會對應該位置。 273 //同理,ymap向量里面存的是擴充后圖像第一列像素點對應於原圖想img 274 //中的像素縱坐標。 275 xmap[x] = borderInterpolate(x - paddingTL.width + roiofs.x, 276 wholeSize.width, borderType) - roiofs.x; 277 for( y = -1; y < gradsize.height + 1; y++ ) 278 ymap[y] = borderInterpolate(y - paddingTL.height + roiofs.y, 279 wholeSize.height, borderType) - roiofs.y; 280 281 // x- & y- derivatives for the whole row 282 int width = gradsize.width; 283 AutoBuffer<float> _dbuf(width*4); 284 float* dbuf = _dbuf; 285 //DX為水平梯度圖,DY為垂直梯度圖,Mag為梯度幅度圖,Angle為梯度角度圖 286 //該構造方法的第4個參數表示矩陣Mat的數據在內存中存放的位置。由此可以 287 //看出,這4幅圖像在內存中是連續存儲的。 288 Mat Dx(1, width, CV_32F, dbuf); 289 Mat Dy(1, width, CV_32F, dbuf + width); 290 Mat Mag(1, width, CV_32F, dbuf + width*2); 291 Mat Angle(1, width, CV_32F, dbuf + width*3); 292 293 int _nbins = nbins; 294 //angleScale==9/pi; 295 float angleScale = (float)(_nbins/CV_PI); 296 #ifdef HAVE_IPP 297 Mat lutimg(img.rows,img.cols,CV_MAKETYPE(CV_32F,cn)); 298 Mat hidxs(1, width, CV_32F); 299 Ipp32f* pHidxs = (Ipp32f*)hidxs.data; 300 Ipp32f* pAngles = (Ipp32f*)Angle.data; 301 302 IppiSize roiSize; 303 roiSize.width = img.cols; 304 roiSize.height = img.rows; 305 306 for( y = 0; y < roiSize.height; y++ ) 307 { 308 const uchar* imgPtr = img.data + y*img.step; 309 float* imglutPtr = (float*)(lutimg.data + y*lutimg.step); 310 311 for( x = 0; x < roiSize.width*cn; x++ ) 312 { 313 imglutPtr[x] = lut[imgPtr[x]]; 314 } 315 } 316 317 #endif 318 for( y = 0; y < gradsize.height; y++ ) 319 { 320 #ifdef HAVE_IPP 321 const float* imgPtr = (float*)(lutimg.data + lutimg.step*ymap[y]); 322 const float* prevPtr = (float*)(lutimg.data + lutimg.step*ymap[y-1]); 323 const float* nextPtr = (float*)(lutimg.data + lutimg.step*ymap[y+1]); 324 #else 325 //imgPtr在這里指的是img圖像的第y行首地址;prePtr指的是img第y-1行首地址; 326 //nextPtr指的是img第y+1行首地址; 327 const uchar* imgPtr = img.data + img.step*ymap[y]; 328 const uchar* prevPtr = img.data + img.step*ymap[y-1]; 329 const uchar* nextPtr = img.data + img.step*ymap[y+1]; 330 #endif 331 float* gradPtr = (float*)grad.ptr(y); 332 uchar* qanglePtr = (uchar*)qangle.ptr(y); 333 334 //輸入圖像img為單通道圖像時的計算 335 if( cn == 1 ) 336 { 337 for( x = 0; x < width; x++ ) 338 { 339 int x1 = xmap[x]; 340 #ifdef HAVE_IPP 341 dbuf[x] = (float)(imgPtr[xmap[x+1]] - imgPtr[xmap[x-1]]); 342 dbuf[width + x] = (float)(nextPtr[x1] - prevPtr[x1]); 343 #else 344 //下面2句把Dx,Dy就計算出來了,因為其對應的內存都在dbuf中 345 dbuf[x] = (float)(lut[imgPtr[xmap[x+1]]] - lut[imgPtr[xmap[x-1]]]); 346 dbuf[width + x] = (float)(lut[nextPtr[x1]] - lut[prevPtr[x1]]); 347 #endif 348 } 349 } 350 //當cn==3時,也就是輸入圖像為3通道圖像時的處理。 351 else 352 { 353 for( x = 0; x < width; x++ ) 354 { 355 //x1表示第y行第x1列的地址 356 int x1 = xmap[x]*3; 357 float dx0, dy0, dx, dy, mag0, mag; 358 #ifdef HAVE_IPP 359 const float* p2 = imgPtr + xmap[x+1]*3; 360 const float* p0 = imgPtr + xmap[x-1]*3; 361 362 dx0 = p2[2] - p0[2]; 363 dy0 = nextPtr[x1+2] - prevPtr[x1+2]; 364 mag0 = dx0*dx0 + dy0*dy0; 365 366 dx = p2[1] - p0[1]; 367 dy = nextPtr[x1+1] - prevPtr[x1+1]; 368 mag = dx*dx + dy*dy; 369 370 if( mag0 < mag ) 371 { 372 dx0 = dx; 373 dy0 = dy; 374 mag0 = mag; 375 } 376 377 dx = p2[0] - p0[0]; 378 dy = nextPtr[x1] - prevPtr[x1]; 379 mag = dx*dx + dy*dy; 380 #else 381 //p2為第y行第x+1列的地址 382 //p0為第y行第x-1列的地址 383 const uchar* p2 = imgPtr + xmap[x+1]*3; 384 const uchar* p0 = imgPtr + xmap[x-1]*3; 385 386 //計算第2通道的幅值 387 dx0 
= lut[p2[2]] - lut[p0[2]]; 388 dy0 = lut[nextPtr[x1+2]] - lut[prevPtr[x1+2]]; 389 mag0 = dx0*dx0 + dy0*dy0; 390 391 //計算第1通道的幅值 392 dx = lut[p2[1]] - lut[p0[1]]; 393 dy = lut[nextPtr[x1+1]] - lut[prevPtr[x1+1]]; 394 mag = dx*dx + dy*dy; 395 396 //取幅值最大的那個通道 397 if( mag0 < mag ) 398 { 399 dx0 = dx; 400 dy0 = dy; 401 mag0 = mag; 402 } 403 404 //計算第0通道的幅值 405 dx = lut[p2[0]] - lut[p0[0]]; 406 dy = lut[nextPtr[x1]] - lut[prevPtr[x1]]; 407 mag = dx*dx + dy*dy; 408 #endif 409 //取幅值最大的那個通道 410 if( mag0 < mag ) 411 { 412 dx0 = dx; 413 dy0 = dy; 414 mag0 = mag; 415 } 416 417 //最后求出水平和垂直方向上的梯度圖像 418 dbuf[x] = dx0; 419 dbuf[x+width] = dy0; 420 } 421 } 422 #ifdef HAVE_IPP 423 ippsCartToPolar_32f((const Ipp32f*)Dx.data, (const Ipp32f*)Dy.data, (Ipp32f*)Mag.data, pAngles, width); 424 for( x = 0; x < width; x++ ) 425 { 426 if(pAngles[x] < 0.f) 427 pAngles[x] += (Ipp32f)(CV_PI*2.); 428 } 429 430 ippsNormalize_32f(pAngles, pAngles, width, 0.5f/angleScale, 1.f/angleScale); 431 ippsFloor_32f(pAngles,(Ipp32f*)hidxs.data,width); 432 ippsSub_32f_I((Ipp32f*)hidxs.data,pAngles,width); 433 ippsMul_32f_I((Ipp32f*)Mag.data,pAngles,width); 434 435 ippsSub_32f_I(pAngles,(Ipp32f*)Mag.data,width); 436 ippsRealToCplx_32f((Ipp32f*)Mag.data,pAngles,(Ipp32fc*)gradPtr,width); 437 #else 438 //cartToPolar()函數是計算2個矩陣對應元素的幅度和角度,最后一個參數為是否 439 //角度使用度數表示,這里為false表示不用度數表示,即用弧度表示。 440 //如果只需計算2個矩陣對應元素的幅度圖像,可以采用magnitude()函數。 441 //-pi/2<Angle<pi/2; 442 cartToPolar( Dx, Dy, Mag, Angle, false ); 443 #endif 444 for( x = 0; x < width; x++ ) 445 { 446 #ifdef HAVE_IPP 447 int hidx = (int)pHidxs[x]; 448 #else 449 //-5<angle<4 450 float mag = dbuf[x+width*2], angle = dbuf[x+width*3]*angleScale - 0.5f; 451 //cvFloor()返回不大於參數的最大整數 452 //hidx={-5,-4,-3,-2,-1,0,1,2,3,4}; 453 int hidx = cvFloor(angle); 454 //0<=angle<1;angle表示的意思是與其相鄰的較小的那個bin的弧度距離(即弧度差) 455 angle -= hidx; 456 //gradPtr為grad圖像的指針 457 //gradPtr[x*2]表示的是與x處梯度方向相鄰較小的那個bin的幅度權重; 458 //gradPtr[x*2+1]表示的是與x處梯度方向相鄰較大的那個bin的幅度權重 459 gradPtr[x*2] = mag*(1.f - angle); 460 gradPtr[x*2+1] = mag*angle; 461 #endif 462 if( hidx < 0 ) 463 hidx += _nbins; 464 else if( hidx >= _nbins ) 465 hidx -= _nbins; 466 assert( (unsigned)hidx < (unsigned)_nbins ); 467 468 qanglePtr[x*2] = (uchar)hidx; 469 hidx++; 470 //-1在補碼中的表示為11111111,與-1相與的話就是自己本身了; 471 //0在補碼中的表示為00000000,與0相與的結果就是0了. 472 hidx &= hidx < _nbins ? 
-1 : 0; 473 qanglePtr[x*2+1] = (uchar)hidx; 474 } 475 } 476 } 477 478 479 struct HOGCache 480 { 481 struct BlockData 482 { 483 BlockData() : histOfs(0), imgOffset() {} 484 int histOfs; 485 Point imgOffset; 486 }; 487 488 struct PixData 489 { 490 size_t gradOfs, qangleOfs; 491 int histOfs[4]; 492 float histWeights[4]; 493 float gradWeight; 494 }; 495 496 HOGCache(); 497 HOGCache(const HOGDescriptor* descriptor, 498 const Mat& img, Size paddingTL, Size paddingBR, 499 bool useCache, Size cacheStride); 500 virtual ~HOGCache() {}; 501 virtual void init(const HOGDescriptor* descriptor, 502 const Mat& img, Size paddingTL, Size paddingBR, 503 bool useCache, Size cacheStride); 504 505 Size windowsInImage(Size imageSize, Size winStride) const; 506 Rect getWindow(Size imageSize, Size winStride, int idx) const; 507 508 const float* getBlock(Point pt, float* buf); 509 virtual void normalizeBlockHistogram(float* histogram) const; 510 511 vector<PixData> pixData; 512 vector<BlockData> blockData; 513 514 bool useCache; 515 vector<int> ymaxCached; 516 Size winSize, cacheStride; 517 Size nblocks, ncells; 518 int blockHistogramSize; 519 int count1, count2, count4; 520 Point imgoffset; 521 Mat_<float> blockCache; 522 Mat_<uchar> blockCacheFlags; 523 524 Mat grad, qangle; 525 const HOGDescriptor* descriptor; 526 }; 527 528 //默認的構造函數,不使用cache,塊的直方圖向量大小為0等 529 HOGCache::HOGCache() 530 { 531 useCache = false; 532 blockHistogramSize = count1 = count2 = count4 = 0; 533 descriptor = 0; 534 } 535 536 //帶參的初始化函數,采用內部的init函數進行初始化 537 HOGCache::HOGCache(const HOGDescriptor* _descriptor, 538 const Mat& _img, Size _paddingTL, Size _paddingBR, 539 bool _useCache, Size _cacheStride) 540 { 541 init(_descriptor, _img, _paddingTL, _paddingBR, _useCache, _cacheStride); 542 } 543 544 //HOGCache結構體的初始化函數 545 void HOGCache::init(const HOGDescriptor* _descriptor, 546 const Mat& _img, Size _paddingTL, Size _paddingBR, 547 bool _useCache, Size _cacheStride) 548 { 549 descriptor = _descriptor; 550 cacheStride = _cacheStride; 551 useCache = _useCache; 552 553 //首先調用computeGradient()函數計算輸入圖像的權值梯度幅度圖和角度量化圖 554 descriptor->computeGradient(_img, grad, qangle, _paddingTL, _paddingBR); 555 //imgoffset是Point類型,而_paddingTL是Size類型,雖然類型不同,但是2者都是 556 //一個二維坐標,所以是在opencv中是允許直接賦值的。 557 imgoffset = _paddingTL; 558 559 winSize = descriptor->winSize; 560 Size blockSize = descriptor->blockSize; 561 Size blockStride = descriptor->blockStride; 562 Size cellSize = descriptor->cellSize; 563 int i, j, nbins = descriptor->nbins; 564 //rawBlockSize為block中包含像素點的個數 565 int rawBlockSize = blockSize.width*blockSize.height; 566 567 //nblocks為Size類型,其長和寬分別表示一個窗口中水平方向和垂直方向上block的 568 //個數(需要考慮block在窗口中的移動) 569 nblocks = Size((winSize.width - blockSize.width)/blockStride.width + 1, 570 (winSize.height - blockSize.height)/blockStride.height + 1); 571 //ncells也是Size類型,其長和寬分別表示一個block中水平方向和垂直方向容納下 572 //的cell個數 573 ncells = Size(blockSize.width/cellSize.width, blockSize.height/cellSize.height); 574 //blockHistogramSize表示一個block中貢獻給hog描述子向量的長度 575 blockHistogramSize = ncells.width*ncells.height*nbins; 576 577 if( useCache ) 578 { 579 //cacheStride= _cacheStride,即其大小是由參數傳入的,表示的是窗口移動的大小 580 //cacheSize長和寬表示擴充后的圖像cache中,block在水平方向和垂直方向出現的個數 581 Size cacheSize((grad.cols - blockSize.width)/cacheStride.width+1, 582 (winSize.height/cacheStride.height)+1); 583 //blockCache為一個float型的Mat,注意其列數的值 584 blockCache.create(cacheSize.height, cacheSize.width*blockHistogramSize); 585 //blockCacheFlags為一個uchar型的Mat 586 blockCacheFlags.create(cacheSize); 587 size_t cacheRows = 
blockCache.rows; 588 //ymaxCached為vector<int>類型 589 //Mat::resize()為矩陣的一個方法,只是改變矩陣的行數,與單獨的resize()函數不相同。 590 ymaxCached.resize(cacheRows); 591 //ymaxCached向量內部全部初始化為-1 592 for(size_t ii = 0; ii < cacheRows; ii++ ) 593 ymaxCached[ii] = -1; 594 } 595 596 //weights為一個尺寸為blockSize的二維高斯表,下面的代碼就是計算二維高斯的系數 597 Mat_<float> weights(blockSize); 598 float sigma = (float)descriptor->getWinSigma(); 599 float scale = 1.f/(sigma*sigma*2); 600 601 for(i = 0; i < blockSize.height; i++) 602 for(j = 0; j < blockSize.width; j++) 603 { 604 float di = i - blockSize.height*0.5f; 605 float dj = j - blockSize.width*0.5f; 606 weights(i,j) = std::exp(-(di*di + dj*dj)*scale); 607 } 608 609 //vector<BlockData> blockData;而BlockData為HOGCache的一個結構體成員 610 //nblocks.width*nblocks.height表示一個檢測窗口中block的個數, 611 //而cacheSize.width*cacheSize.heigh表示一個已經擴充的圖片中的block的個數 612 blockData.resize(nblocks.width*nblocks.height); 613 //vector<PixData> pixData;同理,Pixdata也為HOGCache中的一個結構體成員 614 //rawBlockSize表示每個block中像素點的個數 615 //resize表示將其轉換成列向量 616 pixData.resize(rawBlockSize*3); 617 618 // Initialize 2 lookup tables, pixData & blockData. 619 // Here is why: 620 // 621 // The detection algorithm runs in 4 nested loops (at each pyramid layer): 622 // loop over the windows within the input image 623 // loop over the blocks within each window 624 // loop over the cells within each block 625 // loop over the pixels in each cell 626 // 627 // As each of the loops runs over a 2-dimensional array, 628 // we could get 8(!) nested loops in total, which is very-very slow. 629 // 630 // To speed the things up, we do the following: 631 // 1. loop over windows is unrolled in the HOGDescriptor::{compute|detect} methods; 632 // inside we compute the current search window using getWindow() method. 633 // Yes, it involves some overhead (function call + couple of divisions), 634 // but it's tiny in fact. 635 // 2. loop over the blocks is also unrolled. Inside we use pre-computed blockData[j] 636 // to set up gradient and histogram pointers. 637 // 3. loops over cells and pixels in each cell are merged 638 // (since there is no overlap between cells, each pixel in the block is processed once) 639 // and also unrolled. 
Inside we use PixData[k] to access the gradient values and 640 // update the histogram 641 //count1,count2,count4分別表示block中同時對1個cell,2個cell,4個cell有貢獻的像素點的個數。 642 count1 = count2 = count4 = 0; 643 for( j = 0; j < blockSize.width; j++ ) 644 for( i = 0; i < blockSize.height; i++ ) 645 { 646 PixData* data = 0; 647 //cellX和cellY表示的是block內該像素點所在的cell橫坐標和縱坐標索引,以小數的形式存在。 648 float cellX = (j+0.5f)/cellSize.width - 0.5f; 649 float cellY = (i+0.5f)/cellSize.height - 0.5f; 650 //cvRound返回最接近參數的整數;cvFloor返回不大於參數的整數;cvCeil返回不小於參數的整數 651 //icellX0和icellY0表示所在cell坐標索引,索引值為該像素點相鄰cell的那個較小的cell索引 652 //當然此處就是由整數的形式存在了。 653 //按照默認的系數的話,icellX0和icellY0只可能取值-1,0,1,且當i和j<3.5時對應的值才取-1 654 //當i和j>11.5時取值為1,其它時刻取值為0(注意i,j最大是15,從0開始的) 655 int icellX0 = cvFloor(cellX); 656 int icellY0 = cvFloor(cellY); 657 int icellX1 = icellX0 + 1, icellY1 = icellY0 + 1; 658 //此處的cellx和celly表示的是真實索引值與最近鄰cell索引值之間的差, 659 //為后面計算同一像素對不同cell中的hist權重的計算。 660 cellX -= icellX0; 661 cellY -= icellY0; 662 663 //滿足這個if條件說明icellX0只能為0,也就是說block橫坐標在(3.5,11.5)之間時 664 if( (unsigned)icellX0 < (unsigned)ncells.width && 665 (unsigned)icellX1 < (unsigned)ncells.width ) 666 { 667 //滿足這個if條件說明icellY0只能為0,也就是說block縱坐標在(3.5,11.5)之間時 668 if( (unsigned)icellY0 < (unsigned)ncells.height && 669 (unsigned)icellY1 < (unsigned)ncells.height ) 670 { 671 //同時滿足上面2個if語句的像素對4個cell都有權值貢獻 672 //rawBlockSize表示的是1個block中存儲像素點的個數 673 //而pixData的尺寸大小為block中像素點的3倍,其定義如下: 674 //pixData.resize(rawBlockSize*3); 675 //pixData的前面block像素大小的內存為存儲只對block中一個cell 676 //有貢獻的pixel;中間block像素大小的內存存儲對block中同時2個 677 //cell有貢獻的pixel;最后面的為對block中同時4個cell都有貢獻 678 //的pixel 679 data = &pixData[rawBlockSize*2 + (count4++)]; 680 //下面計算出的結果為0 681 data->histOfs[0] = (icellX0*ncells.height + icellY0)*nbins; 682 //為該像素點對cell0的權重 683 data->histWeights[0] = (1.f - cellX)*(1.f - cellY); 684 //下面計算出的結果為18 685 data->histOfs[1] = (icellX1*ncells.height + icellY0)*nbins; 686 data->histWeights[1] = cellX*(1.f - cellY); 687 //下面計算出的結果為9 688 data->histOfs[2] = (icellX0*ncells.height + icellY1)*nbins; 689 data->histWeights[2] = (1.f - cellX)*cellY; 690 //下面計算出的結果為27 691 data->histOfs[3] = (icellX1*ncells.height + icellY1)*nbins; 692 data->histWeights[3] = cellX*cellY; 693 } 694 else 695 //滿足這個else條件說明icellY0取-1或者1,也就是說block縱坐標在(0, 3.5) 696 //和(11.5, 15)之間. 
697 //此時的像素點對相鄰的2個cell有權重貢獻 698 { 699 data = &pixData[rawBlockSize + (count2++)]; 700 if( (unsigned)icellY0 < (unsigned)ncells.height ) 701 { 702 //(unsigned)-1等於127>2,所以此處滿足if條件時icellY0==1; 703 //icellY1==1; 704 icellY1 = icellY0; 705 cellY = 1.f - cellY; 706 } 707 //不滿足if條件時,icellY0==-1;icellY1==0; 708 //當然了,這2種情況下icellX0==0;icellX1==1; 709 data->histOfs[0] = (icellX0*ncells.height + icellY1)*nbins; 710 data->histWeights[0] = (1.f - cellX)*cellY; 711 data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins; 712 data->histWeights[1] = cellX*cellY; 713 data->histOfs[2] = data->histOfs[3] = 0; 714 data->histWeights[2] = data->histWeights[3] = 0; 715 } 716 } 717 //當block中橫坐標滿足在(0, 3.5)和(11.5, 15)范圍內時,即 718 //icellX0==-1或==1 719 else 720 { 721 722 if( (unsigned)icellX0 < (unsigned)ncells.width ) 723 { 724 //icellX1=icllX0=1; 725 icellX1 = icellX0; 726 cellX = 1.f - cellX; 727 } 728 //當icllY0=0時,此時對2個cell有貢獻 729 if( (unsigned)icellY0 < (unsigned)ncells.height && 730 (unsigned)icellY1 < (unsigned)ncells.height ) 731 { 732 data = &pixData[rawBlockSize + (count2++)]; 733 data->histOfs[0] = (icellX1*ncells.height + icellY0)*nbins; 734 data->histWeights[0] = cellX*(1.f - cellY); 735 data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins; 736 data->histWeights[1] = cellX*cellY; 737 data->histOfs[2] = data->histOfs[3] = 0; 738 data->histWeights[2] = data->histWeights[3] = 0; 739 } 740 else 741 //此時只對自身的cell有貢獻 742 { 743 data = &pixData[count1++]; 744 if( (unsigned)icellY0 < (unsigned)ncells.height ) 745 { 746 icellY1 = icellY0; 747 cellY = 1.f - cellY; 748 } 749 data->histOfs[0] = (icellX1*ncells.height + icellY1)*nbins; 750 data->histWeights[0] = cellX*cellY; 751 data->histOfs[1] = data->histOfs[2] = data->histOfs[3] = 0; 752 data->histWeights[1] = data->histWeights[2] = data->histWeights[3] = 0; 753 } 754 } 755 //為什么每個block中i,j位置的gradOfs和qangleOfs都相同且是如下的計算公式呢? 
756 //那是因為輸入的_img參數不是代表整幅圖片而是檢測窗口大小的圖片,所以每個 757 //檢測窗口中關於block的信息可以看做是相同的 758 data->gradOfs = (grad.cols*i + j)*2; 759 data->qangleOfs = (qangle.cols*i + j)*2; 760 //每個block中i,j位置的權重都是固定的 761 data->gradWeight = weights(i,j); 762 } 763 764 //保證所有的點都被掃描了一遍 765 assert( count1 + count2 + count4 == rawBlockSize ); 766 // defragment pixData 767 //將pixData中按照內存排滿,這樣節省了2/3的內存 768 for( j = 0; j < count2; j++ ) 769 pixData[j + count1] = pixData[j + rawBlockSize]; 770 for( j = 0; j < count4; j++ ) 771 pixData[j + count1 + count2] = pixData[j + rawBlockSize*2]; 772 //此時count2表示至多對2個cell有貢獻的所有像素點的個數 773 count2 += count1; 774 //此時count4表示至多對4個cell有貢獻的所有像素點的個數 775 count4 += count2; 776 777 //上面是初始化pixData,下面開始初始化blockData 778 // initialize blockData 779 for( j = 0; j < nblocks.width; j++ ) 780 for( i = 0; i < nblocks.height; i++ ) 781 { 782 BlockData& data = blockData[j*nblocks.height + i]; 783 //histOfs表示該block對檢測窗口貢獻的hog描述變量起點在整個 784 //變量中的坐標 785 data.histOfs = (j*nblocks.height + i)*blockHistogramSize; 786 //imgOffset表示該block的左上角在檢測窗口中的坐標 787 data.imgOffset = Point(j*blockStride.width,i*blockStride.height); 788 } 789 //一個檢測窗口對應一個blockData內存,一個block對應一個pixData內存。 790 } 791 792 793 //pt為該block左上角在滑動窗口中的坐標,buf為指向檢測窗口中blocData的指針 794 //函數返回一個block描述子的指針 795 const float* HOGCache::getBlock(Point pt, float* buf) 796 { 797 float* blockHist = buf; 798 assert(descriptor != 0); 799 800 Size blockSize = descriptor->blockSize; 801 pt += imgoffset; 802 803 CV_Assert( (unsigned)pt.x <= (unsigned)(grad.cols - blockSize.width) && 804 (unsigned)pt.y <= (unsigned)(grad.rows - blockSize.height) ); 805 806 if( useCache ) 807 { 808 //cacheStride可以認為和blockStride是一樣的 809 //保證所獲取到HOGCache是我們所需要的,即在block移動過程中會出現 810 CV_Assert( pt.x % cacheStride.width == 0 && 811 pt.y % cacheStride.height == 0 ); 812 //cacheIdx表示的是block個數的坐標 813 Point cacheIdx(pt.x/cacheStride.width, 814 (pt.y/cacheStride.height) % blockCache.rows); 815 //ymaxCached的長度為一個檢測窗口垂直方向上容納的block個數 816 if( pt.y != ymaxCached[cacheIdx.y] ) 817 { 818 //取出blockCacheFlags的第cacheIdx.y行並且賦值為0 819 Mat_<uchar> cacheRow = blockCacheFlags.row(cacheIdx.y); 820 cacheRow = (uchar)0; 821 ymaxCached[cacheIdx.y] = pt.y; 822 } 823 824 //blockHist指向該點對應block所貢獻的hog描述子向量,初始值為空 825 blockHist = &blockCache[cacheIdx.y][cacheIdx.x*blockHistogramSize]; 826 uchar& computedFlag = blockCacheFlags(cacheIdx.y, cacheIdx.x); 827 if( computedFlag != 0 ) 828 return blockHist; 829 computedFlag = (uchar)1; // set it at once, before actual computing 830 } 831 832 int k, C1 = count1, C2 = count2, C4 = count4; 833 // 834 const float* gradPtr = (const float*)(grad.data + grad.step*pt.y) + pt.x*2; 835 const uchar* qanglePtr = qangle.data + qangle.step*pt.y + pt.x*2; 836 837 CV_Assert( blockHist != 0 ); 838 #ifdef HAVE_IPP 839 ippsZero_32f(blockHist,blockHistogramSize); 840 #else 841 for( k = 0; k < blockHistogramSize; k++ ) 842 blockHist[k] = 0.f; 843 #endif 844 845 const PixData* _pixData = &pixData[0]; 846 847 //C1表示只對自己所在cell有貢獻的點的個數 848 for( k = 0; k < C1; k++ ) 849 { 850 const PixData& pk = _pixData[k]; 851 //a表示的是幅度指針 852 const float* a = gradPtr + pk.gradOfs; 853 float w = pk.gradWeight*pk.histWeights[0]; 854 //h表示的是相位指針 855 const uchar* h = qanglePtr + pk.qangleOfs; 856 857 //幅度有2個通道是因為每個像素點的幅值被分解到了其相鄰的兩個bin上了 858 //相位有2個通道是因為每個像素點的相位的相鄰處都有的2個bin的序號 859 int h0 = h[0], h1 = h[1]; 860 float* hist = blockHist + pk.histOfs[0]; 861 float t0 = hist[h0] + a[0]*w; 862 float t1 = hist[h1] + a[1]*w; 863 //hist中放的為加權的梯度值 864 hist[h0] = t0; hist[h1] = t1; 865 } 866 867 for( ; k < C2; k++ ) 868 { 869 const 
PixData& pk = _pixData[k]; 870 const float* a = gradPtr + pk.gradOfs; 871 float w, t0, t1, a0 = a[0], a1 = a[1]; 872 const uchar* h = qanglePtr + pk.qangleOfs; 873 int h0 = h[0], h1 = h[1]; 874 875 //因為此時的像素對2個cell有貢獻,這是其中一個cell的貢獻 876 float* hist = blockHist + pk.histOfs[0]; 877 w = pk.gradWeight*pk.histWeights[0]; 878 t0 = hist[h0] + a0*w; 879 t1 = hist[h1] + a1*w; 880 hist[h0] = t0; hist[h1] = t1; 881 882 //另一個cell的貢獻 883 hist = blockHist + pk.histOfs[1]; 884 w = pk.gradWeight*pk.histWeights[1]; 885 t0 = hist[h0] + a0*w; 886 t1 = hist[h1] + a1*w; 887 hist[h0] = t0; hist[h1] = t1; 888 } 889 890 //和上面類似 891 for( ; k < C4; k++ ) 892 { 893 const PixData& pk = _pixData[k]; 894 const float* a = gradPtr + pk.gradOfs; 895 float w, t0, t1, a0 = a[0], a1 = a[1]; 896 const uchar* h = qanglePtr + pk.qangleOfs; 897 int h0 = h[0], h1 = h[1]; 898 899 float* hist = blockHist + pk.histOfs[0]; 900 w = pk.gradWeight*pk.histWeights[0]; 901 t0 = hist[h0] + a0*w; 902 t1 = hist[h1] + a1*w; 903 hist[h0] = t0; hist[h1] = t1; 904 905 hist = blockHist + pk.histOfs[1]; 906 w = pk.gradWeight*pk.histWeights[1]; 907 t0 = hist[h0] + a0*w; 908 t1 = hist[h1] + a1*w; 909 hist[h0] = t0; hist[h1] = t1; 910 911 hist = blockHist + pk.histOfs[2]; 912 w = pk.gradWeight*pk.histWeights[2]; 913 t0 = hist[h0] + a0*w; 914 t1 = hist[h1] + a1*w; 915 hist[h0] = t0; hist[h1] = t1; 916 917 hist = blockHist + pk.histOfs[3]; 918 w = pk.gradWeight*pk.histWeights[3]; 919 t0 = hist[h0] + a0*w; 920 t1 = hist[h1] + a1*w; 921 hist[h0] = t0; hist[h1] = t1; 922 } 923 924 normalizeBlockHistogram(blockHist); 925 926 return blockHist; 927 } 928 929 930 void HOGCache::normalizeBlockHistogram(float* _hist) const 931 { 932 float* hist = &_hist[0]; 933 #ifdef HAVE_IPP 934 size_t sz = blockHistogramSize; 935 #else 936 size_t i, sz = blockHistogramSize; 937 #endif 938 939 float sum = 0; 940 #ifdef HAVE_IPP 941 ippsDotProd_32f(hist,hist,sz,&sum); 942 #else 943 //第一次歸一化求的是平方和 944 for( i = 0; i < sz; i++ ) 945 sum += hist[i]*hist[i]; 946 #endif 947 //分母為平方和開根號+0.1 948 float scale = 1.f/(std::sqrt(sum)+sz*0.1f), thresh = (float)descriptor->L2HysThreshold; 949 #ifdef HAVE_IPP 950 ippsMulC_32f_I(scale,hist,sz); 951 ippsThreshold_32f_I( hist, sz, thresh, ippCmpGreater ); 952 ippsDotProd_32f(hist,hist,sz,&sum); 953 #else 954 for( i = 0, sum = 0; i < sz; i++ ) 955 { 956 //第2次歸一化是在第1次的基礎上繼續求平和和 957 hist[i] = std::min(hist[i]*scale, thresh); 958 sum += hist[i]*hist[i]; 959 } 960 #endif 961 962 scale = 1.f/(std::sqrt(sum)+1e-3f); 963 #ifdef HAVE_IPP 964 ippsMulC_32f_I(scale,hist,sz); 965 #else 966 //最終歸一化結果 967 for( i = 0; i < sz; i++ ) 968 hist[i] *= scale; 969 #endif 970 } 971 972 973 //返回測試圖片中水平方向和垂直方向共有多少個檢測窗口 974 Size HOGCache::windowsInImage(Size imageSize, Size winStride) const 975 { 976 return Size((imageSize.width - winSize.width)/winStride.width + 1, 977 (imageSize.height - winSize.height)/winStride.height + 1); 978 } 979 980 981 //給定圖片的大小,已經檢測窗口滑動的大小和測試圖片中的檢測窗口的索引,得到該索引處 982 //檢測窗口的尺寸,包括坐標信息 983 Rect HOGCache::getWindow(Size imageSize, Size winStride, int idx) const 984 { 985 int nwindowsX = (imageSize.width - winSize.width)/winStride.width + 1; 986 int y = idx / nwindowsX;//商 987 int x = idx - nwindowsX*y;//余數 988 return Rect( x*winStride.width, y*winStride.height, winSize.width, winSize.height ); 989 } 990 991 992 void HOGDescriptor::compute(const Mat& img, vector<float>& descriptors, 993 Size winStride, Size padding, 994 const vector<Point>& locations) const 995 { 996 //Size()表示長和寬都是0 997 if( winStride == Size() ) 998 winStride = cellSize; 999 
//gcd為求最大公約數,如果采用默認值的話,則2者相同 1000 Size cacheStride(gcd(winStride.width, blockStride.width), 1001 gcd(winStride.height, blockStride.height)); 1002 size_t nwindows = locations.size(); 1003 //alignSize(m, n)返回n的倍數大於等於m的最小值 1004 padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width); 1005 padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height); 1006 Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2); 1007 1008 HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); 1009 1010 if( !nwindows ) 1011 //Mat::area()表示為Mat的面積 1012 nwindows = cache.windowsInImage(paddedImgSize, winStride).area(); 1013 1014 const HOGCache::BlockData* blockData = &cache.blockData[0]; 1015 1016 int nblocks = cache.nblocks.area(); 1017 int blockHistogramSize = cache.blockHistogramSize; 1018 size_t dsize = getDescriptorSize();//一個hog的描述長度 1019 //resize()為改變矩陣的行數,如果減少矩陣的行數則只保留減少后的 1020 //那些行,如果是增加行數,則保留所有的行。 1021 //這里將描述子長度擴展到整幅圖片 1022 descriptors.resize(dsize*nwindows); 1023 1024 for( size_t i = 0; i < nwindows; i++ ) 1025 { 1026 //descriptor為第i個檢測窗口的描述子首位置。 1027 float* descriptor = &descriptors[i*dsize]; 1028 1029 Point pt0; 1030 //非空 1031 if( !locations.empty() ) 1032 { 1033 pt0 = locations[i]; 1034 //非法的點 1035 if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width || 1036 pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height ) 1037 continue; 1038 } 1039 //locations為空 1040 else 1041 { 1042 //pt0為沒有擴充前圖像對應的第i個檢測窗口 1043 pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding); 1044 CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0); 1045 } 1046 1047 for( int j = 0; j < nblocks; j++ ) 1048 { 1049 const HOGCache::BlockData& bj = blockData[j]; 1050 //pt為block的左上角相對檢測圖片的坐標 1051 Point pt = pt0 + bj.imgOffset; 1052 1053 //dst為該block在整個測試圖片的描述子的位置 1054 float* dst = descriptor + bj.histOfs; 1055 const float* src = cache.getBlock(pt, dst); 1056 if( src != dst ) 1057 #ifdef HAVE_IPP 1058 ippsCopy_32f(src,dst,blockHistogramSize); 1059 #else 1060 for( int k = 0; k < blockHistogramSize; k++ ) 1061 dst[k] = src[k]; 1062 #endif 1063 } 1064 } 1065 } 1066 1067 1068 void HOGDescriptor::detect(const Mat& img, 1069 vector<Point>& hits, vector<double>& weights, double hitThreshold, 1070 Size winStride, Size padding, const vector<Point>& locations) const 1071 { 1072 //hits里面存的是符合檢測到目標的窗口的左上角頂點坐標 1073 hits.clear(); 1074 if( svmDetector.empty() ) 1075 return; 1076 1077 if( winStride == Size() ) 1078 winStride = cellSize; 1079 Size cacheStride(gcd(winStride.width, blockStride.width), 1080 gcd(winStride.height, blockStride.height)); 1081 size_t nwindows = locations.size(); 1082 padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width); 1083 padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height); 1084 Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2); 1085 1086 HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); 1087 1088 if( !nwindows ) 1089 nwindows = cache.windowsInImage(paddedImgSize, winStride).area(); 1090 1091 const HOGCache::BlockData* blockData = &cache.blockData[0]; 1092 1093 int nblocks = cache.nblocks.area(); 1094 int blockHistogramSize = cache.blockHistogramSize; 1095 size_t dsize = getDescriptorSize(); 1096 1097 double rho = svmDetector.size() > dsize ? 
svmDetector[dsize] : 0; 1098 vector<float> blockHist(blockHistogramSize); 1099 1100 for( size_t i = 0; i < nwindows; i++ ) 1101 { 1102 Point pt0; 1103 if( !locations.empty() ) 1104 { 1105 pt0 = locations[i]; 1106 if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width || 1107 pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height ) 1108 continue; 1109 } 1110 else 1111 { 1112 pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding); 1113 CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0); 1114 } 1115 double s = rho; 1116 //svmVec指向svmDetector最前面那個元素 1117 const float* svmVec = &svmDetector[0]; 1118 #ifdef HAVE_IPP 1119 int j; 1120 #else 1121 int j, k; 1122 #endif 1123 for( j = 0; j < nblocks; j++, svmVec += blockHistogramSize ) 1124 { 1125 const HOGCache::BlockData& bj = blockData[j]; 1126 Point pt = pt0 + bj.imgOffset; 1127 1128 //vec為測試圖片pt處的block貢獻的描述子指針 1129 const float* vec = cache.getBlock(pt, &blockHist[0]); 1130 #ifdef HAVE_IPP 1131 Ipp32f partSum; 1132 ippsDotProd_32f(vec,svmVec,blockHistogramSize,&partSum); 1133 s += (double)partSum; 1134 #else 1135 for( k = 0; k <= blockHistogramSize - 4; k += 4 ) 1136 //const float* svmVec = &svmDetector[0]; 1137 s += vec[k]*svmVec[k] + vec[k+1]*svmVec[k+1] + 1138 vec[k+2]*svmVec[k+2] + vec[k+3]*svmVec[k+3]; 1139 for( ; k < blockHistogramSize; k++ ) 1140 s += vec[k]*svmVec[k]; 1141 #endif 1142 } 1143 if( s >= hitThreshold ) 1144 { 1145 hits.push_back(pt0); 1146 weights.push_back(s); 1147 } 1148 } 1149 } 1150 1151 //不用保留檢測到目標的可信度,即權重 1152 void HOGDescriptor::detect(const Mat& img, vector<Point>& hits, double hitThreshold, 1153 Size winStride, Size padding, const vector<Point>& locations) const 1154 { 1155 vector<double> weightsV; 1156 detect(img, hits, weightsV, hitThreshold, winStride, padding, locations); 1157 } 1158 1159 struct HOGInvoker 1160 { 1161 HOGInvoker( const HOGDescriptor* _hog, const Mat& _img, 1162 double _hitThreshold, Size _winStride, Size _padding, 1163 const double* _levelScale, ConcurrentRectVector* _vec, 1164 ConcurrentDoubleVector* _weights=0, ConcurrentDoubleVector* _scales=0 ) 1165 { 1166 hog = _hog; 1167 img = _img; 1168 hitThreshold = _hitThreshold; 1169 winStride = _winStride; 1170 padding = _padding; 1171 levelScale = _levelScale; 1172 vec = _vec; 1173 weights = _weights; 1174 scales = _scales; 1175 } 1176 1177 void operator()( const BlockedRange& range ) const 1178 { 1179 int i, i1 = range.begin(), i2 = range.end(); 1180 double minScale = i1 > 0 ? levelScale[i1] : i2 > 1 ? 
levelScale[i1+1] : std::max(img.cols, img.rows); 1181 //將原圖片進行縮放 1182 Size maxSz(cvCeil(img.cols/minScale), cvCeil(img.rows/minScale)); 1183 Mat smallerImgBuf(maxSz, img.type()); 1184 vector<Point> locations; 1185 vector<double> hitsWeights; 1186 1187 for( i = i1; i < i2; i++ ) 1188 { 1189 double scale = levelScale[i]; 1190 Size sz(cvRound(img.cols/scale), cvRound(img.rows/scale)); 1191 //smallerImg只是構造一個指針,並沒有復制數據 1192 Mat smallerImg(sz, img.type(), smallerImgBuf.data); 1193 //沒有尺寸縮放 1194 if( sz == img.size() ) 1195 smallerImg = Mat(sz, img.type(), img.data, img.step); 1196 //有尺寸縮放 1197 else 1198 resize(img, smallerImg, sz); 1199 //該函數實際上是將返回的值存在locations和histWeights中 1200 //其中locations存的是目標區域的左上角坐標 1201 hog->detect(smallerImg, locations, hitsWeights, hitThreshold, winStride, padding); 1202 Size scaledWinSize = Size(cvRound(hog->winSize.width*scale), cvRound(hog->winSize.height*scale)); 1203 for( size_t j = 0; j < locations.size(); j++ ) 1204 { 1205 //保存目標區域 1206 vec->push_back(Rect(cvRound(locations[j].x*scale), 1207 cvRound(locations[j].y*scale), 1208 scaledWinSize.width, scaledWinSize.height)); 1209 //保存縮放尺寸 1210 if (scales) { 1211 scales->push_back(scale); 1212 } 1213 } 1214 //保存svm計算后的結果值 1215 if (weights && (!hitsWeights.empty())) 1216 { 1217 for (size_t j = 0; j < locations.size(); j++) 1218 { 1219 weights->push_back(hitsWeights[j]); 1220 } 1221 } 1222 } 1223 } 1224 1225 const HOGDescriptor* hog; 1226 Mat img; 1227 double hitThreshold; 1228 Size winStride; 1229 Size padding; 1230 const double* levelScale; 1231 //typedef tbb::concurrent_vector<Rect> ConcurrentRectVector; 1232 ConcurrentRectVector* vec; 1233 //typedef tbb::concurrent_vector<double> ConcurrentDoubleVector; 1234 ConcurrentDoubleVector* weights; 1235 ConcurrentDoubleVector* scales; 1236 }; 1237 1238 1239 void HOGDescriptor::detectMultiScale( 1240 const Mat& img, vector<Rect>& foundLocations, vector<double>& foundWeights, 1241 double hitThreshold, Size winStride, Size padding, 1242 double scale0, double finalThreshold, bool useMeanshiftGrouping) const 1243 { 1244 double scale = 1.; 1245 int levels = 0; 1246 1247 vector<double> levelScale; 1248 1249 //nlevels默認的是64層 1250 for( levels = 0; levels < nlevels; levels++ ) 1251 { 1252 levelScale.push_back(scale); 1253 if( cvRound(img.cols/scale) < winSize.width || 1254 cvRound(img.rows/scale) < winSize.height || 1255 scale0 <= 1 ) 1256 break; 1257 //只考慮測試圖片尺寸比檢測窗口尺寸大的情況 1258 scale *= scale0; 1259 } 1260 levels = std::max(levels, 1); 1261 levelScale.resize(levels); 1262 1263 ConcurrentRectVector allCandidates; 1264 ConcurrentDoubleVector tempScales; 1265 ConcurrentDoubleVector tempWeights; 1266 vector<double> foundScales; 1267 1268 //TBB並行計算 1269 parallel_for(BlockedRange(0, (int)levelScale.size()), 1270 HOGInvoker(this, img, hitThreshold, winStride, padding, &levelScale[0], &allCandidates, &tempWeights, &tempScales)); 1271 //將tempScales中的內容復制到foundScales中;back_inserter是指在指定參數迭代器的末尾插入數據 1272 std::copy(tempScales.begin(), tempScales.end(), back_inserter(foundScales)); 1273 //容器的clear()方法是指移除容器中所有的數據 1274 foundLocations.clear(); 1275 //將候選目標窗口保存在foundLocations中 1276 std::copy(allCandidates.begin(), allCandidates.end(), back_inserter(foundLocations)); 1277 foundWeights.clear(); 1278 //將候選目標可信度保存在foundWeights中 1279 std::copy(tempWeights.begin(), tempWeights.end(), back_inserter(foundWeights)); 1280 1281 if ( useMeanshiftGrouping ) 1282 { 1283 groupRectangles_meanshift(foundLocations, foundWeights, foundScales, finalThreshold, winSize); 1284 } 1285 else 1286 { 1287 //對矩形框進行聚類 1288 
groupRectangles(foundLocations, (int)finalThreshold, 0.2); 1289 } 1290 } 1291 1292 //不考慮目標的置信度 1293 void HOGDescriptor::detectMultiScale(const Mat& img, vector<Rect>& foundLocations, 1294 double hitThreshold, Size winStride, Size padding, 1295 double scale0, double finalThreshold, bool useMeanshiftGrouping) const 1296 { 1297 vector<double> foundWeights; 1298 detectMultiScale(img, foundLocations, foundWeights, hitThreshold, winStride, 1299 padding, scale0, finalThreshold, useMeanshiftGrouping); 1300 } 1301 1302 typedef RTTIImpl<HOGDescriptor> HOGRTTI; 1303 1304 CvType hog_type( CV_TYPE_NAME_HOG_DESCRIPTOR, HOGRTTI::isInstance, 1305 HOGRTTI::release, HOGRTTI::read, HOGRTTI::write, HOGRTTI::clone); 1306 1307 vector<float> HOGDescriptor::getDefaultPeopleDetector() 1308 { 1309 static const float detector[] = { 1310 0.05359386f, -0.14721455f, -0.05532170f, 0.05077307f, 1311 0.11547081f, -0.04268804f, 0.04635834f, ........ 1312 }; 1313 //返回detector數組的從頭到尾構成的向量 1314 return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0])); 1315 } 1316 //This function renurn 1981 SVM coeffs obtained from daimler's base. 1317 //To use these coeffs the detection window size should be (48,96) 1318 vector<float> HOGDescriptor::getDaimlerPeopleDetector() 1319 { 1320 static const float detector[] = { 1321 0.294350f, -0.098796f, -0.129522f, 0.078753f, 1322 0.387527f, 0.261529f, 0.145939f, 0.061520f, 1323 ........ 1324 }; 1325 //返回detector的首尾構成的向量 1326 return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0])); 1327 } 1328 1329 }
The HOG-related part of objdetect.hpp:
1 //////////////// HOG (Histogram-of-Oriented-Gradients) Descriptor and Object Detector ////////////// 2 3 struct CV_EXPORTS_W HOGDescriptor 4 { 5 public: 6 enum { L2Hys=0 }; 7 enum { DEFAULT_NLEVELS=64 }; 8 9 CV_WRAP HOGDescriptor() : winSize(64,128), blockSize(16,16), blockStride(8,8), 10 cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1), 11 histogramNormType(HOGDescriptor::L2Hys), L2HysThreshold(0.2), gammaCorrection(true), 12 nlevels(HOGDescriptor::DEFAULT_NLEVELS) 13 {} 14 15 //可以用構造函數的參數來作為冒號外的參數初始化傳入,這樣定義該類的時候,一旦變量分配了 16 //內存,則馬上會被初始化,而不用等所有變量分配完內存后再初始化。 17 CV_WRAP HOGDescriptor(Size _winSize, Size _blockSize, Size _blockStride, 18 Size _cellSize, int _nbins, int _derivAperture=1, double _winSigma=-1, 19 int _histogramNormType=HOGDescriptor::L2Hys, 20 double _L2HysThreshold=0.2, bool _gammaCorrection=false, 21 int _nlevels=HOGDescriptor::DEFAULT_NLEVELS) 22 : winSize(_winSize), blockSize(_blockSize), blockStride(_blockStride), cellSize(_cellSize), 23 nbins(_nbins), derivAperture(_derivAperture), winSigma(_winSigma), 24 histogramNormType(_histogramNormType), L2HysThreshold(_L2HysThreshold), 25 gammaCorrection(_gammaCorrection), nlevels(_nlevels) 26 {} 27 28 //可以導入文本文件進行初始化 29 CV_WRAP HOGDescriptor(const String& filename) 30 { 31 load(filename); 32 } 33 34 HOGDescriptor(const HOGDescriptor& d) 35 { 36 d.copyTo(*this); 37 } 38 39 virtual ~HOGDescriptor() {} 40 41 //size_t是一個long unsigned int型 42 CV_WRAP size_t getDescriptorSize() const; 43 CV_WRAP bool checkDetectorSize() const; 44 CV_WRAP double getWinSigma() const; 45 46 //virtual為虛函數,在指針或引用時起函數多態作用 47 CV_WRAP virtual void setSVMDetector(InputArray _svmdetector); 48 49 virtual bool read(FileNode& fn); 50 virtual void write(FileStorage& fs, const String& objname) const; 51 52 CV_WRAP virtual bool load(const String& filename, const String& objname=String()); 53 CV_WRAP virtual void save(const String& filename, const String& objname=String()) const; 54 virtual void copyTo(HOGDescriptor& c) const; 55 56 CV_WRAP virtual void compute(const Mat& img, 57 CV_OUT vector<float>& descriptors, 58 Size winStride=Size(), Size padding=Size(), 59 const vector<Point>& locations=vector<Point>()) const; 60 //with found weights output 61 CV_WRAP virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations, 62 CV_OUT vector<double>& weights, 63 double hitThreshold=0, Size winStride=Size(), 64 Size padding=Size(), 65 const vector<Point>& searchLocations=vector<Point>()) const; 66 //without found weights output 67 virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations, 68 double hitThreshold=0, Size winStride=Size(), 69 Size padding=Size(), 70 const vector<Point>& searchLocations=vector<Point>()) const; 71 //with result weights output 72 CV_WRAP virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations, 73 CV_OUT vector<double>& foundWeights, double hitThreshold=0, 74 Size winStride=Size(), Size padding=Size(), double scale=1.05, 75 double finalThreshold=2.0,bool useMeanshiftGrouping = false) const; 76 //without found weights output 77 virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations, 78 double hitThreshold=0, Size winStride=Size(), 79 Size padding=Size(), double scale=1.05, 80 double finalThreshold=2.0, bool useMeanshiftGrouping = false) const; 81 82 CV_WRAP virtual void computeGradient(const Mat& img, CV_OUT Mat& grad, CV_OUT Mat& angleOfs, 83 Size paddingTL=Size(), Size paddingBR=Size()) const; 84 85 CV_WRAP static vector<float> getDefaultPeopleDetector(); 86 
CV_WRAP static vector<float> getDaimlerPeopleDetector(); 87 88 CV_PROP Size winSize; 89 CV_PROP Size blockSize; 90 CV_PROP Size blockStride; 91 CV_PROP Size cellSize; 92 CV_PROP int nbins; 93 CV_PROP int derivAperture; 94 CV_PROP double winSigma; 95 CV_PROP int histogramNormType; 96 CV_PROP double L2HysThreshold; 97 CV_PROP bool gammaCorrection; 98 CV_PROP vector<float> svmDetector; 99 CV_PROP int nlevels; 100 };
4 References
[2] 黃冬麗, 戴健文, 馮超, 等. HOG特征提取中的三線性插值算法[J]. 電腦知識與技術: 學術交流, 2012, 8(11): 7548-7551.
https://blog.csdn.net/gy429476195/article/details/50156813
https://blog.csdn.net/zhanghenan123/article/details/80853523
https://blog.csdn.net/huguohu2006/article/details/48681287
https://blog.csdn.net/sinat_34604992/article/details/53933004