之前,俺也發過不少快速高斯模糊算法.
俺一般認為,只要處理一千六百萬像素彩色圖片,在2.2GHz的CPU上單核單線程超過1秒的算法,都是不快的.
之前發的幾個算法,在俺2.2GHz的CPU上耗時都會超過1秒.
而眾所周知,快速高斯模糊有很多實現方法:
1.FIR (Finite impulse response)
https://zh.wikipedia.org/wiki/%E9%AB%98%E6%96%AF%E6%A8%A1%E7%B3%8A
2.SII (Stacked integral images)
http://dx.doi.org/10.1109/ROBOT.2010.5509400
http://arxiv.org/abs/1107.4958
3.Vliet-Young-Verbeek (Recursive filter)
http://dx.doi.org/10.1016/0165-1684(95)00020-E
http://dx.doi.org/10.1109/ICPR.1998.711192
4.DCT (Discrete Cosine Transform)
http://dx.doi.org/10.1109/78.295213
5.box (Box filter)
http://dx.doi.org/10.1109/TPAMI.1986.4767776
6.AM(Alvarez, Mazorra)
http://www.jstor.org/stable/2158018
7.Deriche (Recursive filter)
http://hal.inria.fr/docs/00/07/47/78/PDF/RR-1893.pdf
8.ebox (Extended Box)
http://dx.doi.org/10.1007/978-3-642-24785-9_38
9.IIR (Infinite Impulse Response)
https://software.intel.com/zh-cn/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
10.FA (Fast Anisotropic)
http://mathinfo.univ-reims.fr/IMG/pdf/Fast_Anisotropic_Gquss_Filtering_-_GeusebroekECCV02.pdf
......
實現高斯模糊的方法雖然很多,但是作為算法而言,核心關鍵是簡單高效.
目前俺經過實測,IIR是兼顧效果以及性能的不錯的方法,也是半徑無關(即模糊不同強度耗時基本不變)的實現.
英特爾官方實現的這份:
IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions [PDF 513KB]
source: gaussian_blur.cpp [36KB]
采用了英特爾處理器的流(SIMD)指令,算法處理速度極其驚人.
俺寫算法追求干凈整潔,高效簡單,換言之就是不采用任何硬件加速方案,實現簡單高效,以適應不同硬件環境.
故基於英特爾這份代碼,俺對其進行了改寫以及優化.
最終在俺2.20GHz的CPU上,單核單線程,不采用流(SIMD)指令,達到了,處理一千六百萬像素的彩色照片僅需700毫秒左右.
按照慣例,還是貼個效果圖比較直觀.
之前也有網友問過這個算法的實現問題.
想了想,還是將代碼共享出來,供大家參考學習.
完整代碼:
void CalGaussianCoeff(float sigma, float * a0, float * a1, float * a2, float * a3, float * b1, float * b2, float * cprev, float * cnext) { float alpha, lamma, k; if (sigma < 0.5f) sigma = 0.5f; alpha = (float)exp((0.726) * (0.726)) / sigma; lamma = (float)exp(-alpha); *b2 = (float)exp(-2 * alpha); k = (1 - lamma) * (1 - lamma) / (1 + 2 * alpha * lamma - (*b2)); *a0 = k; *a1 = k * (alpha - 1) * lamma; *a2 = k * (alpha + 1) * lamma; *a3 = -k * (*b2); *b1 = -2 * lamma; *cprev = (*a0 + *a1) / (1 + *b1 + *b2); *cnext = (*a2 + *a3) / (1 + *b1 + *b2); } void gaussianHorizontal(unsigned char * bufferPerLine, unsigned char * lpRowInitial, unsigned char * lpColumn, int width, int height, int Channels, int Nwidth, float a0a1, float a2a3, float b1b2, float cprev, float cnext) { int HeightStep = Channels*height; int WidthSubOne = width - 1; if (Channels == 3) { float prevOut[3]; prevOut[0] = (lpRowInitial[0] * cprev); prevOut[1] = (lpRowInitial[1] * cprev); prevOut[2] = (lpRowInitial[2] * cprev); for (int x = 0; x < width; ++x) { prevOut[0] = ((lpRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))); prevOut[1] = ((lpRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))); prevOut[2] = ((lpRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))); bufferPerLine[0] = prevOut[0]; bufferPerLine[1] = prevOut[1]; bufferPerLine[2] = prevOut[2]; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; lpColumn += HeightStep * WidthSubOne; bufferPerLine -= Channels; prevOut[0] = (lpRowInitial[0] * cnext); prevOut[1] = (lpRowInitial[1] * cnext); prevOut[2] = (lpRowInitial[2] * cnext); for (int x = WidthSubOne; x >= 0; --x) { prevOut[0] = ((lpRowInitial[0] * (a2a3)) - (prevOut[0] * (b1b2))); prevOut[1] = ((lpRowInitial[1] * (a2a3)) - (prevOut[1] * (b1b2))); prevOut[2] = ((lpRowInitial[2] * (a2a3)) - (prevOut[2] * (b1b2))); bufferPerLine[0] += prevOut[0]; bufferPerLine[1] += prevOut[1]; bufferPerLine[2] += prevOut[2]; lpColumn[0] = bufferPerLine[0]; lpColumn[1] = bufferPerLine[1]; lpColumn[2] = bufferPerLine[2]; lpRowInitial -= Channels; lpColumn -= HeightStep; bufferPerLine -= Channels; } } else if (Channels == 4) { float prevOut[4]; prevOut[0] = (lpRowInitial[0] * cprev); prevOut[1] = (lpRowInitial[1] * cprev); prevOut[2] = (lpRowInitial[2] * cprev); prevOut[3] = (lpRowInitial[3] * cprev); for (int x = 0; x < width; ++x) { prevOut[0] = ((lpRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))); prevOut[1] = ((lpRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))); prevOut[2] = ((lpRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))); prevOut[3] = ((lpRowInitial[3] * (a0a1)) - (prevOut[3] * (b1b2))); bufferPerLine[0] = prevOut[0]; bufferPerLine[1] = prevOut[1]; bufferPerLine[2] = prevOut[2]; bufferPerLine[3] = prevOut[3]; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; lpColumn += HeightStep * WidthSubOne; bufferPerLine -= Channels; prevOut[0] = (lpRowInitial[0] * cnext); prevOut[1] = (lpRowInitial[1] * cnext); prevOut[2] = (lpRowInitial[2] * cnext); prevOut[3] = (lpRowInitial[3] * cnext); for (int x = WidthSubOne; x >= 0; --x) { prevOut[0] = ((lpRowInitial[0] * a2a3) - (prevOut[0] * b1b2)); prevOut[1] = ((lpRowInitial[1] * a2a3) - (prevOut[1] * b1b2)); prevOut[2] = ((lpRowInitial[2] * a2a3) - (prevOut[2] * b1b2)); prevOut[3] = ((lpRowInitial[3] * a2a3) - (prevOut[3] * b1b2)); bufferPerLine[0] += prevOut[0]; bufferPerLine[1] += prevOut[1]; bufferPerLine[2] += prevOut[2]; bufferPerLine[3] += prevOut[3]; lpColumn[0] = bufferPerLine[0]; lpColumn[1] = bufferPerLine[1]; lpColumn[2] = bufferPerLine[2]; lpColumn[3] = bufferPerLine[3]; lpRowInitial -= Channels; lpColumn -= HeightStep; bufferPerLine -= Channels; } } else if (Channels == 1) { float prevOut = (lpRowInitial[0] * cprev); for (int x = 0; x < width; ++x) { prevOut = ((lpRowInitial[0] * (a0a1)) - (prevOut * (b1b2))); bufferPerLine[0] = prevOut; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; lpColumn += HeightStep*WidthSubOne; bufferPerLine -= Channels; prevOut = (lpRowInitial[0] * cnext); for (int x = WidthSubOne; x >= 0; --x) { prevOut = ((lpRowInitial[0] * a2a3) - (prevOut * b1b2)); bufferPerLine[0] += prevOut; lpColumn[0] = bufferPerLine[0]; lpRowInitial -= Channels; lpColumn -= HeightStep; bufferPerLine -= Channels; } } } void gaussianVertical(unsigned char * bufferPerLine, unsigned char * lpRowInitial, unsigned char * lpColInitial, int height, int width, int Channels, float a0a1, float a2a3, float b1b2, float cprev, float cnext) { int WidthStep = Channels*width; int HeightSubOne = height - 1; if (Channels == 3) { float prevOut[3]; prevOut[0] = (lpRowInitial[0] * cprev); prevOut[1] = (lpRowInitial[1] * cprev); prevOut[2] = (lpRowInitial[2] * cprev); for (int y = 0; y < height; y++) { prevOut[0] = ((lpRowInitial[0] * a0a1) - (prevOut[0] * b1b2)); prevOut[1] = ((lpRowInitial[1] * a0a1) - (prevOut[1] * b1b2)); prevOut[2] = ((lpRowInitial[2] * a0a1) - (prevOut[2] * b1b2)); bufferPerLine[0] = prevOut[0]; bufferPerLine[1] = prevOut[1]; bufferPerLine[2] = prevOut[2]; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; bufferPerLine -= Channels; lpColInitial += WidthStep * HeightSubOne; prevOut[0] = (lpRowInitial[0] * cnext); prevOut[1] = (lpRowInitial[1] * cnext); prevOut[2] = (lpRowInitial[2] * cnext); for (int y = HeightSubOne; y >= 0; y--) { prevOut[0] = ((lpRowInitial[0] * a2a3) - (prevOut[0] * b1b2)); prevOut[1] = ((lpRowInitial[1] * a2a3) - (prevOut[1] * b1b2)); prevOut[2] = ((lpRowInitial[2] * a2a3) - (prevOut[2] * b1b2)); bufferPerLine[0] += prevOut[0]; bufferPerLine[1] += prevOut[1]; bufferPerLine[2] += prevOut[2]; lpColInitial[0] = bufferPerLine[0]; lpColInitial[1] = bufferPerLine[1]; lpColInitial[2] = bufferPerLine[2]; lpRowInitial -= Channels; lpColInitial -= WidthStep; bufferPerLine -= Channels; } } else if (Channels == 4) { float prevOut[4]; prevOut[0] = (lpRowInitial[0] * cprev); prevOut[1] = (lpRowInitial[1] * cprev); prevOut[2] = (lpRowInitial[2] * cprev); prevOut[3] = (lpRowInitial[3] * cprev); for (int y = 0; y < height; y++) { prevOut[0] = ((lpRowInitial[0] * a0a1) - (prevOut[0] * b1b2)); prevOut[1] = ((lpRowInitial[1] * a0a1) - (prevOut[1] * b1b2)); prevOut[2] = ((lpRowInitial[2] * a0a1) - (prevOut[2] * b1b2)); prevOut[3] = ((lpRowInitial[3] * a0a1) - (prevOut[3] * b1b2)); bufferPerLine[0] = prevOut[0]; bufferPerLine[1] = prevOut[1]; bufferPerLine[2] = prevOut[2]; bufferPerLine[3] = prevOut[3]; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; bufferPerLine -= Channels; lpColInitial += WidthStep*HeightSubOne; prevOut[0] = (lpRowInitial[0] * cnext); prevOut[1] = (lpRowInitial[1] * cnext); prevOut[2] = (lpRowInitial[2] * cnext); prevOut[3] = (lpRowInitial[3] * cnext); for (int y = HeightSubOne; y >= 0; y--) { prevOut[0] = ((lpRowInitial[0] * a2a3) - (prevOut[0] * b1b2)); prevOut[1] = ((lpRowInitial[1] * a2a3) - (prevOut[1] * b1b2)); prevOut[2] = ((lpRowInitial[2] * a2a3) - (prevOut[2] * b1b2)); prevOut[3] = ((lpRowInitial[3] * a2a3) - (prevOut[3] * b1b2)); bufferPerLine[0] += prevOut[0]; bufferPerLine[1] += prevOut[1]; bufferPerLine[2] += prevOut[2]; bufferPerLine[3] += prevOut[3]; lpColInitial[0] = bufferPerLine[0]; lpColInitial[1] = bufferPerLine[1]; lpColInitial[2] = bufferPerLine[2]; lpColInitial[3] = bufferPerLine[3]; lpRowInitial -= Channels; lpColInitial -= WidthStep; bufferPerLine -= Channels; } } else if (Channels == 1) { float prevOut = 0; prevOut = (lpRowInitial[0] * cprev); for (int y = 0; y < height; y++) { prevOut = ((lpRowInitial[0] * a0a1) - (prevOut * b1b2)); bufferPerLine[0] = prevOut; bufferPerLine += Channels; lpRowInitial += Channels; } lpRowInitial -= Channels; bufferPerLine -= Channels; lpColInitial += WidthStep*HeightSubOne; prevOut = (lpRowInitial[0] * cnext); for (int y = HeightSubOne; y >= 0; y--) { prevOut = ((lpRowInitial[0] * a2a3) - (prevOut * b1b2)); bufferPerLine[0] += prevOut; lpColInitial[0] = bufferPerLine[0]; lpRowInitial -= Channels; lpColInitial -= WidthStep; bufferPerLine -= Channels; } } } //本人博客:http://tntmonks.cnblogs.com/ 轉載請注明出處. void GaussianBlurFilter(unsigned char * input, unsigned char * output, int Width, int Height, int Stride, float GaussianSigma) { int Channels = Stride / Width; float a0, a1, a2, a3, b1, b2, cprev, cnext; CalGaussianCoeff(GaussianSigma, &a0, &a1, &a2, &a3, &b1, &b2, &cprev, &cnext); float a0a1 = (a0 + a1); float a2a3 = (a2 + a3); float b1b2 = (b1 + b2); int bufferSizePerThread = (Width > Height ? Width : Height) * Channels; unsigned char * bufferPerLine = (unsigned char*)malloc(bufferSizePerThread); unsigned char * tempData = (unsigned char*)malloc(Height * Stride); if (bufferPerLine == NULL || tempData == NULL) { if (tempData) { free(tempData); } if (bufferPerLine) { free(bufferPerLine); } return; } for (int y = 0; y < Height; ++y) { unsigned char * lpRowInitial = input + Stride * y; unsigned char * lpColInitial = tempData + y * Channels; gaussianHorizontal(bufferPerLine, lpRowInitial, lpColInitial, Width, Height, Channels, Width, a0a1, a2a3, b1b2, cprev, cnext); } int HeightStep = Height*Channels; for (int x = 0; x < Width; ++x) { unsigned char * lpColInitial = output + x*Channels; unsigned char * lpRowInitial = tempData + HeightStep * x; gaussianVertical(bufferPerLine, lpRowInitial, lpColInitial, Height, Width, Channels, a0a1, a2a3, b1b2, cprev, cnext); } free(bufferPerLine); free(tempData); }
調用方法:
GaussianBlurFilter(輸入圖像數據,輸出圖像數據,寬度,高度,通道數,強度)
注:支持通道數分別為 1 ,3 ,4.
關於IIR相關知識,參閱 百度詞條 "IIR數字濾波器"
http://baike.baidu.com/view/3088994.htm
天下武功,唯快不破。
本文只是拋磚引玉一下,若有其他相關問題或者需求也可以郵件聯系俺探討。
郵箱地址是:
gaozhihan@vip.qq.com
題外話:
很多網友一直推崇使用opencv,opencv的確十分強大,但是若是想要有更大的發展空間以及創造力.
還是要一步一個腳印去實現一些最基本的算法,扎實的基礎才是構建上層建築的基本條件.
俺目前只是把opencv當資料庫來看,並不認為opencv可以用於絕大多數的商業項目.
若本文幫到您,厚顏無恥求微信掃碼打個賞.