Principles and Implementations of Four Simple Image Salient-Region Detection Methods: AC/HC/LC/FT

Reposted from the original article, 2014-08-04 08:59. Tags: salient region detection / [04] image recognition / saliency detection

Edited by laviewpbt, 2014.8.4

Email: laviewpbt@sina.com QQ: 33184777

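I had some spare time recently, so I read a few papers on saliency detection. I only skimmed them and did not study them in depth. Below I share what I learned along the way.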

Let's start with the simplest algorithm, which is also the easiest to implement:

1. LC algorithm

Reference paper: Visual Attention Detection in Video Sequences Using Spatiotemporal Cues. Yun Zhai and Mubarak Shah. Pages 4-5.

The algorithm is described on pages 4 and 5 of the paper.

When viewers watch a video sequence, they are attracted not only by the interesting events, but also sometimes by the interesting objects in still images. This is referred as the spatial attention. Based on the psychological studies, human perception system is sensitive to the contrast of visual signals, such as color, intensity and texture. Taking this as the underlying assumption, we propose an efficient method for computing the spatial saliency maps using the color statistics of images. The algorithm is designed with a linear computational complexity with respect to the number of image pixels. The saliency map of an image is built upon the color contrast between image pixels. The saliency value of a pixel I_k in an image I is defined as,

$$\mathrm{Sal}_S(I_k) = \sum_{\forall I_i \in I} \lVert I_k - I_i \rVert \qquad (7)$$

where the value of I_i is in the range of [0, 255], and ‖·‖ represents the color distance metric.

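Formula (7) is all you need to implement this algorithm. The saliency of each pixel is the sum of its distances to all the other pixels in the image, and that distance is usually the Euclidean distance.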

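Evaluating the definition directly costs O(N²) for an image of N pixels. The obvious optimization is a histogram, so I won't spell it out.

Note that this paper uses each pixel's gray value as the basis of the saliency computation, so there are at most 256 distinct pixel values.

Code for this algorithm is included with the code that accompanies the HC paper. Here is my own implementation: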
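Here is the histogram shortcut in concrete form. If the pixels in (7) are grouped by gray level, the sum becomes Sal(a) = Σ_n f_n·|a − a_n| over the 256 levels a_n, where f_n is the number of pixels at level a_n. Only 256 distinct values then need to be computed. The sketch below compares the literal O(N²) evaluation with the histogram version on a vector of gray levels. The function names are hypothetical. Like the LC code, it uses the absolute difference as the distance.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Literal evaluation of Eq. (7): every pixel against every pixel, O(N^2).
std::vector<long long> SaliencyBruteForce(const std::vector<unsigned char>& gray) {
    std::vector<long long> sal(gray.size(), 0);
    for (std::size_t k = 0; k < gray.size(); ++k)
        for (std::size_t i = 0; i < gray.size(); ++i)
            sal[k] += std::abs(int(gray[k]) - int(gray[i]));
    return sal;
}

// Histogram version: pixels with equal gray level get equal saliency, so a
// 256-entry table is computed once and looked up per pixel, O(N + 256 * 256).
std::vector<long long> SaliencyHistogram(const std::vector<unsigned char>& gray) {
    long long hist[256] = {0};
    for (unsigned char g : gray) ++hist[g];
    long long table[256];
    for (int a = 0; a < 256; ++a) {
        table[a] = 0;
        for (int b = 0; b < 256; ++b) table[a] += std::abs(a - b) * hist[b];
    }
    std::vector<long long> sal(gray.size());
    for (std::size_t k = 0; k < gray.size(); ++k) sal[k] = table[gray[k]];
    return sal;
}
```

On a 1-megapixel image the first version needs about 10^12 distance evaluations. The second needs 65,536 plus one pass over the pixels.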

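#include <stdlib.h>    // malloc, free, abs
#include <string.h>    // memset, memcpy
#include <math.h>      // sqrt (used by the HC and AC functions)

// External helper (not shown): normalizes the float distance map into the 24-bit SaliencyMap.
extern void Normalize(float *DistMap, unsigned char *SaliencyMap, int Width, int Height, int Stride, int Method = 0);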

/// <summary>
/// Purpose:    image saliency detection based on the SPATIAL ATTENTION MODEL
///    Reference:  Visual Attention Detection in Video Sequences Using Spatiotemporal Cues. Yun Zhai and Mubarak Shah. Pages 4-5.
///    Date:       2014.8.2
/// </summary>
/// <param name="Src">Image data to analyze; only 24-bit images are supported.</param>
/// <param name="SaliencyMap">Output saliency image, also 24-bit.</param>
/// <param name="Width">Width of the input image.</param>
/// <param name="Height">Height of the input image.</param>
/// <param name="Stride">Number of bytes per scan line.</param>
/// <remarks>Statistics are based on pixel gray values.</remarks>

void __stdcall SalientRegionDetectionBasedonLC(unsigned char *Src, unsigned char *SaliencyMap, int Width, int Height, int Stride)
{
    int X, Y, Index, CurIndex, Value;
    unsigned char *Gray = (unsigned char*)malloc(Width * Height);
    float *Dist = (float *)malloc(256 * sizeof(float));             //    float, not int: 255 * Width * Height overflows int above roughly 8.4 million pixels
    int *HistGram = (int *)malloc(256 * sizeof(int));
    float *DistMap = (float *) malloc(Height * Width * sizeof(float));

    memset(HistGram, 0, 256 * sizeof(int));

    for (Y = 0; Y < Height; Y++)
    {
        Index = Y * Stride;
        CurIndex = Y * Width;
        for (X = 0; X < Width; X++)
        {
            Value = (Src[Index] + Src[Index + 1] * 2 + Src[Index + 2]) / 4;        //    approximate gray value, kept so it need not be recomputed
            HistGram[Value] ++;
            Gray[CurIndex] = Value;
            Index += 3;
            CurIndex ++;
        }
    }

    for (Y = 0; Y < 256; Y++)
    {
        double Sum = 0;
        for (X = 0; X < 256; X++) 
            Sum += (double)abs(Y - X) * HistGram[X];                //    Eq. (9) of the paper: for gray levels the distance is just the absolute difference; this could be sped up, but the cost is small
        Dist[Y] = (float)Sum;
    }
    
    for (Y = 0; Y < Height; Y++)
    {
        CurIndex = Y * Width;
        for (X = 0; X < Width; X++)
        {
            DistMap[CurIndex] = Dist[Gray[CurIndex]];        //    saliency of every pixel via table lookup
            CurIndex ++;
        }
    }

    Normalize(DistMap, SaliencyMap, Width, Height, Stride);    //    normalize the image data

    free(Gray);
    free(Dist);
    free(HistGram);
    free(DistMap);
}

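[Figure: LC algorithm results]

The paper does not say whether processing should be done in Lab space. Interested readers can try Lab and compare the results.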
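The comment in the LC distance loop says that the 256×256 table could be computed faster. One way to do that is with prefix sums. This is an assumption of mine, not the author's code. Let C(v) be the number of pixels at levels ≤ v and S(v) the sum of their levels. Then Dist[v] = v·C(v) − S(v) + (S_total − S(v)) − v·(N − C(v)), and the whole table costs O(256). `DistTablePrefix` is a hypothetical name.

```cpp
#include <vector>

// Computes Dist[v] = sum_x |v - x| * hist[x] for all 256 gray levels in O(256)
// with prefix sums, instead of the O(256 * 256) double loop.
std::vector<long long> DistTablePrefix(const long long hist[256]) {
    long long totalCount = 0, totalSum = 0;
    for (int x = 0; x < 256; ++x) { totalCount += hist[x]; totalSum += x * hist[x]; }
    std::vector<long long> dist(256);
    long long count = 0, sum = 0;     // count and level-weighted sum of pixels at levels <= v
    for (int v = 0; v < 256; ++v) {
        count += hist[v];
        sum += v * hist[v];
        dist[v] = (v * count - sum)                                 // contribution of levels <= v
                + ((totalSum - sum) - v * (totalCount - count));    // contribution of levels > v
    }
    return dist;
}
```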

2. HC algorithm

Reference paper: Global Contrast based Salient Region Detection, Ming-Ming Cheng, CVPR 2011

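Code for this paper is available for download, but you have to ask the author for the password to unpack it. If you have a pudn account you can download it directly from pudn. That code was written for an old version of OpenCV, however, so it must be configured before it will run, and only the first half (the saliency detection part) seems to work.

The paper proposes two saliency detection algorithms, HC and RC. I implemented only HC here.

In essence HC is no different from LC above. The difference is that HC uses color information rather than only the gray value of each pixel. A color image can contain up to 256×256×256 colors, so a direct histogram-based approach is no longer practical. In practice, though, an image uses far fewer colors, so the authors reduce the number of colors. Each RGB channel is quantized to 12 levels, which leaves at most 12×12×12 = 1728 colors and allows a much smaller histogram to speed things up. This excessive quantization introduces some artifacts into the result, so the authors add a smoothing step. Finally, unlike LC, the authors work in Lab space. Because Lab and RGB do not correspond one-to-one, the quantization itself is still done in RGB space.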
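Here is a minimal sketch of the 12-levels-per-channel quantization described above. It assumes truncating integer division for the channel mapping; the rounding in Cheng's released code may differ. `QuantizeTo12Levels` and `ColorHistogram12` are hypothetical names, and the smoothing step is not included.

```cpp
#include <cstddef>
#include <vector>

// Maps each 8-bit channel to one of 12 levels (0..11) and combines the three
// levels into a single bin index in [0, 12 * 12 * 12) = [0, 1728).
int QuantizeTo12Levels(unsigned char r, unsigned char g, unsigned char b) {
    int qr = r * 12 / 256, qg = g * 12 / 256, qb = b * 12 / 256;
    return (qr * 12 + qg) * 12 + qb;
}

// Builds the 1728-bin color histogram of an interleaved 3-channel buffer.
std::vector<int> ColorHistogram12(const std::vector<unsigned char>& rgb) {
    std::vector<int> hist(12 * 12 * 12, 0);
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3)
        ++hist[QuantizeTo12Levels(rgb[i], rgb[i + 1], rgb[i + 2])];
    return hist;
}
```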

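Let's take a quick look at this quantization. Reducing the number of levels in each RGB channel of a color image is what Photoshop's Posterize command does, so the effect can be seen directly: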

[Figure: original image, 64,330 colors | Photoshop Posterize | result, 1,143 colors]

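(The images above were saved as JPEG, so if you download and analyze them, the actual color counts will differ.)

For the image above, quantization does not seem to change much. Now look at another example: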

[Figure: original image, 172,373 colors | result, 1,143 colors]

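Here the difference after conversion is considerable. This is the artifact the authors mention.

The authors' accompanying code implements this algorithm. I only glanced at it and found it rather complicated, so I worked out my own approach.

One thing is certain: to speed up processing, the amount of color information in the image must be reduced, but I want to control how much it is reduced. That reminded me of something I am good at, namely bit-depth reduction. My Imageshop program can convert a 24-bit true-color image into an 8-bit indexed image with as little visual loss as possible. My idea here is the same, except that the bit depth does not actually have to be reduced.

The first step is therefore to find the most representative colors of the image. This can be done with an octree, or by taking the high 4 bits of each channel and similar methods. Second, a dithering technique must be used during quantization, for example ordered dithering or Floyd-Steinberg error diffusion. Going further, one can move beyond 8-bit indices and use palettes larger than 256 entries. Both 1024 and 4096 work, at the cost of slightly more computation and more complex code. I use 256 colors. The quantization result is shown below: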

[Figure: original image, 172,373 colors | result, 256 colors]

As you can see, the 256-color result looks much better than the 1,143-color posterized result above.
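The author's `ErrorDiffusionFloydSteinberg` is not shown. To illustrate the technique only, here is a generic Floyd-Steinberg error diffusion onto a fixed palette. The names are hypothetical, and the input is assumed to be an interleaved 3-channel buffer. The nearest palette entry is found by linear search, which is simple but slow for 256 colors.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Color = std::array<int, 3>;

// Index of the palette entry nearest to c (squared Euclidean distance).
int NearestIndex(const std::vector<Color>& palette, const float c[3]) {
    int best = 0;
    float bestD = 1e30f;
    for (std::size_t i = 0; i < palette.size(); ++i) {
        float d = 0;
        for (int k = 0; k < 3; ++k) { float t = c[k] - palette[i][k]; d += t * t; }
        if (d < bestD) { bestD = d; best = (int)i; }
    }
    return best;
}

// Maps each pixel to a palette index (palette of at most 256 entries) and pushes
// the quantization error onto unvisited neighbors with weights 7/16 (right),
// 3/16 (below-left), 5/16 (below) and 1/16 (below-right).
std::vector<unsigned char> DitherToPalette(const std::vector<unsigned char>& img, int w, int h,
                                           const std::vector<Color>& palette) {
    std::vector<float> buf(img.begin(), img.end());     // working copy that accumulates error
    std::vector<unsigned char> indices(w * h);
    auto spread = [&](int x, int y, const float err[3], float wgt) {
        if (x < 0 || x >= w || y >= h) return;
        for (int k = 0; k < 3; ++k) buf[(y * w + x) * 3 + k] += err[k] * wgt;
    };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float* p = &buf[(y * w + x) * 3];
            int idx = NearestIndex(palette, p);
            indices[y * w + x] = (unsigned char)idx;
            float err[3];
            for (int k = 0; k < 3; ++k) err[k] = p[k] - palette[idx][k];
            spread(x + 1, y, err, 7 / 16.0f);
            spread(x - 1, y + 1, err, 3 / 16.0f);
            spread(x, y + 1, err, 5 / 16.0f);
            spread(x + 1, y + 1, err, 1 / 16.0f);
        }
    return indices;
}
```

With a black-and-white palette, a flat mid-gray image comes out as a mix of both indices rather than a single flat color. That mixing is the behavior that hides quantization banding.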

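Building the palette with an octree is fairly time-consuming. One way to speed it up is to build the palette from a reduced copy of the image. In general, a palette obtained from a 256×256 thumbnail is essentially the same as one obtained from the full image. The thumbnail is best produced with nearest-neighbor interpolation, for two reasons: it is fast, and it creates no new colors.

Because the processing still introduces some visual loss and artifacts, my algorithm also applies a Gaussian blur with a radius of about 1 to the saliency map at the end.

Here is part of the code: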
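Here is a sketch of nearest-neighbor downsampling that shows why it creates no new colors: every output pixel is a copy of exactly one source pixel. `ResampleNearest` is a hypothetical stand-in, not the author's `Resample`, and it assumes an interleaved 3-channel buffer with no row padding.

```cpp
#include <vector>

// Nearest-neighbor resampling of an interleaved 3-channel image (no row padding).
std::vector<unsigned char> ResampleNearest(const std::vector<unsigned char>& src, int sw, int sh,
                                           int dw, int dh) {
    std::vector<unsigned char> dst(dw * dh * 3);
    for (int y = 0; y < dh; ++y) {
        int sy = y * sh / dh;                 // source row (truncated)
        for (int x = 0; x < dw; ++x) {
            int sx = x * sw / dw;             // source column (truncated)
            for (int k = 0; k < 3; ++k)
                dst[(y * dw + x) * 3 + k] = src[(sy * sw + sx) * 3 + k];
        }
    }
    return dst;
}
```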

/// <summary>
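/// Purpose:    image saliency detection based on global contrast
///    Reference:  Global Contrast based Salient Region Detection, Ming-Ming Cheng, CVPR 2011
///               http://mmcheng.net/salobj/
///    Date:       2014.8.3
/// </summary>
/// <param name="Src">Image data to analyze; only 24-bit images are supported.</param>
/// <param name="SaliencyMap">Output saliency image, also 24-bit.</param>
/// <param name="Width">Width of the input image.</param>
/// <param name="Height">Height of the input image.</param>
/// <param name="Stride">Number of bytes per scan line.</param>
///    <remarks>Processed in Lab space using an integer Lab conversion. Dithering reduces the image to 256 colors, a histogram then yields a saliency lookup table, and a final Gaussian blur reduces the graininess left by quantization.</remarks>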

void __stdcall SalientRegionDetectionBasedonHC(unsigned char *Src, unsigned char *SaliencyMap, int Width, int Height, int Stride)
{
    int X, Y, XX, YY, Index, Fast, CurIndex;
    int FitX, FitY, FitWidth, FitHeight;
    float Value;
    unsigned char *Lab = (unsigned char *) malloc(Height * Stride);
    unsigned char *Mask = (unsigned char *) malloc(Height * Width);
    float *DistMap = (float *) malloc(Height * Width * sizeof(float));
    float *Dist = (float *)malloc(256 * sizeof(float));
    int *HistGram = (int *)malloc(256 * sizeof(int));

    GetBestFitInfoEx(Width, Height, 256, 256, FitX, FitY, FitWidth, FitHeight);
    unsigned char *Sample = (unsigned char *) malloc(FitWidth * FitHeight * 3);

    InitRGBLAB();
    for (Y = 0; Y < Height; Y++)
        RGBToLAB(Src + Y * Stride, Lab + Y * Stride, Width);

    Resample (Lab, Width, Height, Stride, Sample, FitWidth, FitHeight, FitWidth * 3, 0);    //    nearest-neighbor interpolation

    RGBQUAD *Palette = ( RGBQUAD *)malloc( 256 * sizeof(RGBQUAD));
    
    GetOptimalPalette(Sample, FitWidth, FitHeight, FitWidth * 3, 256, Palette);

    ErrorDiffusionFloydSteinberg(Lab, Mask, Width, Height, Stride, Palette, true);            //    first quantize the image to a small set of colors, here 256

    memset(HistGram, 0, 256 * sizeof(int));

    for (Y = 0; Y < Height; Y++)
    {
        CurIndex = Y * Width;
        for (X = 0; X < Width; X++)
        {
            HistGram[Mask[CurIndex]] ++;
            CurIndex ++;
        }
    }

    for (Y = 0; Y < 256; Y++)                                //    saliency computed the same way as in LC
    {
        Value = 0;
        for (X = 0; X < 256; X++)                            //    the palette was built from the Lab sample, so rgbBlue/rgbGreen/rgbRed hold Lab components
        {
            int D0 = Palette[Y].rgbBlue - Palette[X].rgbBlue;
            int D1 = Palette[Y].rgbGreen - Palette[X].rgbGreen;
            int D2 = Palette[Y].rgbRed - Palette[X].rgbRed;
            Value += sqrt((double)(D0 * D0 + D1 * D1 + D2 * D2)) * HistGram[X];
        }
        Dist[Y] = Value;
    }

    for (Y = 0; Y < Height; Y++)
    {
        CurIndex = Y * Width;
        for (X = 0; X < Width; X++)
        {
            DistMap[CurIndex] = Dist[Mask[CurIndex]];
            CurIndex ++;
        }
    }

    Normalize(DistMap, SaliencyMap, Width, Height, Stride);                //    normalize the image data

    GuassBlur(SaliencyMap, Width, Height, Stride, 1);                    //    final blur to remove the banding left by quantization
    
    free(Dist);
    free(HistGram);
    free(Lab);
    free(Palette);
    free(Mask);
    free(DistMap);
    free(Sample);
    FreeRGBLAB();
}

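The approach above is thousands of times faster than a direct brute-force implementation and somewhat faster than the original authors' code, with essentially no difference in the result.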

[Figure: original image | HC result, 20 ms | direct implementation, 150,000 ms | original authors' result]

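My HC results differ somewhat from the original authors'. I did not read their code carefully. My first suspicion is that the Lab-space processing differs; another possibility is that the final mapping of floating-point values to [0, 255] is done differently.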
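The `Normalize` helper used by all four functions is not shown, so how it maps the floating-point map to [0, 255] is unknown. One common choice is linear min-max scaling with rounding, given here purely as a hypothetical example. A different mapping, such as a gamma, clipping, or another rounding rule, would be enough to produce visibly different maps.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical min-max normalization: min -> 0, max -> 255, rounded to nearest.
// A constant map (max == min) is mapped to all zeros.
std::vector<unsigned char> MinMaxToByte(const std::vector<float>& dist) {
    std::vector<unsigned char> out(dist.size(), 0);
    if (dist.empty()) return out;
    auto mm = std::minmax_element(dist.begin(), dist.end());
    float lo = *mm.first, hi = *mm.second;
    if (hi <= lo) return out;
    for (std::size_t i = 0; i < dist.size(); ++i)
        out[i] = (unsigned char)((dist[i] - lo) * 255.0f / (hi - lo) + 0.5f);
    return out;
}
```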

3. AC algorithm

Reference paper: Salient Region Detection and Segmentation, Radhakrishna Achanta, Francisco Estrada, Patricia Wils, and Sabine Süsstrunk, 2008, pages 4-5

The idea behind the algorithm can be summed up in one sentence from the paper:

saliency is determined as the local contrast of an image region with respect to its neighborhood at various scales.

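Concretely, at a given scale the saliency value of the pixel at (i, j) is

$$c_{i,j} = D\left[\left(\frac{1}{N_1}\sum_{p=1}^{N_1} v_p\right), \left(\frac{1}{N_2}\sum_{q=1}^{N_2} v_q\right)\right]$$

where N_1 and N_2 are the numbers of pixels in the inner region R_1 and the outer region R_2, v is the feature vector (Lab color) of a pixel, and D is the Euclidean distance. The final saliency map is the sum over the scales S:

$$m_{i,j} = \sum_{S} c_{i,j}$$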

The idea is simple: the final saliency is the sum of the saliency values computed from blurred images at several scales. The FT paper contains the following passage analyzing this algorithm:

Objects that are smaller than a filter size are detected completely, while objects larger than a filter size are only partially detected (closer to edges). Smaller objects that are well detected by the smallest filter are detected by all three filters, while larger objects are only detected by the larger filters. Since the final saliency map is an average of the three feature maps (corresponding to detections of the three filters), small objects will almost always be better highlighted.

The algorithm is also very simple to code:

/// <summary>
/// Purpose:    saliency is determined as the local contrast of an image region with respect to its neighborhood at various scales
/// Reference:  Salient Region Detection and Segmentation   Radhakrishna Achanta, Francisco Estrada, Patricia Wils, and Sabine Süsstrunk   2008, Pages 4-5
///    Date:       2014.8.2
/// </summary>
/// <param name="Src">Image data to analyze; only 24-bit images are supported.</param>
/// <param name="SaliencyMap">Output saliency image, also 24-bit.</param>
/// <param name="Width">Width of the input image.</param>
/// <param name="Height">Height of the input image.</param>
/// <param name="Stride">Number of bytes per scan line.</param>
/// <param name="R1">Radius of the inner region R1.</param>
/// <param name="MinR2">Minimum radius of the outer region R2.</param>
/// <param name="MaxR2">Maximum radius of the outer region R2.</param>
/// <param name="Scale">Number of outer-region scales.</param>
///    <remarks>Pixel saliency is obtained by summing local contrast over several scales.</remarks>

void __stdcall SalientRegionDetectionBasedonAC(unsigned char *Src, unsigned char *SaliencyMap, int Width, int Height, int Stride, int R1, int MinR2, int MaxR2, int Scale)
{
    int X, Y, Z, Index, CurIndex;
    unsigned char *MeanR1 =(unsigned char *)malloc( Height * Stride);
    unsigned char *MeanR2 =(unsigned char *)malloc( Height * Stride);
    unsigned char *Lab = (unsigned char *) malloc(Height * Stride);
    float *DistMap = (float *)malloc(Height * Width * sizeof(float));

    InitRGBLAB();    
    for (Y = 0; Y < Height; Y++) 
        RGBToLAB(Src + Y * Stride, Lab + Y * Stride, Width);                    //    note: this algorithm also works in Lab space

    memcpy(MeanR1, Lab, Height * Stride);
    if (R1 > 0)                                                                    //    R1 == 0 means the original pixel itself is used
        BoxBlur(MeanR1, Width, Height, Stride, R1);

    memset(DistMap, 0, Height * Width * sizeof(float));

    for (Z = 0; Z < Scale; Z++)
    {
        //    outer radius goes linearly from MinR2 to MaxR2; Scale == 1 is guarded to avoid dividing by zero
        int R2 = (Scale > 1) ? (MaxR2 - MinR2) * Z / (Scale - 1) + MinR2 : MinR2;
        memcpy(MeanR2, Lab, Height * Stride);
        BoxBlur(MeanR2, Width, Height, Stride, R2);
        for (Y = 0; Y < Height; Y++) 
        {
            Index = Y * Stride;
            CurIndex = Y * Width;
            for (X = 0; X < Width; X++)                    //    accumulate the saliency of every pixel at this scale
            {
                int D0 = MeanR2[Index] - MeanR1[Index];            //    per-channel differences of the R1 and R2 means
                int D1 = MeanR2[Index + 1] - MeanR1[Index + 1];
                int D2 = MeanR2[Index + 2] - MeanR1[Index + 2];
                DistMap[CurIndex] += sqrt((double)(D0 * D0 + D1 * D1 + D2 * D2));    //    Euclidean distance
                CurIndex++;
                Index += 3;
            }
        }
    }
    
    Normalize(DistMap, SaliencyMap, Width, Height, Stride, 0);        //    normalize the image data

    free(MeanR1);
    free(MeanR2);
    free(DistMap);
    free(Lab);
    FreeRGBLAB();
}

The core is just a box blur. Note that this algorithm also works in Lab space.

All the results above were obtained with R1 = 0, MinR2 = Min(Width, Height) / 8, MaxR2 = Min(Width, Height) / 2 and Scale = 3.
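`BoxBlur` itself is not shown. A common way to make the cost of a box blur independent of the radius is a summed-area table (integral image). The single-channel sketch below shows one way to build such a blur; it is not the author's implementation. At the image border it averages only the in-bounds pixels of each window, which is an assumption about edge handling.

```cpp
#include <algorithm>
#include <vector>

// Box blur of a single-channel image with radius r (window (2r+1) x (2r+1)).
std::vector<float> BoxBlurIntegral(const std::vector<unsigned char>& img, int w, int h, int r) {
    // sat[y * (w + 1) + x] = sum of img over rows < y and columns < x
    std::vector<long long> sat((w + 1) * (h + 1), 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            sat[(y + 1) * (w + 1) + x + 1] = img[y * w + x] + sat[y * (w + 1) + x + 1]
                                           + sat[(y + 1) * (w + 1) + x] - sat[y * (w + 1) + x];
    std::vector<float> out(w * h);
    for (int y = 0; y < h; ++y) {
        int y0 = std::max(0, y - r), y1 = std::min(h - 1, y + r);
        for (int x = 0; x < w; ++x) {
            int x0 = std::max(0, x - r), x1 = std::min(w - 1, x + r);
            long long s = sat[(y1 + 1) * (w + 1) + x1 + 1] - sat[y0 * (w + 1) + x1 + 1]
                        - sat[(y1 + 1) * (w + 1) + x0] + sat[y0 * (w + 1) + x0];
            out[y * w + x] = (float)s / ((y1 - y0 + 1) * (x1 - x0 + 1));   // mean of the window
        }
    }
    return out;
}
```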
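Here is a worked example of these settings for a 640×480 image. min(640, 480) = 480, so MinR2 = 60 and MaxR2 = 240. The AC loop's formula (MaxR2 - MinR2) * Z / (Scale - 1) + MinR2 then gives 60, 180 · 1 / 2 + 60 = 150, and 240. The hypothetical helper below reproduces that integer arithmetic and returns MinR2 when Scale is 1.

```cpp
#include <algorithm>
#include <vector>

// Outer radii of the AC loop for the settings above: MinR2 = min(W, H) / 8,
// MaxR2 = min(W, H) / 2, interpolated linearly over `scale` steps.
std::vector<int> AcOuterRadii(int width, int height, int scale) {
    int m = std::min(width, height);
    int minR2 = m / 8, maxR2 = m / 2;
    std::vector<int> radii;
    for (int z = 0; z < scale; ++z)
        radii.push_back(scale > 1 ? (maxR2 - minR2) * z / (scale - 1) + minR2 : minR2);
    return radii;
}
```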

4. FT algorithm

Reference paper: Frequency-tuned Salient Region Detection, Radhakrishna Achanta, pages 4-5, CVPR 2009

The paper proposes the following five requirements for saliency detection:

1. Emphasize the largest salient objects.

2. Uniformly highlight whole salient regions.

3. Establish well-defined boundaries of salient objects.

4. Disregard high frequencies arising from texture, noise and blocking artifacts.

5. Efficiently output full resolution saliency maps.

The saliency computation it finally proposes is also very simple:

$$S(x, y) = \lVert I_\mu - I_{\omega_{hc}}(x, y) \rVert$$

where I_μ is the mean image feature vector, I_ωhc(x, y) is the corresponding image pixel vector value in the Gaussian blurred version (using a 5×5 separable binomial kernel) of the original image, and ‖·‖ is the L2 norm.

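The paper explains clearly how this formula corresponds to the five requirements above. As for why the first term uses the mean, an intuitive explanation is that as the radius of the Gaussian blur goes to infinity, the blurred image becomes the mean of the whole image.

The authors provide both MATLAB and Visual C++ code for this paper, but the two do not actually match. The MATLAB code contains an error: it computes the mean over the wrong data.

I tried implementing this with my optimized integer Lab conversion, but for some images the result differed considerably from the original authors'. In the end I used the floating-point RGB-to-Lab conversion provided in the authors' code.

The relevant reference code is as follows: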
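To put the formula together, here is a self-contained sketch of FT on an interleaved float Lab image. It uses the paper's 5×5 separable binomial kernel [1 4 6 4 1]/16. Like the blog's FT code, it returns the squared distance rather than the L2 norm, which preserves the ordering of saliency values. Clamping at the borders and the function name are my assumptions; this is not the authors' code.

```cpp
#include <vector>

// Frequency-tuned saliency on an interleaved w*h*3 float Lab image:
//   S(x, y) = || I_mu - I_whc(x, y) ||^2
// I_mu is the mean Lab vector; I_whc is the image blurred with the separable
// binomial kernel [1 4 6 4 1] / 16, with coordinates clamped at the borders.
std::vector<float> FrequencyTunedSaliency(const std::vector<float>& lab, int w, int h) {
    const float k[5] = {1 / 16.f, 4 / 16.f, 6 / 16.f, 4 / 16.f, 1 / 16.f};
    auto clampi = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
    double mean[3] = {0, 0, 0};                       // double avoids float accumulation error
    for (int i = 0; i < w * h; ++i)
        for (int c = 0; c < 3; ++c) mean[c] += lab[i * 3 + c];
    for (int c = 0; c < 3; ++c) mean[c] /= (double)w * h;

    std::vector<float> tmp(lab.size()), blur(lab.size());
    for (int y = 0; y < h; ++y)                       // horizontal pass
        for (int x = 0; x < w; ++x)
            for (int c = 0; c < 3; ++c) {
                float s = 0;
                for (int t = -2; t <= 2; ++t) s += k[t + 2] * lab[(y * w + clampi(x + t, 0, w - 1)) * 3 + c];
                tmp[(y * w + x) * 3 + c] = s;
            }
    for (int y = 0; y < h; ++y)                       // vertical pass
        for (int x = 0; x < w; ++x)
            for (int c = 0; c < 3; ++c) {
                float s = 0;
                for (int t = -2; t <= 2; ++t) s += k[t + 2] * tmp[(clampi(y + t, 0, h - 1) * w + x) * 3 + c];
                blur[(y * w + x) * 3 + c] = s;
            }

    std::vector<float> sal(w * h);
    for (int i = 0; i < w * h; ++i) {                 // squared distance to the mean vector
        double d = 0;
        for (int c = 0; c < 3; ++c) { double t = mean[c] - blur[i * 3 + c]; d += t * t; }
        sal[i] = (float)d;
    }
    return sal;
}
```

A uniform image has zero saliency everywhere, and a single bright pixel on a flat background is the most salient location.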
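For reference, here is a floating-point sRGB to CIE Lab conversion (D65 white point) using the standard formulas. I am assuming, not confirming, that the authors' `RGBToLABF` follows the same definition. `SrgbToLab` is a hypothetical name. 24-bit Windows bitmaps are usually stored in B, G, R order, so check the byte order before passing pixels to it.

```cpp
#include <cmath>

// sRGB (0..255 per channel) -> CIE L*a*b*, D65 reference white.
void SrgbToLab(unsigned char r8, unsigned char g8, unsigned char b8, double& L, double& a, double& b) {
    auto lin = [](unsigned char c) {                  // undo the sRGB transfer curve
        double v = c / 255.0;
        return v <= 0.04045 ? v / 12.92 : std::pow((v + 0.055) / 1.055, 2.4);
    };
    double R = lin(r8), G = lin(g8), B = lin(b8);
    double X = 0.4124564 * R + 0.3575761 * G + 0.1804375 * B;   // linear RGB -> XYZ
    double Y = 0.2126729 * R + 0.7151522 * G + 0.0721750 * B;
    double Z = 0.0193339 * R + 0.1191920 * G + 0.9503041 * B;
    const double eps = 0.008856, kappa = 903.3;
    double xr = X / 0.950456, yr = Y / 1.0, zr = Z / 1.088754;  // relative to the D65 white
    auto f = [&](double t) { return t > eps ? std::cbrt(t) : (kappa * t + 16.0) / 116.0; };
    double fx = f(xr), fy = f(yr), fz = f(zr);
    L = yr > eps ? 116.0 * fy - 16.0 : kappa * yr;
    a = 500.0 * (fx - fy);
    b = 200.0 * (fy - fz);
}
```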

/// <summary>
/// Purpose:    Frequency-tuned image saliency detection
///    Reference:  Frequency-tuned Salient Region Detection, Radhakrishna Achanta, Pages 4-5, CVPR 2009
///               http://ivrgwww.epfl.ch/supplementary_material/RK_CVPR09/
///    Date:       2014.8.2
/// </summary>
/// <param name="Src">Image data to analyze; only 24-bit images are supported.</param>
/// <param name="SaliencyMap">Output saliency image, also 24-bit.</param>
/// <param name="Width">Width of the input image.</param>
/// <param name="Height">Height of the input image.</param>
/// <param name="Stride">Number of bytes per scan line.</param>
///    <remarks>Processed in Lab space, but the library's integer RGB-to-Lab function cannot be used; the original floating-point conversion is required, otherwise many results are weak. The reason is unknown.</remarks>

void __stdcall SalientRegionDetectionBasedOnFT(unsigned char *Src, unsigned char *SaliencyMap, int Width, int Height, int Stride)
{
    int X, Y, XX, YY, Index, Fast, CurIndex, SrcB, SrcG, SrcR, DstB, DstG, DstR;
    float *Lab = (float *) malloc(Height * Stride * sizeof(float));
    float *DistMap = (float *) malloc(Height * Width * sizeof(float));
    double MeanL = 0, MeanA = 0, MeanB = 0;                                //    double accumulators: summing millions of floats loses precision
    
    for (Y = 0; Y < Height; Y++) 
        RGBToLABF(Src + Y * Stride, Lab + Y * Stride, Width);                //    floating-point conversion
    
    for (Y = 0; Y < Height; Y++) 
    {
        Index = Y * Stride;
        for (X = 0; X < Width; X++)
        {
            MeanL +=  Lab[Index];
            MeanA +=  Lab[Index + 1];
            MeanB +=  Lab[Index + 2];
            Index += 3;
        }
    }
    MeanL /= (Width * Height);                                            //    mean in Lab space
    MeanA /= (Width * Height);
    MeanB /= (Width * Height);

    GuassBlurF(Lab, Width, Height, Stride, 1);                            //    use Gaussian blur to eliminate fine texture details as well as noise and coding artifacts

    for (Y = 0; Y < Height; Y++)                                        //    the blur part of the MATLAB code on the website is incorrect
    {
        Index = Y * Stride;
        CurIndex = Y * Width;
        for (X = 0; X < Width; X++)                                        //    saliency of each pixel: squared Euclidean distance to the mean
        {
            DistMap[CurIndex++] = (MeanL - Lab[Index]) *  (MeanL - Lab[Index]) +  (MeanA - Lab[Index + 1]) *  (MeanA - Lab[Index + 1]) +  (MeanB - Lab[Index + 2]) *  (MeanB - Lab[Index + 2])   ;
            Index += 3;
        }
    }
    
    Normalize(DistMap, SaliencyMap, Width, Height, Stride);                //    normalize the image data

    free(Lab);
    free(DistMap);

}