FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising
Published in IEEE TIP, 2018.
Abstract:
Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled subimages, achieving a good trade-off between inference speed and denoising performance. In contrast to the existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.
Conclusion:
In this paper, we proposed a new CNN model, namely FFDNet, for fast, effective and flexible discriminative denoising. To achieve this goal, several techniques were utilized in network design and training, such as the use of noise level map as input and denoising in downsampled sub-images space. The results on synthetic images with AWGN demonstrated that FFDNet can not only produce state-of-the-art results when input noise level matches ground-truth noise level, but also have the ability to robustly control the trade-off between noise reduction and detail preservation. The results on images with spatially variant AWGN validated the flexibility of FFDNet for handing inhomogeneous noise. The results on real noisy images further demonstrated that FFDNet can deliver perceptually appealing denoising results. Finally, the running time comparisons showed the faster speed of FFDNet over other competing methods such as BM3D. Considering its flexibility, efficiency and effectiveness, FFDNet provides a practical solution to CNN denoising applications.
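Key points:
- Feeding a noise level map (the per-pixel noise standard deviation) into the CNN as an extra input makes the network more robust, so a single model can adapt to inputs with different noise levels.
- Operating on downsampled sub-images lowers the computational cost.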
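Highlights:
- The authors offer an insight: try to decouple the noise parameter from the other network parameters, so that a single network can serve multiple noise levels.
- Concretely, the noise standard deviation map is fed to the denoising network as an additional input, and the authors use orthogonal initialization to try to reduce the correlation between filters.
- It borrows the idea from SPMC: process downsampled sub-images instead of the full-size image, which saves computation and enlarges the receptive field.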
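Limitations:
- Blind denoising is simply not possible! The best result is picked by eye! The underlying problem: in a real application you do not know the noise level at all (the noise may not even be Gaussian), so all you can do is guess and combine results.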
The non-blind FFDNet model can be viewed as multiple denoisers, each of which is anchored with a noise level. Accordingly, it has the ability to control the trade-off between noise removal and detail preservation which in turn facilitates the removal of real noise to some extent
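The authors' defense: since real noise does not follow the AWGN model, rather than rely on an inaccurate noise level estimator, they run FFDNet at a series of noise levels, obtain a series of results, and keep the best one. They only make this clear in the experiments section (which is really annoying: the abstract lures readers in with blind denoising, and then the experiments say blind denoising is not the focus):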
Instead of adopting any noise level estimation methods, we adopt an interactive strategy to handle real noisy images. First of all, we empirically found that the assumption of spatially invariant noise usually works well for most real noisy images. We then employ a set of typical input noise levels to produce multiple outputs, and select the one which has best trade-off between noise reduction and detail preservation.
And on page 10:
The noise levels at other regions are then interpolated from the noise levels of the typical regions to constitute an approximated non-uniform noise level map. Our FFDNet focuses on non-blind denoising and assumes the noise level map is known. In practice, some advanced noise level estimation methods [62], [64] can be adopted to assist the estimation of noise level map. In our following experiments, unless otherwise specified, we assume spatially invariant noise for the real noisy images.
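- Ideally, the model parameters would be independent of the noise level, which is what makes the processing tunable. This is hard to achieve in practice.
- The way sub-images are obtained is crude (a simple reshape), the way the full image is restored is cruder still, and I am not convinced by the results.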
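Background
The authors give several reasons why the denoising task matters:
- Noise is hard to avoid during image acquisition and in some computer vision tasks, e.g. [1,2].
- From a Bayesian perspective, denoising is a test bed for image prior models and optimization methods, e.g. [3-5].
- Image denoising can serve as a module inside other image restoration tasks, e.g. [6-9].
A limitation shared by prior work: the form of the noise (e.g. AWGN) and the noise level are usually assumed to be given.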
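Core idea
A CNN is a typical static structure. Compared with traditional optimization-based methods, it is rather rigid: once the noise level of the training set is fixed, the model only works for that noise level.
In other words, what we learn is a mapping \(f(y; \sigma)\) tied to the noise level \(\sigma\). We can pull \(\sigma\) out as a parameter that is independent of the training set and easy to adjust by hand. Ideally, the trained model parameters should not depend on \(\sigma\). The paper puts it this way: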
In the DnCNN model \(x = F(y; \theta_\sigma)\), the parameters \(\theta_\sigma\) vary with the change of noise level \(\sigma\), while in the FFDNet model, the noise level map is modeled as an input and the model parameters \(\theta\) are invariant to noise level. Thus, FFDNet provides a flexible way to handle different noise levels with a single network.
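Concretely, the paper introduces a new CNN input: the noise level map \(M\).
FFDNet
As shown in the figure:
- The noisy input image is reshaped into four sub-images.
- The four sub-images, together with the noise level map, are fed into the CNN.
- The CNN outputs four denoised sub-images, which are reassembled into the final output image.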
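The reshape step and its inverse can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel image with even height and width and a 2x2 pixel interleaving; the function names `to_subimages` and `from_subimages` are made up for this note and are not taken from the paper's code.

```python
import numpy as np

def to_subimages(img):
    """Split an (H, W) image into four (H/2, W/2) sub-images by 2x2 interleaving."""
    h, w = img.shape
    assert h % 2 == 0 and w % 2 == 0, "pad the image to an even size first"
    # Each sub-image collects the pixels at one fixed offset inside every 2x2 block.
    return np.stack([img[0::2, 0::2], img[0::2, 1::2],
                     img[1::2, 0::2], img[1::2, 1::2]])

def from_subimages(subs):
    """Inverse of to_subimages: put every pixel back at its original position."""
    _, h2, w2 = subs.shape
    img = np.empty((2 * h2, 2 * w2), dtype=subs.dtype)
    img[0::2, 0::2], img[0::2, 1::2] = subs[0], subs[1]
    img[1::2, 0::2], img[1::2, 1::2] = subs[2], subs[3]
    return img

y = np.arange(16, dtype=np.float32).reshape(4, 4)
subs = to_subimages(y)            # shape (4, 2, 2)
restored = from_subimages(subs)   # identical to y, so no information is lost
```

Because the split only rearranges pixels, the round trip is exact; whatever quality is lost comes from the network, not from this step.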
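Network settings
- All convolution layers are \(3 \times 3\) and the structure is similar to DnCNN. The difference is that no skip connection (residual learning) is used here.
- For grayscale images the network has 15 layers with 64 channels per layer; for color images it has 12 layers with 96 channels. The reasons: the authors argue that the three RGB channels are correlated, and a shallower network helps exploit that correlation; the color input is also larger, so the computation would be heavier; most importantly, experiments show that width matters more than depth for color images.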
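Noise level map
Page 4 is all reasoning and formulas. The actual recipe is one sentence: for AWGN with a fixed standard deviation \(\sigma\), every element of \(M\) is \(\sigma\).
Is a non-uniform \(M\) considered? Yes, see below.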
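As a rough sketch of what this means for the network input, the map can be stacked onto the sub-images as one extra channel. The helper `build_network_input` is a hypothetical name, the channels-first layout is an assumption, and the map is kept on the same 0 to 255 scale as the image purely for readability (a real implementation may normalize both).

```python
import numpy as np

def build_network_input(subimages, sigma):
    """Append a noise level map as an extra channel (channels-first layout).

    sigma may be a scalar (uniform map) or any array that broadcasts to
    the sub-image size (non-uniform map).
    """
    _, h2, w2 = subimages.shape
    sigma = np.asarray(sigma, dtype=subimages.dtype)
    noise_map = np.broadcast_to(sigma, (h2, w2))[None]      # shape (1, H/2, W/2)
    return np.concatenate([subimages, noise_map], axis=0)   # shape (4 + 1, H/2, W/2)

subimages = np.zeros((4, 32, 32), dtype=np.float32)  # placeholder for 4 grayscale sub-images

# Uniform AWGN with sigma = 25: every element of M is 25.
uniform_input = build_network_input(subimages, 25.0)

# Non-uniform example (hypothetical): sigma grows from 0 to 50, left to right.
ramp = np.linspace(0.0, 50.0, 32, dtype=np.float32)
ramp_input = build_network_input(subimages, ramp)
```

Changing the denoising strength at test time then only means changing the last channel; the weights stay untouched.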
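Denoising on sub-images
Two existing strategies can cut the computation quickly, but each has a drawback:
- A shallow network. Clearly not good enough.
- Dilated convolution. The authors found that it causes blocking artifacts, especially near sharp edges.
In fact, the sub-image processing borrows from the SPMC layer used for super-resolution in [39]. Each sub-image here has \(\frac{1}{4}\) of the pixels of the input image (half the width and half the height).
Processing sub-images also enlarges the receptive field.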
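The receptive field gain is simple arithmetic. The sketch below assumes every layer is a stride-1 \(3 \times 3\) convolution and uses the depths quoted in these notes (15 layers for grayscale, 12 for color); `receptive_field` is an illustrative helper, not something from the paper.

```python
def receptive_field(num_layers, kernel=3, downscale=2):
    """Receptive field of stacked stride-1 convolutions.

    Returns (size in sub-image pixels, size in original-image pixels).
    """
    # Each k x k layer widens the field by (k - 1) pixels.
    rf_sub = num_layers * (kernel - 1) + 1
    # One sub-image pixel stands for a downscale x downscale block of the input.
    return rf_sub, rf_sub * downscale

gray = receptive_field(15)    # (31, 62)
color = receptive_field(12)   # (25, 50)
```

So working on half-resolution sub-images doubles the receptive field per side at no extra depth, which is the point made above.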
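Keeping the noise level map effective
As mentioned above, the authors want the noise level map to be independent of the model parameters, so enforcing this independence is particularly important.
Orthogonal regularization is one way to remove the correlation between filters. In this paper, the authors use orthogonal initialization.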
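For intuition, here is one common way to build orthogonally initialized filters, using a QR decomposition of a random Gaussian matrix. This is a generic construction written for this note (the name `orthogonal_filters` and the seed argument are assumptions); it is not necessarily the exact procedure the authors used.

```python
import numpy as np

def orthogonal_filters(out_ch, in_ch, k=3, seed=0):
    """Return conv weights of shape (out_ch, in_ch, k, k) whose flattened
    filters are orthonormal (rows or columns, whichever dimension is smaller)."""
    rng = np.random.default_rng(seed)
    rows, cols = out_ch, in_ch * k * k
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)       # reduced QR: q has orthonormal columns
    q *= np.sign(np.diag(r))     # sign fix so the factorization is unique
    if rows < cols:
        q = q.T                  # orthonormal rows: one row per filter
    return q.reshape(out_ch, in_ch, k, k)

w = orthogonal_filters(64, 64)
flat = w.reshape(64, -1)         # each row is one flattened filter
gram = flat @ flat.T             # close to the identity: filters are decorrelated
```

The Gram matrix being the identity means no filter is a scaled copy or mixture of another at the start of training, which is the decorrelation effect described above.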
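How to handle the blind case
The authors argue: FFDNet run with different input noise levels can be treated as a set of denoisers for noise of unknown level, instead of training on mixed noise levels as DnCNN does (which the authors say works poorly).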
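Why no skip connection
In one sentence: recent works [44,49] show that residual learning (RL) brings little benefit when the network is fairly deep. So, for simplicity, the authors do not use RL here. They do, however, use Adam, BN and ReLU.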
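Clipping the pixel range
As we know, an 8-bit digital image should take integer values between 0 and 255. Some works do not enforce this, and neither does this paper.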
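To make the difference concrete, the snippet below contrasts a real-valued noisy image with what an 8-bit file would actually store. The image content, the seed, and \(\sigma = 25\) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

# Synthetic AWGN as usually simulated: real-valued and free to leave [0, 255].
noisy = clean + rng.normal(0.0, 25.0, size=clean.shape)

# What an 8-bit image would hold: round to integers, then clip to [0, 255].
noisy_8bit = np.clip(np.round(noisy), 0, 255).astype(np.uint8)

# Rounding and clipping change the noise, so it is no longer exactly Gaussian.
max_change = np.abs(noisy_8bit.astype(np.float64) - noisy).max()
```

Skipping this step keeps the noise exactly Gaussian, which matches the training assumption but not what a saved image contains.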
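Experiments
Spatially invariant noise is modeled as additive AWGN; spatially variant noise is modeled as the element-wise product of spatially invariant AWGN and a per-pixel noise level map, see C.
The routine experiments are omitted here.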
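A minimal sketch of how such spatially variant noise can be synthesized, assuming it is unit-variance AWGN scaled element-wise by a per-pixel standard deviation map. The function name, the flat gray test image, and the left-to-right ramp are hypothetical choices for this note.

```python
import numpy as np

def add_spatially_variant_awgn(clean, noise_map, seed=0):
    """Add Gaussian noise whose standard deviation at each pixel is noise_map."""
    rng = np.random.default_rng(seed)
    unit_noise = rng.standard_normal(clean.shape)   # AWGN with standard deviation 1
    return clean + unit_noise * noise_map           # element-wise scaling per pixel

h, w = 64, 64
clean = np.full((h, w), 128.0)
noise_map = np.tile(np.linspace(0.0, 50.0, w), (h, 1))   # sigma ramps from 0 to 50
noisy = add_spatially_variant_awgn(clean, noise_map)

left_std = (noisy[:, : w // 2] - clean[:, : w // 2]).std()
right_std = (noisy[:, w // 2 :] - clean[:, w // 2 :]).std()
```

The same `noise_map` is what a non-blind denoiser would receive as its extra input, so in the synthetic setting the map is known exactly.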
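Sensitivity to the noise level map
An experiment is run here. Take FFDNet-20, meaning we tell FFDNet (through the input noise level map) that the noise standard deviation is 20, while the true noise standard deviation of the input image varies from 0 to 50. Three findings:
- When the true noise standard deviation equals that of the noise level map (e.g. both are 20), DnCNN, BM3D and FFDNet perform similarly.
- This is also the setting where performance is best.
- When the true standard deviation is lower than the input standard deviation, performance is barely affected. In the opposite case it degrades quickly. The lesson: the standard deviation in the input noise level map may be aggressive (overestimated), but should not be conservative (underestimated).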
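Blind processing
Outputs are produced at noise levels spaced 5 apart; for spatially variant noise, the noise levels of the remaining regions are obtained by interpolation from a few typical regions. Then the best result is picked by eye???!!! The authors even explain the reasoning with a straight face: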
Instead of adopting any noise level estimation methods, we adopt an interactive strategy to handle real noisy images. First of all, we empirically found that the assumption of spatially invariant noise usually works well for most real noisy images. We then employ a set of typical input noise levels to produce multiple outputs, and select the one which has best trade-off between noise reduction and detail preservation. Second, the spatially variant noise in most real-world images is signal-dependent. In this case, we first sample several typical regions of distinct colors. For each typical region, we apply different noise levels with an interval of 5, and choose the best noise level by observing the denoising results. The noise levels at other regions are then interpolated from the noise levels of the typical regions to constitute an approximated non-uniform noise level map. Our FFDNet focuses on non-blind denoising and assumes the noise level map is known. In practice, some advanced noise level estimation methods [62], [64] can be adopted to assist the estimation of noise level map. In our following experiments, unless otherwise specified, we assume spatially invariant noise for the real noisy images.
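The interactive strategy quoted above amounts to a sweep over candidate noise levels. The sketch below shows only that loop; `toy_denoiser` is a deliberately crude stand-in for a trained FFDNet (it just blends the image toward its mean), and the level range 5 to 75 is an assumption based on the interval of 5 and the [0, 75] range mentioned in the abstract.

```python
import numpy as np

def toy_denoiser(noisy, sigma):
    """Placeholder for a trained FFDNet: a larger sigma means stronger smoothing."""
    strength = sigma / (sigma + 25.0)
    return (1.0 - strength) * noisy + strength * noisy.mean()

def sweep_noise_levels(noisy, denoiser, levels=range(5, 80, 5)):
    """Run the denoiser once per candidate noise level and keep every output."""
    return {sigma: denoiser(noisy, float(sigma)) for sigma in levels}

rng = np.random.default_rng(0)
noisy = 128.0 + rng.normal(0.0, 20.0, size=(32, 32))
candidates = sweep_noise_levels(noisy, toy_denoiser)

# A person (not the code) then inspects the candidates and keeps the one with
# the best trade-off between noise reduction and detail preservation.
```

Nothing in this loop estimates the noise level; the selection step stays manual, which is exactly the complaint raised in these notes.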