A Summary of Super-Resolution Loss Functions


Source paper: Deep Learning for Image Super-resolution: A Survey

1. Pixel Loss: measures the pixel-wise difference between the generated image and the target image.

1.1 L1 loss

$\mathcal{L}_{\mathrm{L1}}(\hat{I}, I) = \frac{1}{hwc} \sum_{i,j,k} \left| \hat{I}_{i,j,k} - I_{i,j,k} \right|$

1.2 L2 loss

$\mathcal{L}_{\mathrm{L2}}(\hat{I}, I) = \frac{1}{hwc} \sum_{i,j,k} \left( \hat{I}_{i,j,k} - I_{i,j,k} \right)^{2}$

Here $\hat{I}$ is the generated image, $I$ is the target HR image, and $h$, $w$, $c$ are its height, width, and number of channels.

1.3 Charbonnier loss: a variant of L1 loss in which the last term is a small constant (e.g., $\epsilon$ = 1e-3) added for numerical stability.

$\mathcal{L}_{\mathrm{Cha}}(\hat{I}, I) = \frac{1}{hwc} \sum_{i,j,k} \sqrt{\left( \hat{I}_{i,j,k} - I_{i,j,k} \right)^{2} + \epsilon^{2}}$

Pixel loss is the most common loss. L2 loss penalizes large errors heavily but does little about small ones, so in practice it often performs worse than L1. Pixel losses do not really account for image quality (e.g., perceptual quality or texture); the results often lack high-frequency detail, and the generated textures are overly smooth and unsatisfying.
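As a quick reference, here is a minimal PyTorch sketch of the three pixel losses; the function names and the default eps value are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def l1_loss(sr, hr):
    # Mean absolute error over all pixels and channels.
    return F.l1_loss(sr, hr)

def l2_loss(sr, hr):
    # Mean squared error; penalizes large errors much more strongly than small ones.
    return F.mse_loss(sr, hr)

def charbonnier_loss(sr, hr, eps=1e-3):
    # Smooth L1 variant; eps keeps the loss and its gradient well-behaved near zero.
    return torch.sqrt((sr - hr) ** 2 + eps ** 2).mean()
```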

2. Content Loss: if the image generated by a network is realistic enough, then its features (as measured by a feature-extraction network) should also be close to those of the real image. Making the features similar therefore also pushes up the quality of the generated image.

$\mathcal{L}_{\mathrm{content}}(\hat{I}, I; \phi, l) = \frac{1}{h_{l} w_{l} c_{l}} \sqrt{\sum_{i,j,k} \left( \phi^{(l)}_{i,j,k}(\hat{I}) - \phi^{(l)}_{i,j,k}(I) \right)^{2}}$

Here $l$ denotes the $l$-th layer of the feature-extraction network $\phi$, and $h_{l}$, $w_{l}$, $c_{l}$ are the dimensions of its feature maps; commonly used feature-extraction networks are VGG and ResNet.
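A minimal sketch of a VGG-based content loss in PyTorch, assuming torchvision is available; the chosen layer (relu5_4 of VGG-19) and the use of MSE between feature maps are illustrative choices, not prescribed by the survey:

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class ContentLoss(nn.Module):
    """Compare SR and HR images in the feature space of a frozen VGG-19."""

    def __init__(self, last_layer=35):  # index 35 = relu5_4 in torchvision's VGG-19
        super().__init__()
        # On newer torchvision versions use weights=... instead of pretrained=True.
        vgg = models.vgg19(pretrained=True).features[:last_layer + 1]
        for p in vgg.parameters():
            p.requires_grad = False  # the feature extractor stays fixed
        self.vgg = vgg.eval()

    def forward(self, sr, hr):
        # ImageNet mean/std normalization of the inputs is omitted here for brevity.
        return F.mse_loss(self.vgg(sr), self.vgg(hr))
```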

3. Texture Loss: since the reconstructed image should have the same style as the target image (e.g., color, texture, contrast), the texture of an image is treated as the correlation between its different feature channels (expressed as inner products between the channel feature maps):

$G^{(l)}_{ij}(I) = \mathrm{vec}\left( \phi^{(l)}_{i}(I) \right) \cdot \mathrm{vec}\left( \phi^{(l)}_{j}(I) \right)$

where $\phi^{(l)}_{i}(I)$ is the $i$-th feature channel at layer $l$.

The final loss then requires these correlations (the Gram matrices) of the two images to match:

$\mathcal{L}_{\mathrm{texture}}(\hat{I}, I; \phi, l) = \frac{1}{c_{l}^{2}} \sqrt{\sum_{i,j} \left( G^{(l)}_{ij}(\hat{I}) - G^{(l)}_{ij}(I) \right)^{2}}$

Texture loss works well, but the patch size has to be determined empirically (by tuning): patches that are too small cause artefacts in the textured regions, while patches that are too large cause artefacts across the whole image, because the texture statistics are averaged over regions containing different textures.
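A sketch of the Gram-matrix texture loss; `sr_features` and `hr_features` are assumed to come from the same kind of fixed feature extractor as above (e.g., per-patch VGG features):

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width)
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Inner products between every pair of feature channels, normalized by spatial size.
    return torch.bmm(f, f.transpose(1, 2)) / (h * w)

def texture_loss(sr_features, hr_features):
    # Match the channel correlations (Gram matrices) of generated and target features.
    return F.mse_loss(gram_matrix(sr_features), gram_matrix(hr_features))
```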

4. Adversarial Loss: this one needs little introduction; when in doubt, add a GAN.

4.1 loss based on cross entropy

Generator: $\mathcal{L}^{G}_{\mathrm{ce}}(\hat{I}; D) = -\log D(\hat{I})$

Discriminator: $\mathcal{L}^{D}_{\mathrm{ce}}(\hat{I}, I; D) = -\log D(I) - \log\left( 1 - D(\hat{I}) \right)$

4.2 loss based on least squares error

Generator: $\mathcal{L}^{G}_{\mathrm{ls}}(\hat{I}; D) = \left( D(\hat{I}) - 1 \right)^{2}$

Discriminator: $\mathcal{L}^{D}_{\mathrm{ls}}(\hat{I}, I; D) = D(\hat{I})^{2} + \left( D(I) - 1 \right)^{2}$

4.3 hinge-format adversarial loss

Generator: $\mathcal{L}^{G}_{\mathrm{hinge}}(\hat{I}; D) = -D(\hat{I})$

Discriminator: $\mathcal{L}^{D}_{\mathrm{hinge}}(\hat{I}, I; D) = \max\left( 0, 1 - D(I) \right) + \max\left( 0, 1 + D(\hat{I}) \right)$

A pixel-level discriminator tends to drive the generator toward producing high-frequency noise, whereas a feature-level discriminator captures the latent attributes of high-resolution images much better.
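For reference, a sketch of the three adversarial formulations above; `d` is assumed to be a discriminator returning raw scores (logits), `sr` the generated image and `hr` the real HR image. The names and structure are illustrative:

```python
import torch
import torch.nn.functional as F

def gan_ce_losses(d, sr, hr):
    # 4.1: cross-entropy (vanilla) GAN losses.
    real, fake = d(hr), d(sr.detach())
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
           + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    fake_for_g = d(sr)
    g_loss = F.binary_cross_entropy_with_logits(fake_for_g, torch.ones_like(fake_for_g))
    return d_loss, g_loss

def gan_ls_losses(d, sr, hr):
    # 4.2: least-squares GAN losses.
    d_loss = ((d(hr) - 1) ** 2).mean() + (d(sr.detach()) ** 2).mean()
    g_loss = ((d(sr) - 1) ** 2).mean()
    return d_loss, g_loss

def gan_hinge_losses(d, sr, hr):
    # 4.3: hinge-format GAN losses.
    d_loss = F.relu(1 - d(hr)).mean() + F.relu(1 + d(sr.detach())).mean()
    g_loss = -d(sr).mean()
    return d_loss, g_loss
```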

Some of the more notable works:

"Learning to super-resolve blurry face and text images" incorporates a multi-class GAN consisting of a single generator and class-specific discriminators.

ESRGAN [101] uses a relativistic GAN [131] to predict the probability that a real image is relatively more realistic than a fake one, instead of predicting whether an input image is real or generated.

Although images produced with a GAN tend to have slightly lower PSNR than those trained only with a pixel loss, they bring a clear improvement in perceptual quality. The discriminator extracts latent patterns of real images that are otherwise hard to capture and pushes the generated HR images to conform to them, which helps produce more realistic HR images.

Compared with other models, GANs are harder to train, and stable GAN training is still an open problem.

5. Cycle Consistency Loss:

$\mathcal{L}_{\mathrm{cycle}}(I', I_{\mathrm{LR}}) = \frac{1}{hwc} \sqrt{\sum_{i,j,k} \left( I'_{i,j,k} - (I_{\mathrm{LR}})_{i,j,k} \right)^{2}}$

Inspired by CycleGAN, the generated HR image is mapped back down to a low-resolution image $I'$ by a second CNN, and a similarity measure is then computed between $I'$ and the original LR input $I_{\mathrm{LR}}$.
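A minimal sketch, assuming `downscale_net` is the second CNN that maps the generated HR image back to LR:

```python
import torch.nn.functional as F

def cycle_consistency_loss(downscale_net, sr, lr):
    # Downscale the generated HR image again and compare it with the original LR input.
    lr_cycled = downscale_net(sr)
    return F.mse_loss(lr_cycled, lr)
```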

6. Total Variation Loss:

$\mathcal{L}_{\mathrm{TV}}(\hat{I}) = \frac{1}{hwc} \sum_{i,j,k} \sqrt{\left( \hat{I}_{i,j+1,k} - \hat{I}_{i,j,k} \right)^{2} + \left( \hat{I}_{i+1,j,k} - \hat{I}_{i,j,k} \right)^{2}}$

It serves two purposes: 1. suppressing noise (mainly speckle noise), and 2. improving the spatial smoothness of the image.
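A sketch of a simple (anisotropic) total-variation penalty; the isotropic form above combines the two differences under one square root, but the absolute-value version below is a common simplification:

```python
def total_variation_loss(img):
    # img: (batch, channels, height, width)
    # Penalize differences between neighboring pixels to encourage spatial smoothness.
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw
```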

7. Prior-Based Loss

Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs

This work focuses on SR of face images and introduces a face alignment network (FAN) to constrain the consistency between the facial landmarks detected on the original and on the generated images.
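A rough sketch of that idea; `fan` is assumed to be a pretrained face alignment network that outputs landmark heatmaps, and matching the heatmaps with MSE is an interpretation, not necessarily the paper's exact formulation:

```python
import torch.nn.functional as F

def landmark_consistency_loss(fan, sr, hr):
    # Landmarks (heatmaps) detected on the generated image should match those on the real one.
    return F.mse_loss(fan(sr), fan(hr))
```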

 

In practice, several loss functions are usually combined, but tuning their weights is a big problem: it has a decisive effect on the results, and every practitioner has to work it out for themselves.
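A sketch of such a combination, reusing the loss sketches above; the weights are hypothetical placeholders and have to be tuned per task:

```python
def total_loss(sr, hr, lr, discriminator, content_loss_fn,
               w_pix=1.0, w_content=6e-3, w_adv=1e-3, w_tv=2e-8):
    # Weighted sum of the individual terms; the weights here are made up for illustration.
    loss = w_pix * charbonnier_loss(sr, hr)
    loss = loss + w_content * content_loss_fn(sr, hr)
    _, g_adv = gan_hinge_losses(discriminator, sr, hr)
    loss = loss + w_adv * g_adv
    loss = loss + w_tv * total_variation_loss(sr)
    return loss
```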

Update, 21 November 2019

The evaluation methods used for image super-resolution do not transfer well to video super-resolution, so a new quality metric has been introduced:

Video Multimethod Assessment Fusion (VMAF)

https://medium.com/netflix-techblog/toward-a-practical-perceptual-video-quality-metric-653f208b9652

The current version of the VMAF algorithm and model (denoted as VMAF 0.3.1), released as part of the VMAF Development Kit open source software, uses the following elementary metrics fused by Support Vector Machine (SVM) regression [8]:

  • Visual Information Fidelity (VIF) [9]. VIF is a well-adopted image quality metric based on the premise that quality is complementary to the measure of information fidelity loss. In its original form, the VIF score is measured as a loss of fidelity combining four scales. In VMAF, we adopt a modified version of VIF where the loss of fidelity in each scale is included as an elementary metric.
  • Detail Loss Metric (DLM) [10]. DLM is an image quality metric based on the rationale of separately measuring the loss of details which affects the content visibility, and the redundant impairment which distracts viewer attention. The original metric combines both DLM and additive impairment measure (AIM) to yield a final score. In VMAF, we only adopt the DLM as an elementary metric. Particular care was taken for special cases, such as black frames, where numerical calculations for the original formulation break down.

VIF and DLM are both image quality metrics. We further introduce the following simple feature to account for the temporal characteristics of video:

  • Motion. This is a simple measure of the temporal difference between adjacent frames. This is accomplished by calculating the average absolute pixel difference for the luminance component.
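A small sketch of that motion feature; the frame format (a stack of luminance planes) is an assumption:

```python
import numpy as np

def motion_feature(luma_frames):
    # luma_frames: array of shape (num_frames, height, width) holding luminance planes.
    frames = luma_frames.astype(np.float64)
    diffs = np.abs(frames[1:] - frames[:-1])
    # One value per frame transition: the average absolute pixel difference.
    return diffs.mean(axis=(1, 2))
```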

 

[9] H. Sheikh and A. Bovik, “Image Information and Visual Quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, Feb. 2006.

[10] S. Li, F. Zhang, L. Ma, and K. Ngan, “Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011.

 

