
Abstract
Anomaly detection refers to the task of finding unusual instances that stand out from the normal data. In several applications, these outliers or anomalous instances are of greater interest than the normal ones. Specifically, in industrial optical inspection and infrastructure asset management, finding these defects (anomalous regions) is of extreme importance. Traditionally, and even today, this process has been carried out manually. Humans rely on the saliency of the defects relative to the normal texture to detect them. However, manual inspection is slow, tedious, subjective and susceptible to human biases. Therefore, the automation of defect detection is desirable. However, the lack of a large number of anomalous instances and of labelled data is a problem for defect detection. In this paper, we present a convolutional auto-encoder architecture for anomaly detection that is trained only on defect-free (normal) instances. For the test images, residual masks obtained by subtracting the auto-encoder output from the original image are thresholded to obtain the defect segmentation masks. The approach was tested on two data-sets and achieved an impressive average F1 score of 0.885. The network learnt to detect the actual shape of the defects even though no defective images were used during training.
Introduction
An anomaly is anything that deviates from the norm. Anomaly detection refers to the task of finding the anomalous instances. Defect detection is a special case of anomaly detection and has applications in industrial settings. Manual inspection by humans is still the norm in most industries. The inspection process is completely dependent on the visual difference of the anomaly (defect) from the normal background or texture. The process is prone to errors and has several drawbacks, such as training time and cost, human bias and subjectivity, among others. Individual factors such as age, visual acuity, scanning strategy, experience, and training impact the errors caused during the manual inspection process [1]. As a result of these challenges faced in manual inspection by humans, automation of defect detection has been a topic of research across different application areas such as steel surfaces [2], rail tracks [3] and fabric [4]. However, all these techniques face two common problems: lack of large labelled data-sets and the limited number of anomalous samples. Semi-supervised techniques try to tackle this challenge. These techniques are based on the assumption that we have access to the labels for only one class type, i.e., the normal class [5]. They try to estimate the underlying distribution of the normal samples either implicitly or explicitly. This is followed by the measurement of deviation or divergence of the test samples from this distribution to determine an anomalous sample. To take an example of semi-supervised anomaly detection, Schlegl et al. [6] used Generative Adversarial Networks (GANs) for anomaly detection in optical coherence tomography images of the retina. They trained a GAN on the normal data to learn the underlying distribution of the anatomical variability. But they did not train an encoder for mapping the input image to the latent space.
Because of this, the method needed an optimization step for every test image to find a point in the latent space that corresponded to the most visually similar generated image, which made it slow. In this research, we explore an auto-encoder based approach that also tries to estimate the distribution of the normal data and then uses residual maps to find the defects. It is described in the next section.
Key point of the paper:
1. An auto-encoder is trained with normal images as input and the predicted (reconstructed) image as output, using an MSE loss between the two. Because the network never learns what defects look like, it cannot reproduce the defective regions when a defective image is fed in; subtracting the predicted image from the defective image therefore yields the locations of the defects.
Method
The proposed network architecture is shown in Figure 1. It is similar to the UNet [7] architecture. The encoder (layers x1 to x5) uses progressively decreasing filter sizes from 11×11 to 3×3. This decreasing filter size is chosen to allow a larger field of view for the network without having to use a large number of smaller filters, since deeper networks have a greater tendency to over-fit the data and generalize poorly. The decoder has kernel sizes in the reverse of the encoder order and uses Transposed Convolution layers. The output from the encoder layers is concatenated with the previous layers before passing to layers x7 to x9. For every Conv2D(Transpose) layer, the parameters shown are the kernel size, stride and number of filters for that layer. After every layer, batch normalization [8] is applied, followed by the ReLU activation function [9]. For an H×W input the network outputs an H×W reconstruction. The network is trained only on the defect-free (normal) data samples. TensorFlow 2.0 was used for conducting the experiments. The loss function used was the L2 norm or MSE (Mean Squared Error). The label in this case is the original input image and the prediction is the image reconstructed by the auto-encoder. The Adam optimizer [10] was used with default settings. The training was done for 50 epochs.
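The training setup described above can be sketched as follows. The small `model` here is a stand-in auto-encoder and the synthetic `normal_batches` replace the real defect-free training set; both are illustrative placeholders. (The paper's experiments used TensorFlow 2.0, but PyTorch is used here to match the implementation given later in this post.)

```python
import torch
import torch.nn as nn

# Stand-in auto-encoder for illustration; the real model is the
# UNet-like architecture described in the text.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters())  # default settings
criterion = nn.MSELoss()  # L2 reconstruction loss

# Synthetic stand-in for the defect-free training images.
normal_batches = [torch.rand(4, 1, 32, 32) for _ in range(2)]

for epoch in range(2):          # the paper trains for 50 epochs
    for x in normal_batches:    # only normal (defect-free) samples
        optimizer.zero_grad()
        loss = criterion(model(x), x)  # the label is the input itself
        loss.backward()
        optimizer.step()
```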
Our hypothesis is that the auto-encoder will learn representations that can only encode and decode the normal samples properly and will not be able to reconstruct the anomalous regions. This causes large residuals in the defective regions of the residual map, which is obtained by subtracting the reconstructed image from the input image as shown in Equation 1. The subtraction is done at the pixel level. This is followed by a thresholding operation to obtain the final defect segmentation.
R = X − AE(X)    (1)
where R is the residual, X is the input and AE(X) is the output (reconstructed image) of the auto-encoder. The data-sets used for conducting the experiments are described next.
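The residual computation and thresholding described above can be sketched as below; `model` and the threshold value are illustrative placeholders (the post does not specify how the threshold is chosen), and the absolute value of the residual is thresholded here.

```python
import torch

def segment_defects(model, x, threshold=0.5):
    """Return a binary defect mask for input images x."""
    with torch.no_grad():
        residual = x - model(x)  # R = X - AE(X), per pixel
    # Large residuals mark regions the auto-encoder failed to
    # reconstruct, i.e. candidate defects.
    return (residual.abs() > threshold).float()
```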
Code for the network described above:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnomalyAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: filter sizes shrink from 11x11 down to 3x3
        self.conv1 = nn.Conv2d(1, 48, (11, 11), stride=(1, 1), padding=5)
        self.bn1 = nn.BatchNorm2d(48)
        self.conv2 = nn.Conv2d(48, 48, (9, 9), stride=(2, 2), padding=4)
        self.bn2 = nn.BatchNorm2d(48)
        self.conv3 = nn.Conv2d(48, 48, (7, 7), stride=(2, 2), padding=3)
        self.bn3 = nn.BatchNorm2d(48)
        self.conv4 = nn.Conv2d(48, 48, (5, 5), stride=(2, 2), padding=2)
        self.bn4 = nn.BatchNorm2d(48)
        self.conv5 = nn.Conv2d(48, 48, (3, 3), stride=(2, 2), padding=1)
        self.bn5 = nn.BatchNorm2d(48)
        # Decoder: transposed convolutions, kernel sizes in reverse order
        self.conv_tr1 = nn.ConvTranspose2d(48, 48, (5, 5), stride=(2, 2),
                                           padding=2, output_padding=1)
        self.bn_tr1 = nn.BatchNorm2d(48)
        self.conv_tr2 = nn.ConvTranspose2d(96, 48, (7, 7), stride=(2, 2),
                                           padding=3, output_padding=1)
        self.bn_tr2 = nn.BatchNorm2d(48)
        self.conv_tr3 = nn.ConvTranspose2d(96, 48, (9, 9), stride=(2, 2),
                                           padding=4, output_padding=1)
        self.bn_tr3 = nn.BatchNorm2d(48)
        self.conv_tr4 = nn.ConvTranspose2d(96, 48, (11, 11), stride=(2, 2),
                                           padding=5, output_padding=1)
        self.bn_tr4 = nn.BatchNorm2d(48)
        self.conv_output = nn.Conv2d(96, 1, (1, 1), (1, 1))
        self.bn_output = nn.BatchNorm2d(1)

    def forward(self, x):
        slope = 0.2
        # Encoder
        x = F.leaky_relu(self.bn1(self.conv1(x)), slope)
        x1 = F.leaky_relu(self.bn2(self.conv2(x)), slope)
        x2 = F.leaky_relu(self.bn3(self.conv3(x1)), slope)
        x3 = F.leaky_relu(self.bn4(self.conv4(x2)), slope)
        x4 = F.leaky_relu(self.bn5(self.conv5(x3)), slope)
        # Decoder with UNet-style skip connections (concatenation)
        x5 = F.leaky_relu(self.bn_tr1(self.conv_tr1(x4)), slope)
        x6 = F.leaky_relu(self.bn_tr2(
            self.conv_tr2(torch.cat([x5, x3], 1))), slope)
        x7 = F.leaky_relu(self.bn_tr3(
            self.conv_tr3(torch.cat([x6, x2], 1))), slope)
        x8 = F.leaky_relu(self.bn_tr4(
            self.conv_tr4(torch.cat([x7, x1], 1))), slope)
        output = F.leaky_relu(self.bn_output(
            self.conv_output(torch.cat([x8, x], 1))), slope)
        return output
```
The loss function used above is the MSE loss:

```python
loss = F.mse_loss
```
