Visualization of filters in Keras: visualizing convolutional neural networks (CNNs) with Keras

Reposted article (see the original). 2018-09-09 10:06

https://adeshpande3.github.io/adeshpande3.github.io/

https://blog.csdn.net/weiwei9363/article/details/79112872

https://blog.csdn.net/and_w/article/details/70336506

https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59

https://keras-cn.readthedocs.io/en/latest/other/visualization/

https://blog.keras.io/category/demo.html

https://stackoverflow.com/questions/39280813/visualization-of-convolutional-layer-in-keras-model

http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb

https://blog.csdn.net/thystar/article/details/50662972

Original page: Visualizing parts of Convolutional Neural Networks using Keras and Cats
Chinese translation: Convolutional Neural Networks in Practice (Visualization Part): Recognizing Cats with Keras

It is well known that convolutional neural networks (CNNs or ConvNets) have been the source of many major breakthroughs in the field of deep learning in the last few years, but they are rather unintuitive to reason about for most people. I’ve always wanted to break down the parts of a ConvNet and see what an image looks like after each stage, and in this post I do just that!

CNNs at a high level

First off, what are ConvNets good at? ConvNets are used primarily to look for patterns in an image. They do that by convolving a kernel over the image and looking for patterns. In the first few layers of a CNN the network can identify lines and corners, but we can then pass these patterns down through our neural net and start recognizing more complex features as we get deeper. This property makes CNNs really good at identifying objects in images.

What is a CNN?

A CNN is a neural network that typically contains several types of layers: convolutional layers, pooling layers, and activation layers.

Convolutional Layer

To understand what a CNN is, you need to understand how convolutions work. Imagine you have an image represented as a 5x5 matrix of values, and you take a 3x3 matrix and slide that 3x3 window around the image. At each position the 3x3 window visits, you multiply each value of the window by the image value it currently covers and add up the products. This results in a single number that represents all the values in that window of the image. Here’s a pretty gif for clarity:

As you can see, each item in the feature matrix corresponds to a section of the image. Note that the values of the kernel matrix are the red numbers in the corner of the gif.
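To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch. The function name `convolve2d_valid` and the example image and kernel values are illustrative assumptions, not necessarily the ones shown in the gif.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    At each position, multiply the window and the kernel element-wise
    and sum the products into a single number.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Assumed example values: a 5x5 binary image and a 3x3 kernel.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

Each entry of the 3x3 feature map summarizes one 3x3 section of the 5x5 image.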

The “window” that moves over the image is called a kernel. Kernels are typically square, and 3x3 is a fairly common kernel size for small-ish images. The distance the window moves each time is called the stride. Also of note: images are sometimes padded with zeros around the perimeter when performing convolutions, which dampens the value of the convolutions around the edges of the image (the idea being that the center of a photo typically matters more).
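Kernel size, stride, and padding together determine how many positions the window can visit, and therefore the size of the output. The helper below is a small sketch of that standard arithmetic; the name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one axis of the image."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

size_plain = conv_output_size(5, 3)               # 5x5 image, 3x3 kernel
size_padded = conv_output_size(5, 3, padding=1)   # one ring of zero padding
size_strided = conv_output_size(5, 3, stride=2)   # window moves 2 pixels at a time

print(size_plain, size_padded, size_strided)  # 3 5 2
```

With one ring of zero padding, a 3x3 kernel produces an output the same size as the input.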

The goal of a convolutional layer is filtering. As we move over an image we effectively check for patterns in that section of the image. This works because of filters: stacks of weights, represented as a vector, which are multiplied by the values output by the convolution. When training on images, these weights change, so when it is time to evaluate an image, the filters return high values if the network thinks it is seeing a pattern it has seen before. The combinations of high values from various filters let the network predict the content of an image. This is why in CNN architecture diagrams the convolution step is represented by a box, not by a rectangle; the third dimension represents the filters.

Architecture of AlexNet

Things to note:

The output of the convolution is smaller (in width and height) than the original image when no padding is used
A linear function is applied between the kernel and the image window that is under the kernel
Weights in the filters are learned by seeing lots of images

Pooling Layers

Pooling works very much like convolution: we take a kernel and move it over the image. The only difference is the function applied to the image window, which has no learned weights and, in the case of max pooling, isn’t linear.

Max pooling and average pooling are the most common pooling functions. Max pooling takes the largest value from the window of the image currently covered by the kernel, while average pooling takes the average of all values in the window.
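Both pooling functions can be sketched in a few lines of NumPy. This version assumes non-overlapping windows (stride equal to the pool size) and crops any rows or columns that do not fill a whole window; the function name `pool2d` is hypothetical.

```python
import numpy as np

def pool2d(image, pool_size, mode="max"):
    """Describe each pool_size x pool_size window of a 2D image by one value."""
    out_h = image.shape[0] // pool_size
    out_w = image.shape[1] // pool_size
    # Drop edge rows/columns that do not fill a complete window.
    cropped = image[:out_h * pool_size, :out_w * pool_size]
    # Axes 1 and 3 index the positions inside each window.
    windows = cropped.reshape(out_h, pool_size, out_w, pool_size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

image = np.arange(16).reshape(4, 4)
max_pooled = pool2d(image, 2, mode="max")
avg_pooled = pool2d(image, 2, mode="average")

print(max_pooled)  # [[ 5  7]
                   #  [13 15]]
print(avg_pooled)  # [[ 2.5  4.5]
                   #  [10.5 12.5]]
```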

Activation Layers

Activation layers work exactly as in other neural networks: a value is passed through a function that squashes the value into a range. Here’s a bunch of common ones:

The most used activation function in CNNs is the relu (Rectified Linear Unit). There are a bunch of reasons that people like relus, but a big one is that they are really cheap to compute: if the number is negative, output zero; otherwise, output the number. Being cheap makes it faster to train networks.
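As a quick illustration, the relu rule and two classic squashing functions can be written with NumPy as follows. The sample values are arbitrary.

```python
import numpy as np

def relu(x):
    """Negative values become zero; everything else passes through unchanged."""
    return np.maximum(x, 0)

def sigmoid(x):
    """Squashes any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

values = np.array([-2.0, -0.5, 0.0, 1.5])
relu_values = relu(values)
sigmoid_values = sigmoid(values)
tanh_values = np.tanh(values)  # squashes into (-1, 1)

print(relu_values)  # [0.  0.  0.  1.5]
```

Note that relu only clips from below: positive inputs are not bounded, unlike with sigmoid or tanh.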

Recap

Three main types of layers in CNNs: Convolutional, Pooling, Activation
Convolutional layers multiply kernel values by the image window and optimize the kernel weights over time using gradient descent
Pooling layers describe a window of an image using a single value which is the max or the average of that window
Activation layers squash the values into a range, typically [0,1] (sigmoid) or [-1,1] (tanh); relu only clips negative values to zero

What does a CNN look like?

Before we get into what a CNN looks like, a little bit of background. The first successful application of ConvNets was by Yann LeCun in the 90’s: he created something called LeNet, which could be used to read handwritten numbers. Since then, computing advancements and powerful GPUs have allowed researchers to be more ambitious. In 2010 the Stanford Vision Lab released ImageNet. ImageNet is a data set of 14 million images with labels detailing the contents of the images. It has become one of the research world’s standards for comparing CNN models, with the current best models successfully detecting the objects in 94+% of the images. Every so often someone comes in and beats the all-time high score on ImageNet, and it’s a pretty big deal. In 2014 it was GoogLeNet and VGGNet; before that it was ZF Net. The first viable example of a CNN applied to ImageNet was AlexNet in 2012. Before that, researchers attempted to use traditional computer vision techniques, but AlexNet outperformed everything else up to that point by ~15%.

Anyway, let’s look at LeNet:

LeNet architecture

This diagram doesn’t show the activation functions, but the architecture is:

Input image → ConvLayer → Relu → MaxPooling → ConvLayer → Relu → MaxPooling → Hidden Layer → Softmax (activation) → output layer
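The layer order above could be written in Keras roughly as follows. This is only a sketch: the article does not give filter counts, kernel sizes, the hidden layer width, or the input size, so the values below (6 and 16 filters, 5x5 kernels, 120 hidden units, 28x28 grayscale input, 10 classes) are assumptions chosen for illustration.

```python
import keras
from keras import layers

# LeNet-style stack following the layer order listed above.
# All sizes here are illustrative assumptions, not taken from the article.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                        # Input image
    layers.Conv2D(6, kernel_size=5, activation="relu"),    # ConvLayer -> Relu
    layers.MaxPooling2D(pool_size=2),                      # MaxPooling
    layers.Conv2D(16, kernel_size=5, activation="relu"),   # ConvLayer -> Relu
    layers.MaxPooling2D(pool_size=2),                      # MaxPooling
    layers.Flatten(),                                      # feature maps -> vector
    layers.Dense(120, activation="relu"),                  # Hidden Layer
    layers.Dense(10, activation="softmax"),                # Softmax output layer
])

model.summary()
```

Folding the relu into each `Conv2D` via `activation="relu"` is equivalent to adding a separate activation layer after the convolution.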

On to the cats!

Here is an image of a cat:

That’s a good looking cat

Our picture of the cat has a height of 320px, a width of 400px, and 3 channels of color (RGB).

Convolutional Layer

So what does he look like after one layer of convolution?

1 convcat

Here is the cat with a kernel size of 3x3 and 3 filters (if we have more than 3 filters we can’t plot a 2D image of the cat. Higher dimensional cats are notoriously tricky to deal with).

As you can see, the cat is really noisy because all of our weights are randomly initialized and we haven’t trained the network. Oh, and the filter outputs are all on top of each other, so even if there was detail on each layer we wouldn’t be able to see it. But we can make out areas of the cat that were the same color, like the eyes and the background. What happens if we increase the kernel size to 10x10?

As we can see, we lost some of the detail because the kernel was too big. Also note the image is slightly smaller because of the larger kernel (with no padding, a k x k kernel trims k - 1 pixels off each dimension).
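The effect of kernel size on output shape can be reproduced without the cat photo. The sketch below applies an untrained convolutional layer (random weights, 3 filters) to a random stand-in image of the same 320x400x3 shape; the function name `conv_layer` and the random data are assumptions for illustration, and the author's own code may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(image, kernels):
    """Apply F filters to an (H, W, C) image, stride 1, no padding.

    kernels has shape (k, k, C, F); the result has shape (H-k+1, W-k+1, F).
    """
    k = kernels.shape[0]
    # windows[h, w, c] is the k x k patch of channel c at position (h, w).
    windows = sliding_window_view(image, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijcf->hwf", windows, kernels)

rng = np.random.default_rng(0)
cat_stand_in = rng.random((320, 400, 3))   # random pixels, same shape as the photo

output_shapes = {}
for kernel_size in (3, 10):
    # Randomly initialized, untrained weights for 3 filters.
    kernels = rng.normal(size=(kernel_size, kernel_size, 3, 3))
    output_shapes[kernel_size] = conv_layer(cat_stand_in, kernels).shape

print(output_shapes)  # {3: (318, 398, 3), 10: (311, 391, 3)}
```

Because there are exactly 3 filters, each output can still be displayed as an RGB image, one filter per color channel.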

What happens if we squish it down a bit so we can see the color channels better?

Much better! Now we can see some of the things our filter is seeing. It looks like red is really liking the black bits of the nose and eyes, and blue is digging the light grey that outlines the cat. We can start to see how the layer captures some of the more important details in the photo.

3x3 Kernel convcat

Original

15x15 pixel kernel size

If we increase the kernel size, it’s far more obvious that we get less detail, and the image is also smaller than the other two.

Add an Activation layer

reluCat

We get rid of a lot of the not-blue-ness by adding a relu.

Adding a Pooling Layer

We add a pooling layer (getting rid of the activation just to make it a bit easier to show).

2x2 pool size

As expected, the cat is blockier, but we can go even blockier!

PoolCat with a 5x5 pool size. All your poolz belong to us

Notice how the image is now about a third the size of the original.

Activation and Max Pooling

LeNet Cats

What do the cats look like if we put them through the convolutional and pooling sections of LeNet?

1 filter for each conv layer

3 filters in first conv layer, 1 in second conv layer

3 filter layers in each convolution

Conclusion

ConvNets are powerful due to their ability to extract the core features of an image and use these features to identify images that contain features like them. Even with our two-layer CNN we can start to see that the network is paying a lot of attention to regions like the whiskers, nose, and eyes of the cat. These are the types of features that would allow the CNN to differentiate a cat from a bird, for example.

CNNs are remarkably powerful, and while these visualizations aren’t perfect, I hope they can help people like myself who are still learning to reason about ConvNets a little better.

All code is on Github: https://github.com/erikreppel/visualizing_cnns

Follow me on Twitter, I’m @programmer (yes, seriously).

Further Resources

Andrej Karpathy’s cs231n

A guide to convolution arithmetic for deep learning by Vincent Dumoulin and Francesco Visin

Visualizing convolutional neural networks

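This part is compiled from Deep Learning with Python; the book's complete code is in Section 5.4 and comes with detailed comments.
Deep learning is often called a "black box", meaning that its inner workings cannot be seen. Convolutional neural networks (CNNs), however, can be visualized, and visualization helps us understand how a CNN recognizes an image.
Three visualization methods are introduced:
1. Visualizing intermediate convnet outputs (intermediate activations): visualizing what the kernels output after the activation. This shows what an image looks like after convolution and helps explain what each kernel does.
2. Visualizing convnet filters: helps us understand how a kernel perceives an image.
3. Visualizing heatmaps of class activation in an image: the heatmap shows which parts of an image were decisive in an image classification problem, and it can also locate objects in the image.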

Visualizing intermediate convnet outputs (intermediate activations)

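The idea is simple: feed an image to the CNN, grab the outputs of some of its convolutional layers, and visualize those outputs.
The code uses the cats_and_dogs_small_2.h5 model, which was trained in Section 5.2 of the book; you can of course use a model from keras.applications instead, such as VGG16.
The visualization results are shown in the figure below.
Conclusions:
- The first convolutional layer acts much like an edge detector; at this stage the kernels retain almost all of the information in the image.
- As the layers get deeper, the kernel outputs become more and more abstract and retain less and less information.
- The deeper the layer, the more blank outputs there are, meaning those kernels did not find the features they look for in the input image.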
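The procedure described above can be sketched in Keras as follows. The cats_and_dogs_small_2.h5 file is not available here, so this sketch builds a tiny untrained stand-in network and feeds it a random image; the layer names, sizes, and input shape are assumptions, and with a real model you would load it and pass a real photo instead.

```python
import numpy as np
import keras
from keras import layers

# Tiny untrained stand-in network (hypothetical); the article uses a trained model.
inputs = keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(8, 3, activation="relu", name="conv_1")(inputs)
x = layers.MaxPooling2D(2, name="pool_1")(x)
x = layers.Conv2D(16, 3, activation="relu", name="conv_2")(x)
x = layers.MaxPooling2D(2, name="pool_2")(x)
model = keras.Model(inputs, x)

# Build a second model that returns the output of every layer after the input.
inspected_layers = model.layers[1:]
layer_outputs = [layer.output for layer in inspected_layers]
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)

# A batch containing one random image stands in for a preprocessed photo.
image_batch = np.random.default_rng(0).random((1, 64, 64, 3)).astype("float32")
activations = activation_model.predict(image_batch, verbose=0)

for layer, activation in zip(inspected_layers, activations):
    print(layer.name, activation.shape)
```

Each entry of `activations` has shape (1, height, width, channels), so a single channel such as `activations[0][0, :, :, 3]` is a 2D array that can be shown as an image, for example with matplotlib's `matshow`.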

Visualizing convnet filters

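How exactly does a kernel recognize objects? One way to answer this is to find out which image a kernel is most interested in. Convolution is a feature extraction process, and each kernel represents one feature. The larger a kernel's response to some region of the image, the more that region "resembles" the kernel.
Following this reasoning, if we can find an image that maximizes the output of a given kernel, we have found the image that kernel is most interested in.
The approach: feed in an image $I$ with random content, then repeatedly update $I$ by gradient ascent to increase the output of the chosen kernel.
The code uses a pretrained VGG16 model and visualizes its kernels. The results are as follows:
- block1_conv1
- block2_conv1
- block3_conv1
- block4_conv1
- block5_conv1
Conclusions:
- Kernels in the lower layers seem to be interested in color and edge information.
- The higher the layer, the more abstract (quite psychedelic, in fact) and the more complex the content its kernels are interested in.
- For kernels in higher layers, the preferred image becomes harder and harder to obtain by gradient ascent (many of the block5_conv1 results are still random-noise images).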
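The gradient ascent idea can be shown on a toy case without a deep learning framework. The sketch below is not the VGG16 code: it assumes a single hand-written 3x3 edge kernel followed by a relu, for which the gradient with respect to the input image can be written out by hand. With a real network, the gradient would come from the framework's automatic differentiation. All names and values here are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def loss_and_grad(image, kernel):
    """Mean relu response of one filter, and its gradient w.r.t. the image."""
    kh, kw = kernel.shape
    windows = sliding_window_view(image, (kh, kw))          # (out_h, out_w, kh, kw)
    response = np.einsum("hwij,ij->hw", windows, kernel)    # filter output map
    loss = np.maximum(response, 0).mean()
    grad = np.zeros_like(image)
    # Only positions with a positive response pass gradient through the relu;
    # each of them contributes the kernel to the pixels its window covers.
    for i, j in zip(*np.nonzero(response > 0)):
        grad[i:i + kh, j:j + kw] += kernel
    return loss, grad / response.size

rng = np.random.default_rng(0)
image = rng.normal(scale=0.1, size=(16, 16))   # start from random content
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])           # assumed vertical-edge filter

initial_loss, _ = loss_and_grad(image, kernel)
for _ in range(40):
    _, grad = loss_and_grad(image, kernel)
    grad /= np.sqrt(np.mean(grad ** 2)) + 1e-5  # normalize the step size
    image += 1.0 * grad                         # ascent: move along the gradient
final_loss, _ = loss_and_grad(image, kernel)

print(initial_loss, final_loss)
```

After the loop, `image` is the input this filter responds to more strongly than the starting noise, which is the same quantity the VGG16 pictures listed above visualize for each layer.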

Visualizing heatmaps of class activation in an image

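In an image classification problem, suppose the network classifies an image as "cat" with probability 0.9. I want to know how much the last convolutional layer contributed to that 0.9. In other words, if the last convolutional layer has 512 kernels, I want to know how many votes each of those 512 kernels cast for "cat". The more votes a kernel casts, the more confident it is that the image is a cat, because the features it extracts lean toward cat features.
In the code, an image of an elephant is fed in, the heatmap of the last convolutional layer is computed, and the heatmap is then overlaid on the original image to show the parts that drive the classification. The result is as follows: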
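The voting step can be sketched with NumPy alone, assuming the last convolutional layer's feature maps and the gradient of the class score with respect to them have already been extracted from the model as arrays of shape (height, width, channels). The function name `class_activation_heatmap` and the tiny hand-made arrays below are assumptions for illustration.

```python
import numpy as np

def class_activation_heatmap(feature_maps, gradients):
    """Weight each channel by its average gradient and merge into one heatmap."""
    # One "vote" per kernel: the mean gradient of the class score for that channel.
    channel_votes = gradients.mean(axis=(0, 1))
    # Scale every channel by its vote, then average over the channels.
    heatmap = (feature_maps * channel_votes).mean(axis=-1)
    # Keep only regions that speak for the class, then normalize to [0, 1].
    heatmap = np.maximum(heatmap, 0)
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap

# Two hand-made 2x2 channels: channel 0 fires top-left, channel 1 bottom-right.
feature_maps = np.zeros((2, 2, 2))
feature_maps[0, 0, 0] = 1.0
feature_maps[1, 1, 1] = 1.0
# Channel 0 votes for the class (+1), channel 1 votes against it (-1).
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)

heatmap = class_activation_heatmap(feature_maps, gradients)
print(heatmap)
```

Only the top-left cell, where the supporting channel fired, stays hot. For a layer with 512 kernels the inputs would have 512 channels instead of 2, and the resulting low-resolution heatmap would be resized to the image size before being overlaid on it.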
