CNN with tflearn for MNIST image recognition, code walkthrough — the conv_2d parameters explained; training the whole network is mainly about learning the convolution kernels.


Official parameter documentation:

Convolution 2D

tflearn.layers.conv.conv_2d (incoming, nb_filter, filter_size, strides=1, padding='same', activation='linear', bias=True, weights_init='uniform_scaling', bias_init='zeros', regularizer=None, weight_decay=0.001, trainable=True, restore=True, reuse=False, scope=None, name='Conv2D')

Input

4-D Tensor [batch, height, width, in_channels].

Output

4-D Tensor [batch, new height, new width, nb_filter].

Arguments

  • incoming: Tensor. Incoming 4-D Tensor.
  • nb_filter: int. The number of convolutional filters.
  • filter_size: int or list of int. Size of filters.
  • strides: `int` or list of `int`. Strides of conv operation. Default: [1 1 1 1].
  • padding: str from "same", "valid". Padding algo to use. Default: 'same'.
  • activation: str (name) or function (returning a Tensor) or None. Activation applied to this layer (see tflearn.activations). Default: 'linear'.
  • bias: bool. If True, a bias is used.
  • weights_init: str (name) or Tensor. Weights initialization. (see tflearn.initializations) Default: 'uniform_scaling'.
  • bias_init: str (name) or Tensor. Bias initialization. (see tflearn.initializations) Default: 'zeros'.
  • regularizer: str (name) or Tensor. Add a regularizer to this layer weights (see tflearn.regularizers). Default: None.
  • weight_decay: float. Regularizer decay parameter. Default: 0.001.
  • trainable: bool. If True, weights will be trainable.
  • restore: bool. If True, this layer weights will be restored when loading a model.
  • reuse: bool. If True and 'scope' is provided, this layer variables will be reused (shared).
  • scope: str. Define this layer scope (optional). A scope can be used to share variables between layers. Note that scope will override name.
  • name: A name for this layer (optional). Default: 'Conv2D'.

 

Code:

     # 64 filters
     net = tflearn.conv_2d(net, 64, 3, activation='relu')
My understanding:

The filter (convolution kernel) here is, for example,
[1 0 1
 0 1 0
 1 0 1], with size=3.
Because 64 filters are specified, the convolution produces 64 results that serve as features (feature maps) for the next layer. Does that mean the activation function that follows is only there to decide which of those responses actually fire?
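As a sanity check on the filter/feature-map part of that reading, here is a minimal NumPy sketch (the image values and the 3×3 kernel are made up for illustration, not a learned filter) that slides one filter over a 5×5 input with stride 1 and no padding; a layer with 64 such filters would simply stack 64 of these feature maps:

     import numpy as np

     # Hypothetical 5x5 input and the example 3x3 filter from the text above.
     image = np.arange(25, dtype=float).reshape(5, 5)
     kernel = np.array([[1, 0, 1],
                        [0, 1, 0],
                        [1, 0, 1]], dtype=float)

     # "Valid" convolution (strictly, cross-correlation, as in most CNN layers):
     # output size per side = 5 - 3 + 1 = 3.
     out = np.zeros((3, 3))
     for i in range(3):
         for j in range(3):
             out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

     print(out.shape)   # (3, 3) -- one feature map; 64 filters would give 64 such maps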


Original source of the figure: http://cs231n.github.io/convolutional-networks/

 

If a convolutional layer has 4 feature maps, does that mean there are 4 convolution kernels?
Yes.


How are these 4 kernels defined?
They are usually initialized randomly and then trained with gradients computed by backpropagation (BP). If data is scarce, or there is no labeled data, you can also consider using the K centroids from K-means to do layer-wise initialization.

The kernels are learned. The layer is called a convolutional layer because its weights are applied the same way a convolution is; you can still think of it as a parameter layer whose weights need updating.
These four kernels are parameters of the network and are trained via BP.
Training the whole network is mainly about learning those kernels.
Initialize them first, then adjust them via BP; you can look at the caffe source code for details.
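To make "the kernels are learned" concrete, here is a tiny framework-free sketch (toy input patch, toy target, assumed learning rate) of the kind of gradient update that BP performs on every conv_2d filter:

     import numpy as np

     np.random.seed(0)
     patch  = np.random.rand(3, 3)            # one 3x3 input patch (toy data)
     target = 1.0                             # desired filter response (toy label)
     kernel = np.random.randn(3, 3) * 0.1     # random initialization, as described above
     lr = 0.1                                 # assumed learning rate

     for step in range(100):
         pred = np.sum(patch * kernel)          # the "convolution" at one position
         grad = 2.0 * (pred - target) * patch   # gradient of (pred - target)^2 w.r.t. kernel
         kernel -= lr * grad                    # BP / gradient-descent update

     print(round(np.sum(patch * kernel), 3))    # approaches 1.0: the kernel was learned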



--------------------------------------------------------------------------------------------------
The following content is excerpted from: http://blog.csdn.net/bugcreater/article/details/53293075
     from __future__ import division, print_function, absolute_import

     import tflearn
     from tflearn.layers.core import input_data, dropout, fully_connected
     from tflearn.layers.conv import conv_2d, max_pool_2d
     from tflearn.layers.normalization import local_response_normalization
     from tflearn.layers.estimator import regression

     # Load the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/)
     import tflearn.datasets.mnist as mnist
     X, Y, testX, testY = mnist.load_data(one_hot=True)
     X = X.reshape([-1, 28, 28, 1])
     testX = testX.reshape([-1, 28, 28, 1])

     network = input_data(shape=[None, 28, 28, 1], name='input')
     # Convolution operation of the CNN, explained in detail below
     network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
     # Max pooling operation
     network = max_pool_2d(network, 2)
     # Local response normalization
     network = local_response_normalization(network)
     network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
     network = max_pool_2d(network, 2)
     network = local_response_normalization(network)
     # Fully connected layer
     network = fully_connected(network, 128, activation='tanh')
     # Dropout
     network = dropout(network, 0.8)
     network = fully_connected(network, 256, activation='tanh')
     network = dropout(network, 0.8)
     network = fully_connected(network, 10, activation='softmax')
     # Regression layer (training setup: optimizer, learning rate, loss)
     network = regression(network, optimizer='adam', learning_rate=0.01,
                          loss='categorical_crossentropy', name='target')

     # Training
     # Build the deep neural network with tflearn's DNN wrapper
     model = tflearn.DNN(network, tensorboard_verbose=0)
     model.fit({'input': X}, {'target': Y}, n_epoch=20,
               validation_set=({'input': testX}, {'target': testY}),
               snapshot_step=100, show_metric=True, run_id='convnet_mnist')
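Once model.fit has finished, the trained network can be saved and queried with the standard tflearn.DNN methods; a brief usage sketch (the file name here is arbitrary):

     # Continues from the script above (hypothetical file name).
     model.save('convnet_mnist.tflearn')       # persist the learned kernels/weights
     # model.load('convnet_mnist.tflearn')     # restore them later

     print(model.evaluate(testX, testY))       # accuracy on the test set
     print(model.predict(testX[:1]))           # 10 softmax probabilities for one digit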

 


Looking at the conv_2d function, the source lists a total of 14 arguments (reuse and scope are omitted here), as follows; an example call using several of them is sketched after the list:

1. incoming: the input tensor, of shape [batch, height, width, in_channels]
2. nb_filter: the number of filters
3. filter_size: the filter size, an int (or list of ints)
4. strides: the stride of the convolution, default [1,1,1,1]
5. padding: the padding mode, "same" or "valid", default "same"
6. activation: the activation function (there is a lot to know here; it will be covered separately)
7. bias: bool; if True, a bias is used
8. weights_init: weight initialization
9. bias_init: bias initialization, zeros by default; in the familiar linear function y=wx+b, w corresponds to weights and b to bias
10. regularizer: the regularizer applied to the layer weights (again, a lot to cover; it will be explained separately)
11. weight_decay: the decay coefficient of the regularizer, default 0.001
12. trainable: bool; whether the weights can be trained
13. restore: bool; whether this layer's weights are restored when a saved model is loaded
14. name: the name of the convolutional layer, default "Conv2D"
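
Putting several of these arguments together, an illustrative call might look like the following (the values are chosen only as an example, not taken from the script above):

     net = conv_2d(net, nb_filter=32, filter_size=3, strides=1,
                   padding='same', activation='relu',
                   regularizer='L2', weight_decay=0.001,
                   weights_init='truncated_normal', bias=True,
                   name='conv1')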
 
        
Looking at the max_pool_2d function, the source has 5 arguments, listed below; an example call follows the list:
1. incoming: same as incoming in conv_2d
2. kernel_size: the size of the pooling window, analogous to filter_size in conv_2d
3. strides: same as strides in conv_2d
4. padding: as above
5. name: as above
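
For illustration, an explicit call with the values spelled out (in tflearn the pooling stride defaults to the kernel size when not given, which is why max_pool_2d(network, 2) in the script above halves the spatial size):

     # e.g. 28x28x32 feature maps -> 14x14x32 (the depth is unchanged)
     network = max_pool_2d(network, kernel_size=2, strides=2, padding='same', name='pool1')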
 
        
After seeing so many parameters you may feel a bit lost, so let me first use a figure to explain what each of them means.
 
        

The filter here is
[1 0 1
 0 1 0
 1 0 1], with size=3; since the filter moves one cell at a time, strides=1.
 
        
As for max pooling, take a look at the figure below, where strides=2 and kernel_size=2 (the size of each coloured block). The figure illustrates max pooling, which extracts the most salient information — for example, key words in a sentence during text analysis, or prominent colours and textures in image processing. One more remark about pooling: sometimes average pooling or even min pooling is used instead.
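
A minimal NumPy illustration of 2×2 max pooling with stride 2 on a toy 4×4 feature map (values made up):

     import numpy as np

     fmap = np.array([[1, 3, 2, 1],
                      [4, 6, 6, 8],
                      [3, 1, 1, 0],
                      [1, 2, 2, 4]], dtype=float)

     # 2x2 windows, stride 2: each output value is the max of one colour block.
     pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
     print(pooled)   # [[6. 8.]
                     #  [3. 4.]]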

Now a word about the padding operation. Anyone who has done image processing will be familiar with it; put simply, it is filling. For example, when you convolve an image with a 3×3 kernel, at the borders the kernel extends beyond the original image, so the image has to be enlarged; the enlarged part is the padding, and it is almost always filled with zeros.
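A small NumPy sketch of zero padding, using a pad width of 1, which is what a 3×3 kernel needs to produce a 'same'-sized output at stride 1:

     import numpy as np

     img = np.ones((5, 5))                                   # toy 5x5 image
     padded = np.pad(img, pad_width=1, mode='constant', constant_values=0)
     print(padded.shape)                                     # (7, 7): a border of zeros was added
     # 'same' convolution: (5 - 3 + 2*1) / 1 + 1 = 5, so the output stays 5x5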




Convolution Demo. Below is a running demo of a CONV layer. Since 3D volumes are hard to visualize, all the volumes (the input volume (in blue), the weight volumes (in red), the output volume (in green)) are visualized with each depth slice stacked in rows. The input volume is of size W1=5, H1=5, D1=3, and the CONV layer parameters are K=2, F=3, S=2, P=1. That is, we have two filters of size 3×3, and they are applied with a stride of 2. Therefore, the output volume has spatial size (5 - 3 + 2)/2 + 1 = 3. Moreover, notice that a padding of P=1 is applied to the input volume, making the outer border of the input volume zero.
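
The spatial output size quoted in that demo follows the standard formula; as a quick check (W, F, S, P as defined in the quoted text):

     W, F, S, P = 5, 3, 2, 1                 # input width, filter size, stride, padding
     out = (W - F + 2 * P) // S + 1          # = (5 - 3 + 2) / 2 + 1 = 3
     print(out)                              # 3, matching the 3x3x2 output volume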

 

 
        

General pooling. In addition to max pooling, the pooling units can also perform other functions, such as average pooling or even L2-norm pooling. Average pooling was often used historically but has recently fallen out of favor compared to the max pooling operation, which has been shown to work better in practice.

 
        
Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. Left: In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: The most common downsampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).
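The same arithmetic applies to the pooling example above, using the pooling form of the formula (no padding):

     W, F, S = 224, 2, 2                     # input width, pool size, stride
     out = (W - F) // S + 1                  # = (224 - 2) / 2 + 1 = 112
     print(out)                              # 112 -> output volume 112x112x64, depth preserved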
 

