Introduction to Convolutional Neural Networks


Basic Theory of Multi-layer Convolutional Networks

A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited coverage region; it performs outstandingly on large-scale image processing. It consists of alternating convolutional layers and pooling layers.

The basics of multi-layer convolutional networks can also be found in the blog post referenced below.

Characteristics of Convolutional Neural Networks

Drawbacks of Fully Connected Networks

  • Too many weight parameters:

In a fully connected network, the number of parameters between a layer with $m$ neurons and the next layer with $n$ neurons is $m \times n$. For a $1000 \times 1000$ pixel image there are $10^6$ input neurons; if the next layer is the same size, the weights between the two layers number $w = 10^{12}$. So many parameters make the network impossible to train. (See the parameter-count sketch after this list.)

  • Vanishing gradients:

Once a fully connected network has too many layers, vanishing gradients become a problem.

  • Spatial correlations are destroyed:

For data such as images, a pixel is strongly correlated with the pixels above, below, and beside it. Flattening the data for a fully connected layer discards these correlations, or forcibly couples two completely unrelated pixels.

Advantages of CNNs

To address the drawbacks above, CNNs make the following improvements:

  • Local receptive fields:

Since full connectivity has too many parameters and also destroys spatial correlations, we use local receptive fields instead. Suppose each neuron connects to a $10 \times 10$ patch of pixels; the weight count then becomes $w = 100 \times 10^6 = 10^8$, a full four orders of magnitude lower than the fully connected case. This is the origin of convolution.

  • Weight sharing:

Convolution as above reduces the parameter count by several orders of magnitude, but there are still many parameters. Now suppose the $10 \times 10$ weights $w$ are the same for every neuron, i.e. the $10^6$ neurons share 100 weights; the final weight count then drops to just 100!

  • Multiple convolution kernels:

Although weight sharing drastically reduces the parameter count, which is good for computation, it is not necessarily good for learning: 100 parameters can only learn so much. The natural next step is to use several "sets of weights" to learn features of the input, which keeps the parameter count under control while interpreting the input from different angles. The neuron outputs learned by each set of weights $w$ form one feature map, realizing feature extraction. These different "sets" of weights are called convolution kernels (filters), and the results they extract are called feature maps. Different kernels learn different features: for images, some learn contours, some learn colors, some learn corners, and so on, as shown below:

(figure: examples of features learned by different kernels)

  • Downsampling:

The output of each feature map is still sizable and may contain some noise. Downsampling reduces the amount of output while retaining the important information.
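As referenced in the first list above, here is a minimal sketch that reproduces the parameter arithmetic for the three schemes (the layer sizes are the hypothetical ones from the text):

n_in = 1000 * 1000                # input neurons: one per pixel of a 1000x1000 image
n_out = 1000 * 1000               # next layer of the same size

fully_connected = n_in * n_out    # 10**12 weights: every input to every output
local_10x10 = 10 * 10 * n_out     # 10**8 weights: each neuron sees a 10x10 patch
weight_sharing = 10 * 10          # 100 weights: all neurons share one 10x10 kernel

print(fully_connected, local_10x10, weight_sharing)
# 1000000000000 100000000 100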

An Intuitive Look at Convolutional Networks

Let's take LeNet-5 as an example to build intuition for how a CNN works.

(figure: the LeNet-5 architecture)

One thing worth noting: both "sampling" and "pooling" appear across different CNN references; they are two names for the same downsampling step.

A convolutional network can be represented symbolically as follows: (figure: CNN abstract diagram)

Further understanding of how convolution works

A Theoretical Look at Convolutional Networks

Convolution Formulas

The mathematical convolution formula
Continuous

1-D convolution: $w(x) * f(x) = \int ^\infty_{-\infty} w(u)f(x-u){\rm d}u$

2-D convolution: $w(x, y) * f(x, y) = \int ^\infty_{-\infty} \int ^\infty_{-\infty} w(u, v)f(x-u, y-v){\rm d}u{\rm d}v$

Discrete
$w(x, y) * f(x, y) = \sum_{u}\sum_{v} w(u,v)f(x-u, y-v)$

When the kernel indices run symmetrically about the origin:

$w(x, y) * f(x, y) = \sum_{u=-a}^{a}\sum_{v=-b}^{b} w(u,v)f(x-u, y-v)$

The convolution formula used in ML:

2-D convolution formula

$a_{i,j} = f\left(\sum_m \sum_n w_{m,n}\, x_{i+m,\, j+n} + w_b\right)$

3-D convolution formula
$a_{i,j} = f\left(\sum_d \sum_m \sum_n w_{d,m,n}\, x_{d,\, i+m,\, j+n} + w_b\right)$
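To make the indexing concrete, here is a minimal NumPy sketch of the 2-D formula above, written as a direct loop with stride 1 and no padding (the activation, input, kernel, and bias are made-up examples):

import numpy as np

def conv2d_ml(x, w, w_b, f=lambda z: max(z, 0.0)):
    # a_{i,j} = f( sum_m sum_n w_{m,n} * x_{i+m, j+n} + w_b )
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    a = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            a[i, j] = f((w * x[i:i + kh, j:j + kw]).sum() + w_b)
    return a

x = np.arange(25, dtype=float).reshape(5, 5)  # a 5x5 example image
w = np.ones((3, 3)) / 9.0                     # a 3x3 averaging kernel
print(conv2d_ml(x, w, w_b=0.0).shape)         # (3, 3), matching the size formula below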

Feature map size

From the earlier 2-D convolution animation, we can ask: after convolution, what is the size of the feature map, and how is it computed? It depends on the filter size and the stride with which the filter window moves across the image. In the figure above: image size $5 \times 5$, filter size $3 \times 3$, stride (step) 1.
We find that the resulting size is $((5-3)/1 + 1) \times ((5-3)/1 + 1) = 3 \times 3$.

Sometimes the input size and the filter's stride do not match up evenly. The solution is to pad the input with $n$ rings of zeros around its border so that the filter slides correctly; this is written $padding = n$.

This gives the following general formula for the feature map size:

Width: $W_2 = (W_1 - F + 2P)/S + 1$
Height: $H_2 = (H_1 - F + 2P)/S + 1$
where $F$ is the filter size, $S$ is the stride, and $P$ is the padding.
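A small helper function for this formula (a sketch; it assumes $(W_1 - F + 2P)$ is exactly divisible by $S$):

def feature_map_size(w1, f, p, s):
    # W2 = (W1 - F + 2P)/S + 1
    return (w1 - f + 2 * p) // s + 1

print(feature_map_size(5, 3, 0, 1))   # 3: the 5x5 image / 3x3 filter example above
print(feature_map_size(28, 5, 2, 1))  # 28: a 5x5 filter with padding 2 keeps the size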

Defining convolution on matrices

$C_{s,t} = \sum^{m_a - 1}_{m=0}\sum^{n_a - 1}_{n=0} W_{m,n}\, X_{s-m,\, t-n} \quad \text{s.t. } 0\leq s < m_a + m_b - 1 \text{ and } 0 \leq t < n_a + n_b - 1$

where $m_a$, $n_a$ are the numbers of rows and columns of $W$,
and $m_b$, $n_b$ are the numbers of rows and columns of $X$.

In matrix notation: $C = W * X$

Note that some references speak of rotating the kernel by 180°. This is because the indices $s-m$, $t-n$ run from large to small: in matrix terms, the convolution can be viewed as rotating $W$ by 180° and then cross-correlating it with $X$. In short, convolution and cross-correlation convert into each other via this 180° rotation of the kernel.
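As a sanity check of this relationship, here is a minimal SciPy sketch (example values only): convolving with $W$ equals cross-correlating with the 180°-rotated $W$, and "full" convolution has the $(m_a + m_b - 1) \times (n_a + n_b - 1)$ size from the constraint above.

import numpy as np
from scipy.signal import convolve2d, correlate2d

X = np.arange(16, dtype=float).reshape(4, 4)  # example input (4x4)
W = np.array([[1., 2.], [3., 4.]])            # example kernel (2x2)

conv = convolve2d(X, W, mode='valid')
corr = correlate2d(X, np.rot90(W, 2), mode='valid')  # rot90 twice = 180 degrees
print(np.allclose(conv, corr))                # True

print(convolve2d(X, W, mode='full').shape)    # (5, 5) = (4 + 2 - 1, 4 + 2 - 1)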

Pooling and Fully Connected Layers

See the figure above; a sketch of 2×2 max pooling follows below.
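A minimal NumPy sketch of 2×2 max pooling with stride 2 (it assumes the input height and width are even; the values are made up):

import numpy as np

def max_pool_2x2_np(x):
    # Split (H, W) into non-overlapping 2x2 blocks and take each block's max.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
print(max_pool_2x2_np(x))
# [[4. 8.]
#  [9. 7.]]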

Backpropagation

Omitted. If there is interest, I can cover it separately.

ReLU and sigmoid

The two activation functions:

(figures: plots of the ReLU and sigmoid activation functions)
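For reference, the two functions are $\mathrm{ReLU}(x) = \max(0, x)$ and $\mathrm{sigmoid}(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$.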

Code Implementation

With some grasp of the theory, we now move on to programming. We will implement the network with the deep learning library TensorFlow; of course, you could also use another library such as Caffe, Theano, Deeplearning4j, Torch, Keras...

First, import the relevant TensorFlow packages.

Building a Multi-layer Convolutional Network

The handwritten-digit recognizer in the previous notebook reached roughly 91% accuracy, which is not high. Here we build a convolutional neural network to improve on it; the accuracy will be considerably better.

To create this model we need to create many weights and biases. The weights should be initialized with a small amount of noise to break symmetry and avoid zero gradients. Since we are using ReLU neurons, it is good practice to initialize the biases with a small positive value to avoid "dead neurons" whose output is permanently 0.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

Now load the MNIST data. It has already been preprocessed and saved as NumPy arrays, so it can be used directly.

mnist = input_data.read_data_sets('./data/MNIST_data/', one_hot=True)
Extracting ./data/MNIST_data/train-images-idx3-ubyte.gz
Extracting ./data/MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST_data/t10k-labels-idx1-ubyte.gz

Since TensorFlow computes via a graph, the computation structure must be defined first before TensorFlow can run it.

So we first define the inputs using placeholders.

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

A good workman sharpens his tools first. Before building the convolutional network, we define two helper functions for initializing our weights.

def weight_variable(shape):
    # Small Gaussian noise (stddev 0.1) breaks symmetry and avoids zero gradients.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # A small positive constant keeps ReLU neurons from starting out "dead".
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Convolution and Pooling

TensorFlow gives us a lot of flexibility in convolution and pooling. How do we handle the borders? What should the stride be? In this example we stick to the vanilla version: our convolutions use a stride of 1 and 'SAME' (zero) padding, so the output is the same size as the input. Our pooling is plain old max pooling over 2x2 blocks. To keep the code concise, we abstract these operations into functions.

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

First Convolutional Layer

With the convolution defined, we can implement the first layer of the CNN. It consists of a convolution followed by a pooling step. The convolution computes 32 features for each 5x5 patch; its weight tensor has shape [5, 5, 1, 32], where the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also a bias for each output channel.

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

# To apply the layer, we reshape x into a 4-D tensor whose 2nd and 3rd dimensions are the image width and height, and whose last dimension is the number of color channels (1 here because the images are grayscale; it would be 3 for RGB).
x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

Second Convolutional Layer

To build a deeper network, we stack several layers of this type. In the second layer, each 5x5 patch yields 64 features.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

By now the image size has been reduced to 7x7 (28x28 is halved to 14x14 by the first pooling and to 7x7 by the second). We add a fully connected layer of 1024 neurons to process the entire image: reshape the pooling layer's output tensor into a batch of vectors, multiply by a weight matrix, add a bias, and apply ReLU.

W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

To reduce overfitting, we apply dropout before the output layer. We use a placeholder for the probability that a neuron's output is kept during dropout, so we can turn dropout on during training and off during testing. TensorFlow's tf.nn.dropout op not only masks neuron outputs but also automatically rescales the surviving outputs, so no manual scaling is needed when using dropout.

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob=keep_prob)
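A minimal NumPy sketch of the rescaling just described, often called inverted dropout (just the idea, not TensorFlow's actual implementation; the values are made up):

import numpy as np

def dropout_np(h, keep_prob):
    mask = np.random.rand(*h.shape) < keep_prob  # keep each unit with prob keep_prob
    return h * mask / keep_prob                  # rescale survivors by 1/keep_prob

h = np.ones((1000, 100))
print(dropout_np(h, keep_prob=0.5).mean())  # close to 1.0: the expectation is preserved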

Output Layer

Finally, add the output (softmax) layer. Note that the code below produces the raw logits; the softmax itself is folded into the loss function in the next section.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

Training the Model

Below we define the loss function and train the model.

# Average cross-entropy loss; the softmax is applied to the logits inside this op.
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Adam optimizer with a learning rate of 1e-4.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Fraction of examples whose predicted class matches the label.
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for i in range(20000):
        batch  = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict= {x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
            
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        
    print('test accuracy %g' % accuracy.eval(feed_dict={
                x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
step 0, training accuracy 0.12
step 100, training accuracy 0.8
step 200, training accuracy 0.94
step 300, training accuracy 0.86
step 400, training accuracy 0.98
step 500, training accuracy 0.92
step 600, training accuracy 0.92
step 700, training accuracy 0.9
step 800, training accuracy 0.94
step 900, training accuracy 0.9
step 1000, training accuracy 0.96
...
step 19900, training accuracy 1
test accuracy 0.9922

well done!

