Neural Networks and Deep Learning, Chapter 1: Using neural nets to recognize handwritten digits (Part 3) - Implementation in Python

Reposted article; original post dated 2016-09-09 01:12.

Implementing our network to classify digits

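Alright, let's write a program that learns how to recognize handwritten digits, using stochastic gradient descent and the MNIST training data. We'll do this in Python (2.7), in just 74 lines of code! The first thing we need is the MNIST data. If you use git, you can obtain the data by cloning the code repository for the book,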

git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git

Alternatively, you can download the data and code here.

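Incidentally, when I described the MNIST data earlier, I said it was split into 60,000 training images and 10,000 test images. That's the official MNIST description. Actually, we're going to split the data a little differently. We'll split the 60,000-image MNIST training set into two parts: a set of 50,000 images, which we'll use to train our neural network, and a separate 10,000-image validation set. We won't use the validation data in this chapter, but later in this series we'll find it useful in figuring out how to set certain hyper-parameters of the neural network - things like the learning rate, and so on, which aren't directly selected by our learning algorithm. Although the validation data isn't part of the original MNIST specification, many people use MNIST in this fashion, and the use of validation data is common in neural networks. When I refer to the "MNIST training data" from now on, I'll be referring to our 50,000-image data set, not the original 60,000-image data set.

Note: as mentioned earlier, the MNIST data set is based on two data sets collected by NIST. To construct MNIST, the NIST data sets were stripped down and put into a more convenient format by Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. See this link for more details. The data set in my repository is in a form that makes it easy to load and manipulate the MNIST data in Python. I obtained this particular form of the data from the LISA machine learning laboratory at the University of Montreal (link).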
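The 50,000 / 10,000 split amounts to slicing the official training set into two disjoint pieces. Here is a minimal sketch of that idea; the helper name split_train_validation and the stand-in data are assumptions for illustration only, and the data file used later in this article already comes pre-split.

```python
def split_train_validation(samples, n_train):
    """Return (training, validation) as two disjoint slices of samples."""
    return samples[:n_train], samples[n_train:]

# Stand-in for the 60,000 official MNIST training examples: plain indices.
official_training_set = list(range(60000))
training, validation = split_train_validation(official_training_set, 50000)
sizes = (len(training), len(validation))
```

The 10,000 official test images are left untouched by this step.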

Apart from the MNIST data we also need a Python library called Numpy, for doing fast linear algebra. If you don't already have Numpy installed, you can get it here.

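Let me explain the core features of the neural network code before giving a full listing. The centerpiece is a Network class, which we use to represent a neural network. Here's the code we use to initialize a Network object: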

class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x) 
                        for x, y in zip(sizes[:-1], sizes[1:])]

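In this code, the list sizes contains the number of neurons in the respective layers. So, for example, if we want to create a Network object with 2 neurons in the first layer, 3 neurons in the second layer, and 1 neuron in the final layer, we'd do this with the code: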

net = Network([2, 3, 1])
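To see what this constructor builds, it can help to look at the shapes of the arrays it creates. The sketch below repeats the two list comprehensions from __init__ outside the class (the variable names bias_shapes and weight_shapes are introduced here purely for illustration):

```python
import numpy as np

sizes = [2, 3, 1]
# Same construction as in Network.__init__, written out standalone.
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

bias_shapes = [b.shape for b in biases]      # one column of biases per non-input layer
weight_shapes = [w.shape for w in weights]   # (neurons in next layer, neurons in this layer)
```

There are only two entries in each list because the input layer has no biases and no incoming weights.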

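Note also that the biases and weights are stored as lists of Numpy matrices. So, for example, net.weights[1] is a Numpy matrix storing the weights connecting the second and third layers of neurons. (It's not the first and second layers, since Python's list indexing starts at 0.) Since net.weights[1] is rather verbose, let's just denote that matrix w. It's a matrix such that w_jk is the weight for the connection between the kth neuron in the second layer and the jth neuron in the third layer. With this ordering, if a is the vector of activations of the second layer and b is the vector of biases of the third layer, then the vector of activations a' of the third layer is: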

a' = σ (w a + b) (22)

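With all this in mind, it's easy to write code computing the output from a Network instance. We begin by defining the sigmoid function: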

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))
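Because np.exp works elementwise, this definition applies the sigmoid to every entry when z is a Numpy array. A small check, using a hypothetical column of weighted inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

# A column vector of weighted inputs for three neurons (made-up values).
z = np.array([[-10.0], [0.0], [10.0]])
a = sigmoid(z)   # same shape as z; each entry lies strictly between 0 and 1
```

Large negative inputs map close to 0, large positive inputs map close to 1, and an input of 0 maps to exactly 0.5.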

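We then add a feedforward method to the Network class, which, given an input a for the network, returns the corresponding output. It is assumed that the input a is an (n, 1) Numpy ndarray, not an (n,) vector. Here, n is the number of inputs to the network. If you try to use an (n,) vector as input you'll get strange results. Although using an (n,) vector appears the more natural choice, using an (n, 1) ndarray makes it particularly easy to modify the code to feedforward multiple inputs at once, and that is sometimes convenient.

All the method does is apply Equation (22) for each layer: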

    def feedforward(self, a):
        """Return the output of the network if "a" is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a
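The "strange results" from an (n,) input come from Numpy broadcasting. The following sketch, with made-up weights and biases for a single layer, shows how the shape of the result changes:

```python
import numpy as np

w = np.ones((3, 2))   # weights into a layer of 3 neurons from 2 inputs
b = np.zeros((3, 1))  # one bias per neuron, stored as a column

column_input = np.ones((2, 1))
flat_input = np.ones(2)

good = np.dot(w, column_input) + b   # (3, 1) + (3, 1) stays a column
bad = np.dot(w, flat_input) + b      # (3,) + (3, 1) broadcasts to a 3 x 3 array
```

No error is raised in the second case, which is why the problem is easy to miss.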

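Of course, the main thing we want our Network objects to do is to learn. To that end we'll give them an SGD method which implements stochastic gradient descent. Here's the code. It's a little mysterious in a few places, but I'll break it down below, after the listing.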

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The "training_data" is a list of tuples
        "(x, y)" representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If "test_data" is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

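The training_data is a list of tuples (x, y) representing the training inputs and corresponding desired outputs. The variables epochs and mini_batch_size are what you'd expect - the number of epochs to train for, and the size of the mini-batches to use when sampling. eta is the learning rate, η. If the optional argument test_data is supplied, then the program will evaluate the network after each epoch of training, and print out partial progress. This is useful for tracking progress, but slows things down substantially.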
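The shuffling and slicing step inside SGD can be tried in isolation. This sketch uses a list of 25 integers as a stand-in for the list of (x, y) tuples, and range instead of xrange so that it runs on both Python 2.7 and Python 3:

```python
import random

training_data = list(range(25))   # stand-in for a list of (x, y) tuples
mini_batch_size = 10
n = len(training_data)

random.shuffle(training_data)     # shuffles the list in place
mini_batches = [training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
batch_sizes = [len(mb) for mb in mini_batches]
```

When the training set size is not a multiple of mini_batch_size, the last mini-batch is simply shorter; every example still appears in exactly one mini-batch per epoch.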

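The code works as follows. In each epoch, it starts by randomly shuffling the training data, and then partitions it into mini-batches of the appropriate size. This is an easy way of sampling randomly from the training data. Then for each mini_batch we apply a single step of gradient descent. This is done by the code self.update_mini_batch(mini_batch, eta), which updates the network weights and biases according to a single iteration of gradient descent, using just the training data in mini_batch. Here's the code for the update_mini_batch method: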

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw 
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb 
                       for b, nb in zip(self.biases, nabla_b)]

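Most of the work is done by the line

delta_nabla_b, delta_nabla_w = self.backprop(x, y)

which invokes the backpropagation algorithm, a fast way of computing the gradient of the cost function. So update_mini_batch works simply by computing these gradients for every training example in the mini_batch, and then updating self.weights and self.biases accordingly.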
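Written out as a formula, the update performed by the last four lines of update_mini_batch is the following. Here m is the number of examples in the mini-batch and C_x is the cost for a single training example x; this notation is chosen to mirror the code and is an assumption about symbols, not a quotation.

```latex
w \rightarrow w - \frac{\eta}{m} \sum_{x \in \text{mini-batch}} \nabla_w C_x,
\qquad
b \rightarrow b - \frac{\eta}{m} \sum_{x \in \text{mini-batch}} \nabla_b C_x
```

The sums correspond to nabla_w and nabla_b in the code, and the factor η/m corresponds to eta/len(mini_batch).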

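I'm not going to show the code for self.backprop right now. We'll study how backpropagation works in the next chapter, including the code for self.backprop. For now, just assume that it behaves as claimed, returning the appropriate gradient for the cost associated with the training example x.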

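Let's look at the full program, including the documentation strings, which I omitted above. Apart from self.backprop the program is self-explanatory - all the heavy lifting is done in self.SGD and self.update_mini_batch, which we've already discussed. The self.backprop method makes use of a few extra functions to help in computing the gradient, for example sigmoid_prime, which computes the derivative of the sigmoid function.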

"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
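One way to convince yourself that sigmoid_prime really is the derivative of sigmoid is to compare it with a numerical finite-difference estimate. The sketch below is a scalar version using the standard math module rather than Numpy; finite_difference and max_gap are names introduced here for illustration.

```python
import math

def sigmoid(z):
    return 1.0/(1.0+math.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z)*(1-sigmoid(z))

def finite_difference(f, z, h=1e-6):
    """Central-difference estimate of the derivative of f at z."""
    return (f(z+h) - f(z-h)) / (2*h)

# Largest disagreement between the formula and the numerical estimate.
max_gap = max(abs(sigmoid_prime(z) - finite_difference(sigmoid, z))
              for z in [-2.0, -0.5, 0.0, 0.5, 2.0])
```

The two agree to within numerical error, and at z = 0 the derivative takes its largest value, 0.25.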

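How well does the program recognize handwritten digits? Well, let's start by loading in the MNIST data. I'll do this using a little helper program, mnist_loader.py, to be described below. We execute the following commands in a Python shell: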

>>> import mnist_loader
>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper()

Of course, this could also be done in a separate Python program, but if you're following along it's probably easiest to do in a Python shell.

After loading the MNIST data, we import the network module and set up a Network with 30 hidden neurons:

>>> import network
>>> net = network.Network([784, 30, 10])

Finally, we'll use stochastic gradient descent to learn from the MNIST training_data over 30 epochs, with a mini-batch size of 10, and a learning rate of η = 3.0:

>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

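Note that if you're running the code as you read along, it will take some time to execute - for a typical machine (as of 2015) it will likely take a few minutes to run. I suggest you set things running, continue to read, and periodically check the output from the code. If you're in a rush you can speed things up by decreasing the number of epochs, by decreasing the number of hidden neurons, or by using only part of the training data. Note that production code would be much, much faster: these Python scripts are intended to help you understand how neural nets work, not to be high-performance code! And, of course, once we've trained a network it can be run very quickly indeed, on almost any computing platform. For example, once we've learned a good set of weights and biases for a network, it can easily be ported to run in Javascript in a web browser, or as a native app on a mobile device.

In any case, here is a partial transcript of the output of one training run of the neural network. The transcript shows the number of test images correctly recognized by the neural network after each epoch of training. As you can see, after just a single epoch this has reached 9,129 out of 10,000, and the number continues to grow: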

Epoch 0: 9129 / 10000
Epoch 1: 9295 / 10000
Epoch 2: 9348 / 10000
...
Epoch 27: 9528 / 10000
Epoch 28: 9542 / 10000
Epoch 29: 9534 / 10000

That is, the trained network gives us a classification rate of about 95 percent - 95.42 percent at its peak ("Epoch 28").
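The classification rate can be read straight off the transcript: divide the number of correctly classified test images by the total. A small sketch that does this for the lines shown above (parse_line is a helper introduced here for illustration, not part of network.py):

```python
transcript = """Epoch 0: 9129 / 10000
Epoch 1: 9295 / 10000
Epoch 2: 9348 / 10000
Epoch 27: 9528 / 10000
Epoch 28: 9542 / 10000
Epoch 29: 9534 / 10000"""

def parse_line(line):
    """Turn 'Epoch 28: 9542 / 10000' into the tuple (28, 9542, 10000)."""
    label, counts = line.split(":")
    correct, total = counts.split("/")
    return int(label.split()[1]), int(correct), int(total)

results = [parse_line(line) for line in transcript.splitlines()]
# Pick the epoch with the most correctly classified test images.
best_epoch, best_correct, total = max(results, key=lambda r: r[1])
best_rate = 100.0 * best_correct / total   # percentage of test images classified correctly
```

Note that the best epoch is not the last one: accuracy on the test data fluctuates slightly from epoch to epoch.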

Let's rerun the above experiment, changing the number of hidden neurons to 100:

>>> net = network.Network([784, 100, 10])
>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

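Sure enough, the results improve. At least in this case, using more hidden neurons helps us get better results.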

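Of course, to obtain these accuracies I had to make specific choices for the number of epochs of training, the mini-batch size, and the learning rate, η. If we choose these hyper-parameters poorly, we can get bad results. Suppose, for example, that we had chosen the learning rate to be η = 0.001: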

>>> net = network.Network([784, 100, 10])
>>> net.SGD(training_data, 30, 10, 0.001, test_data=test_data)

The results are much less encouraging:

Epoch 0: 1139 / 10000
Epoch 1: 1136 / 10000
Epoch 2: 1135 / 10000
...
Epoch 27: 2101 / 10000
Epoch 28: 2123 / 10000
Epoch 29: 2142 / 10000

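In general, debugging a neural network can be challenging. This is especially true when the initial choice of hyper-parameters produces results no better than random noise. Suppose we try the 30 hidden neuron network from earlier, but with the learning rate changed to η = 100.0: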

>>> net = network.Network([784, 30, 10])
>>> net.SGD(training_data, 30, 10, 100.0, test_data=test_data)

Epoch 0: 1009 / 10000
Epoch 1: 1009 / 10000
Epoch 2: 1009 / 10000
Epoch 3: 1009 / 10000
...
Epoch 27: 982 / 10000
Epoch 28: 982 / 10000
Epoch 29: 982 / 10000

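Now imagine that we were coming to this problem for the first time. Of course, we know from our earlier experiments that the right thing to do is to decrease the learning rate. But if we were coming to this problem for the first time, there wouldn't be much in the output to guide us on what to do. We might worry not only about the learning rate, but about every other aspect of our neural network. We might wonder if we've initialized the weights and biases in a way that makes it hard for the network to learn. Or maybe we don't have enough training data to get meaningful learning? Perhaps we haven't run for enough epochs? Or maybe it's impossible for a neural network with this architecture to learn to recognize handwritten digits? Maybe the learning rate is too low? Or maybe it is too high? When you're coming to a problem for the first time, you're not always sure.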
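The effect of a learning rate that is too low or too high can be seen on a much simpler problem than MNIST. The sketch below is a toy illustration only, not the network above: it runs plain gradient descent on the one-variable function f(x) = x**2, and the function name descend and the three learning rates are assumptions chosen for the demonstration.

```python
def descend(eta, steps=30, x=1.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2*x."""
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

too_low = descend(0.001)    # barely moves away from the starting point
reasonable = descend(0.3)   # shrinks rapidly toward the minimum at 0
too_high = descend(1.5)     # every step overshoots and the iterate blows up
```

In this toy problem both failure modes leave x far from the minimum after 30 steps, so the final value alone does not reveal which way the learning rate should be changed.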

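The lesson to take away from this is that debugging a neural network is not trivial, and, just as for ordinary programming, there is an art to it. You need to learn that art of debugging in order to get good results from neural networks. More generally, we need to develop heuristics for choosing good hyper-parameters and a good architecture. The book discusses all of this at length, including how to choose suitable hyper-parameters.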

Exercise

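Try creating a network with just two layers - an input and an output layer, no hidden layer - with 784 and 10 neurons, respectively. Train the network using stochastic gradient descent. What classification accuracy can you achieve?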

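Earlier, I skipped over the details of how the MNIST data is loaded. It's pretty straightforward. For completeness, here's the code. The data structures used to store the MNIST data are described in the documentation strings - it's straightforward stuff, tuples and lists of Numpy ndarray objects (think of them as vectors if you're not familiar with ndarrays):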

"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data.  For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``.  In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""

#### Libraries
# Standard library
import cPickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images.  This is a
    numpy ndarray with 50,000 entries.  Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries.  Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``.  ``x`` is a 784-dimensional numpy.ndarray
    containing the input image.  ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``.  In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data.  These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
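The two label formats described in the docstrings are easy to convert between. The sketch below repeats vectorized_result so that it is self-contained, and uses np.argmax to go back from the unit vector to the digit, which is the same operation evaluate applies to the network's output:

```python
import numpy as np

def vectorized_result(j):
    """Return a (10, 1) unit vector with 1.0 in position j."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

y = vectorized_result(7)              # training-style label: a column vector
recovered_digit = int(np.argmax(y))   # test-style label: a plain integer
```

This is why the training data pairs each image with a vector while the validation and test data pair each image with an integer.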

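I said above that our program gets pretty good results. What does that mean? Good compared to what? It's informative to have some simple (non-neural-network) baseline tests to compare against, to understand what it means to perform well. The simplest baseline of all, of course, is to randomly guess the digit. That'll be right about ten percent of the time. We're doing much better than that!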

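What about a less trivial baseline? Let's try an extremely simple idea: we'll look at how dark an image is. For instance, an image of one digit will typically be quite a bit darker than an image of another, simply because more of its pixels are blackened.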

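This suggests using the training data to compute the average darkness for each digit, 0, 1, 2, ..., 9. When presented with a new image, we compute how dark the image is, and then guess that it's whichever digit has the closest average darkness.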

This method improves on random guessing, although its accuracy remains low.
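The darkness baseline is simple enough to sketch in a few lines. The code below is an illustrative reimplementation on tiny made-up "images" of four pixels each, not the code referred to in the article; the function names average_darkness and guess_digit and the toy data are all assumptions.

```python
from collections import defaultdict

def average_darkness(training_pairs):
    """Map each digit to the mean total darkness of its training images."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pixels, digit in training_pairs:
        totals[digit] += sum(pixels)
        counts[digit] += 1
    return {d: totals[d] / counts[d] for d in counts}

def guess_digit(pixels, averages):
    """Guess the digit whose average darkness is closest to this image's."""
    darkness = sum(pixels)
    return min(averages, key=lambda d: abs(averages[d] - darkness))

# Toy data: (pixel intensities, digit label), with the 2s darker than the 1s.
toy_training = [([0.0, 0.1, 0.0, 0.1], 1), ([0.1, 0.1, 0.0, 0.0], 1),
                ([0.9, 0.8, 0.7, 0.0], 2), ([0.8, 0.9, 0.9, 0.2], 2)]
averages = average_darkness(toy_training)
light_guess = guess_digit([0.0, 0.2, 0.1, 0.0], averages)
dark_guess = guess_digit([1.0, 0.9, 0.5, 0.1], averages)
```

Since the classifier uses a single number per image, digits with similar amounts of ink are indistinguishable to it, which is why it can only serve as a weak baseline.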

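If you run scikit-learn's SVM classifier using the default settings, it reaches an accuracy of about 94.35 percent (the code is available here). That's a big improvement over our naive approach of classifying an image based on how dark it is. Indeed, it means that the SVM is performing only slightly worse than our neural networks. In later chapters we'll introduce new techniques that enable us to improve our neural networks so that they perform better than the SVM.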

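That's not the end of the story, however. The 94.35 percent result is for scikit-learn's default settings for SVMs. SVMs have a number of tunable parameters, and it's possible to search for parameters which improve this out-of-the-box performance. I won't explicitly do this search, but instead refer you to this blog post by Andreas Mueller if you'd like to know more. Mueller shows that with some work optimizing the SVM's parameters it's possible to get the accuracy up to 98.5 percent. In other words, a well-tuned SVM only makes an error on about one digit in 70. That's pretty good! Can neural networks do better?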

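In fact, they can. At present, well-designed neural networks outperform every other technique for solving MNIST, including SVMs. The current (2013) record is an accuracy of 99.79 percent (9,979 of 10,000 images classified correctly). This was done by Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. We'll see most of the techniques they used later in the book. At that level the performance is close to human-equivalent, and is arguably better, since quite a few of the MNIST images are difficult even for humans to recognize with confidence, for example: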

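I trust you'll agree that those are tough to classify! With images like these in the MNIST data set, it's remarkable that neural networks can accurately classify all but 21 of the 10,000 test images. Usually, when programming we believe that solving a complicated problem like recognizing the MNIST digits requires a sophisticated algorithm. But even the neural networks in the Wan et al paper just mentioned involve quite simple algorithms, variations on the algorithm we've seen in this chapter. All the complexity is learned, automatically, from the training data. In some sense, the moral of both our results and those in more sophisticated papers is that for some problems: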

sophisticated algorithm ≤ simple learning algorithm + good training data.

Toward deep learning

Translator's note: the translation had reached this point as of 2017-01-11 00:41; the remainder follows.

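While our neural network gives impressive performance, that performance is somewhat mysterious. The weights and biases in the network were discovered automatically. And that means we don't immediately have an explanation of how the network does what it does. Can we find some way to understand the principles by which our network is classifying handwritten digits? And, given such principles, can we do better?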

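To put these questions more starkly, suppose that a few decades hence neural networks lead to artificial intelligence (AI). Will we understand how such intelligent networks work? Perhaps the networks will be opaque to us, with weights and biases we don't understand, because they've been learned automatically. In the early days of AI research people hoped that the effort to build an AI would also help us understand the principles behind intelligence and, maybe, the functioning of the human brain. But perhaps the outcome will be that we end up understanding neither the brain nor how artificial intelligence works!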

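To address these questions, let's think back to the interpretation of artificial neurons that I gave at the start of the chapter, as a means of weighing evidence. Suppose we want to determine whether an image shows a human face or not: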

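We could attack this problem the same way we attacked handwriting recognition - by using the pixels in the image as input to a neural network, with the output from the network a single neuron indicating either "Yes, it's a face" or "No, it's not a face".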

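Let's suppose we do this, but that we're not using a learning algorithm. Instead, we're going to try to design a network by hand, choosing appropriate weights and biases. How might we go about it? Forgetting neural networks entirely for the moment, a heuristic we could use is to decompose the problem into sub-problems: does the image have an eye in the top left? Does it have an eye in the top right? Does it have a nose in the middle? Does it have a mouth in the bottom middle? Is there hair on top? And so on.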

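If the answers to several of these questions are "yes", or even just "probably yes", then we'd conclude that the image is likely to be a face. Conversely, if the answers to most of the questions are "no", then the image probably isn't a face.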
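This way of combining yes/no answers is exactly the "weighing evidence" view of a neuron. As a rough sketch, with question names, weights, and a threshold that are all made up for illustration:

```python
def looks_like_face(answers, weights, threshold):
    """Weigh yes/no answers (1 or 0) and compare the total to a threshold."""
    evidence = sum(weights[q] * answers[q] for q in weights)
    return evidence >= threshold

# Hypothetical weights: eyes count for more than the nose or mouth.
weights = {"eye_top_left": 2, "eye_top_right": 2,
           "nose_middle": 1, "mouth_bottom_middle": 1}
threshold = 4

mostly_yes = {"eye_top_left": 1, "eye_top_right": 1,
              "nose_middle": 1, "mouth_bottom_middle": 0}
mostly_no = {"eye_top_left": 0, "eye_top_right": 0,
             "nose_middle": 1, "mouth_bottom_middle": 0}

face_result = looks_like_face(mostly_yes, weights, threshold)
non_face_result = looks_like_face(mostly_no, weights, threshold)
```

In a real network each answer would itself be the output of a sub-network rather than a hand-set 0 or 1.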

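Of course, this is just a rough heuristic, and it suffers from many deficiencies. Maybe the person is bald, so they have no hair. Maybe we can only see part of the face, or the face is at an angle, so some of the facial features are obscured. Still, the heuristic suggests that if we can solve the sub-problems using neural networks, then perhaps we can build a neural network for face detection by combining the networks for the sub-problems. Here's a possible architecture, with rectangles denoting the sub-networks. Note that this isn't intended as a realistic approach to solving the face-detection problem; rather, it's to help us build intuition about how networks function. Here's the architecture: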

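It's also plausible that the sub-networks can be decomposed. Suppose we're considering the question: "Is there an eye in the top left?" This can be decomposed into questions such as: "Is there an eyebrow?"; "Are there eyelashes?"; "Is there an iris?"; and so on. Of course, these questions should really include positional information as well - "Is the eyebrow in the top left, and above the iris?", that kind of thing - but let's keep it simple. The network to answer the question "Is there an eye in the top left?" can now be decomposed: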

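Those questions too can be broken down, further and further through multiple layers. Ultimately, we'll be working with sub-networks that answer questions so simple they can easily be answered at the level of single pixels. Those questions might, for example, be about the presence or absence of very simple shapes at particular points in the image. Such questions can be answered by single neurons connected to the raw pixels in the image.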

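The end result is a network which breaks down a very complicated question - does this image show a face or not - into very simple questions answerable at the level of single pixels. It does this through a series of many layers, with early layers answering very simple and specific questions about the input image, and later layers building up a hierarchy of ever more complex and abstract concepts. Networks with this kind of many-layer structure - two or more hidden layers - are called deep neural networks.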
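In terms of the sizes list used by the Network class earlier, the number of hidden layers is just the length of the list minus the input and output layers. The sketch below counts hidden layers and trainable parameters for the shallow network used in this article and for a hypothetical deeper variant; the [784, 30, 30, 10] architecture and the helper names are assumptions for illustration.

```python
def hidden_layers(sizes):
    """Number of layers strictly between the input and output layers."""
    return len(sizes) - 2

def count_parameters(sizes):
    """Total number of weights and biases a Network(sizes) would hold."""
    n_weights = sum(x * y for x, y in zip(sizes[:-1], sizes[1:]))
    n_biases = sum(sizes[1:])
    return n_weights + n_biases

shallow = [784, 30, 10]     # the network trained earlier: one hidden layer
deep = [784, 30, 30, 10]    # hypothetical variant with two hidden layers

shallow_summary = (hidden_layers(shallow), count_parameters(shallow))
deep_summary = (hidden_layers(deep), count_parameters(deep))
```

By the definition above, the second architecture already counts as a deep network, even though it adds fewer than a thousand parameters.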

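Of course, I haven't said how to do this recursive decomposition into sub-networks. It certainly isn't practical to hand-design the weights and biases in the network. Instead, we'd like to use learning algorithms so that the network can automatically learn the weights and biases from training data. Researchers in the 1980s and 1990s tried using stochastic gradient descent and backpropagation to train deep networks. Unfortunately, except for a few special architectures, they didn't have much luck. The networks would learn, but very slowly, and in practice often too slowly to be useful.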

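Since 2006, a set of techniques has been developed that enable learning in deep neural nets. These deep learning techniques are based on stochastic gradient descent and backpropagation, but also introduce new ideas. These techniques have enabled much deeper (and larger) networks to be trained - people now routinely train networks with 5 to 10 hidden layers. And these perform far better than shallow neural networks, i.e., networks with just a single hidden layer. The reason, of course, is the ability of deep nets to build up a complex hierarchy of concepts. It's a bit like the way conventional programming languages use modular design and ideas about abstraction to enable the creation of complex computer programs. Comparing a deep network to a shallow network is a bit like comparing a programming language with the ability to make function calls to a stripped-down language with no ability to make such calls. Abstraction takes a different form in neural networks than it does in conventional programming, but it's just as important.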

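Translator's note: this completes the translation of the material on shallow neural networks (finished 2017-01-22 23:35). Next I will translate the material on deep neural networks and deep learning. Because time was limited, there may be awkward passages and typos; I will go back and correct them over time.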
