Machine Learning for Programmers (2) - Introduction to pytorch and Matrix Computation

Reposted article, dated 2020-04-10 16:58. Category: Introduction to Machine Learning

Introduction to pytorch

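pytorch is one of the two most popular machine learning frameworks in the world today, rivaling tensorflow. It offers many conveniences: automatic differentiation that works out from the loss how the parameters should be adjusted, wrappers for a range of mathematical functions, a collection of ready-made models, and a framework for combining models and training them. Its predecessor is torch, which was based on lua, whereas pytorch is based on python. Although the interface is python, the low-level core is written in c++, and it supports automatically parallelized computation and GPU acceleration, so its performance is very good.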

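Traditional machine learning code is sometimes written entirely by hand, as in the example from the previous article, sometimes uses the numpy library to cut down part of the work, and sometimes uses the classic algorithms packaged by scikit-learn (which is built on numpy). pytorch and tensorflow differ from traditional machine learning in that they focus on building neural networks loosely modeled on the neurons of the human brain (Neural Network), so they can make very complex judgments that traditional machine learning cannot, such as identifying the type of object in an image or driving a car autonomously. Whether these neural networks really work like the human brain is still widely debated. Some people have begun building networks intended to be closer in principle to the human brain, such as GNN (Graph Neural Network), but these were not yet in wide practical use at the time of writing, so this series focuses on the network models that are already practical and widely used across industries.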

Should you learn pytorch or tensorflow?

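A very common question for beginners is whether to learn pytorch or tensorflow. According to current statistics, companies use tensorflow more while researchers use pytorch more, and pytorch is growing very quickly and looks set to overtake tensorflow. My view is that it does not matter which one you learn: if you know pytorch, picking up tensorflow takes a day or two, and vice versa, and projects can be ported between the two. Just pick the one you find easier to learn. I find pytorch easier (its wrappers are very intuitive, and its Dynamic Graph makes debugging very easy), so this series is based on pytorch.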

Dynamic Graph and Static Graph

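Machine learning frameworks can be divided into Dynamic Graph and Static Graph frameworks, depending on whether the computation flow has to be fixed in advance: a Dynamic Graph does not need it fixed, a Static Graph does. Take the formula wx + b = y. A Dynamic Graph framework lets you write wx and +b separately and compute them step by step; at any point you can print an intermediate result or send it elsewhere to be recorded. A Static Graph framework requires the whole computation flow to be defined up front: you can only pass w, x and b to the calculator and have it output y, and intermediate results can only be inspected with a dedicated debugger.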

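Generally speaking, a Static Graph performs better than a Dynamic Graph. Tensorflow (older versions) uses a Static Graph and pytorch uses a Dynamic Graph, but in practice the performance gap is small, because most of the cost is in matrix operations, and batch training narrows the gap considerably. Incidentally, Tensorflow has supported Dynamic Graph since 1.7 and enables it by default in 2.0, although at the time of writing most people still used a Static Graph with Tensorflow.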

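# Sketch of a Dynamic Graph: custom code can be inserted at every step of the computation
w, x, b = 2.0, 3.0, 1.0  # example values, chosen only so the sketch can run
def forward(w, x, b):
    wx = w * x
    print(wx)
    y = wx + b
    print(y)
    return y
forward(w, x, b)

# Sketch of a Static Graph (pseudocode, not runnable): the whole computation flow must be compiled up front
# "compile" stands for a hypothetical framework function here, not python's built-in compile
forward = compile("wx+b")
forward(w, x, b)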

Installing pytorch

Assuming python3 is already installed, the following command installs pytorch:

pip3 install torch

After that, use import torch in your python code to load the pytorch library.

Basic pytorch operations

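Next, let's get familiar with the most basic operations in pytorch. pytorch uses the torch.Tensor type as a uniform representation for single values, vectors (one-dimensional arrays) and matrices (multi-dimensional arrays), and model parameters use this type as well. (tensorflow splits these into several types by purpose; pytorch is simpler and clearer on this point.)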

A torch.Tensor can be built with the torch.tensor function. Here are some simple examples (run in the python REPL):

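# Import pytorch
>>> import torch

# Create an integer tensor
>>> torch.tensor(1)
tensor(1)

# Create a floating-point tensor
>>> torch.tensor(1.0)
tensor(1.)

# The value of a single-value tensor can be extracted with the item function
>>> torch.tensor(1.0).item()
1.0

# Create a vector tensor from a one-dimensional array
>>> torch.tensor([1.0, 2.0, 3.0])
tensor([1., 2., 3.])

# Create a matrix tensor from a two-dimensional array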
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])
tensor([[ 1.,  2.,  3.],
        [-1., -2., -3.]])

The numeric type of a tensor object is available through its dtype member:

>>> torch.tensor(1).dtype
torch.int64
>>> torch.tensor(1.0).dtype
torch.float32
>>> torch.tensor([1.0, 2.0, 3.0]).dtype
torch.float32
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]).dtype
torch.float32

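pytorch supports the integer types torch.uint8, torch.int8, torch.int16, torch.int32 and torch.int64, the floating-point types torch.float16, torch.float32 and torch.float64, and the boolean type torch.bool. The number after the type name is its width in bits, and the u in uint8 means the type is unsigned. In practice the vast majority of cases use only torch.float32: it is less precise than torch.float64, but it uses less memory and computes faster. Note that a tensor object can hold values of only one type; types cannot be mixed.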
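As a small illustration of the trade-off described above, the following sketch compares the memory per element and the precision of float32 and float64, and shows what happens when mixed values are passed in. The variable names are chosen here for illustration only.

```python
import torch

# The same value stored with two different floating-point types
a32 = torch.tensor(0.1, dtype=torch.float32)
a64 = torch.tensor(0.1, dtype=torch.float64)

# element_size returns the number of bytes used by one element
print(a32.element_size())  # 4 bytes (32 bits)
print(a64.element_size())  # 8 bytes (64 bits)

# float32 keeps fewer significant digits, so converting back to a python float
# exposes a small rounding error; float64 matches python's own float type
print(a32.item() == 0.1)  # False
print(a64.item() == 0.1)  # True

# Mixed integers and floats are all converted to a single type
mixed = torch.tensor([1, 2.5])
print(mixed.dtype)  # torch.float32
```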

You can force a specific type with the dtype parameter when creating a tensor object:

>>> torch.tensor(1, dtype=torch.int32)
tensor(1, dtype=torch.int32)
>>> torch.tensor([1.1, 2.9, 3.5], dtype=torch.int32)
tensor([1, 2, 3], dtype=torch.int32)

>>> torch.tensor(1, dtype=torch.int64)
tensor(1)

>>> torch.tensor(1, dtype=torch.float32)
tensor(1.)

>>> torch.tensor(1, dtype=torch.float64)
tensor(1., dtype=torch.float64)
>>> torch.tensor([1, 2, 3], dtype=torch.float64)
tensor([1., 2., 3.], dtype=torch.float64)

>>> torch.tensor([1, 2, 0], dtype=torch.bool)
tensor([ True,  True, False])

The shape of a tensor object is available through its shape member:

# The shape of a single-value tensor is empty
>>> torch.tensor(1).shape
torch.Size([])
>>> torch.tensor(1.0).shape
torch.Size([])

# The shape of a vector tensor has a single value, the length of the vector
>>> torch.tensor([1.0]).shape
torch.Size([1])
>>> torch.tensor([1.0, 2.0, 3.0]).shape
torch.Size([3])

# The shape of a matrix tensor depends on its dimensions; each value is the size of one dimension. This example has 2 rows and 3 columns
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]).shape
torch.Size([2, 3])

Arithmetic works between a tensor object and a number, and between two tensor objects:

>>> torch.tensor(1.0) * 2
tensor(2.)
>>> torch.tensor(1.0) * torch.tensor(2.0)
tensor(2.)
>>> torch.tensor(3.0) * torch.tensor(2.0)
tensor(6.)

Vectors and matrices can also be operated on in bulk (the computation is parallelized internally):

# Operations between a vector and a number
>>> torch.tensor([1.0, 2.0, 3.0])
tensor([1., 2., 3.])
>>> torch.tensor([1.0, 2.0, 3.0]) * 3
tensor([3., 6., 9.])
>>> torch.tensor([1.0, 2.0, 3.0]) * 3 - 1
tensor([2., 5., 8.])

# Operations between a matrix and a single-value tensor object
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])
tensor([[ 1.,  2.,  3.],
        [-1., -2., -3.]])
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]) / torch.tensor(2)
tensor([[ 0.5000,  1.0000,  1.5000],
        [-0.5000, -1.0000, -1.5000]])

# Operations between a matrix and a vector whose length equals the size of the matrix's last dimension
>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]) * torch.tensor([1.0, 1.5, 2.0])
tensor([[ 1.,  3.,  6.],
        [-1., -3., -6.]])
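The element-wise operations above rely on what pytorch calls broadcasting: shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing dimension) is stretched to match. The following is a minimal sketch of that rule; the variable names are chosen for illustration.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])  # shape [2, 3]
row = torch.tensor([1.0, 1.5, 2.0])                            # shape [3]
column = torch.tensor([[10.0], [100.0]])                       # shape [2, 1]

# shape [3] is applied to every row of the matrix
by_row = matrix * row
# shape [2, 1] is applied to every column of the matrix
by_column = matrix * column

print(by_row.shape, by_column.shape)
print(by_column)

# A vector of length 2 does not match the last dimension (3), so it cannot be broadcast
try:
    matrix * torch.tensor([1.0, 2.0])
except RuntimeError as e:
    print("broadcast failed:", e)
```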

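Operations between tensor objects normally create a new tensor object. If you want to avoid creating a new object (for better performance), use the functions whose names end in _, which modify the original object instead: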

# Creates a new object and leaves the original unchanged; add means the same as +
>>> a = torch.tensor([1,2,3])
>>> b = torch.tensor([7,8,9])
>>> a.add(b)
tensor([ 8, 10, 12])
>>> a
tensor([1, 2, 3])

# Operates on the original object in place, avoiding a new object
>>> a.add_(b)
tensor([ 8, 10, 12])
>>> a
tensor([ 8, 10, 12])

pytorch also provides a set of convenient functions for the maximum, minimum, mean, standard deviation and so on:

>>> torch.tensor([1.0, 2.0, 3.0])
tensor([1., 2., 3.])
>>> torch.tensor([1.0, 2.0, 3.0]).min()
tensor(1.)
>>> torch.tensor([1.0, 2.0, 3.0]).max()
tensor(3.)
>>> torch.tensor([1.0, 2.0, 3.0]).mean()
tensor(2.)
>>> torch.tensor([1.0, 2.0, 3.0]).std()
tensor(1.)

pytorch also supports comparing tensor objects to produce a boolean tensor:

# Compare a tensor object with a number
>>> torch.tensor([1.0, 2.0, 3.0]) > 1.0
tensor([False,  True,  True])
>>> torch.tensor([1.0, 2.0, 3.0]) <= 2.0
tensor([ True,  True, False])

# Compare a tensor object with another tensor object
>>> torch.tensor([1.0, 2.0, 3.0]) > torch.tensor([1.1, 1.9, 3.0])
tensor([False,  True, False])
>>> torch.tensor([1.0, 2.0, 3.0]) <= torch.tensor([1.1, 1.9, 3.0])
tensor([ True, False,  True])

pytorch can also generate tensor objects of a given shape:

# Generate a matrix tensor with 2 rows and 3 columns, all values 0
>>> torch.zeros(2, 3)
tensor([[0., 0., 0.],
        [0., 0., 0.]])

# Generate a matrix tensor with 3 rows and 2 columns, all values 1
>>> torch.ones(3, 2)
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

# Generate a matrix tensor with 3 rows and 2 columns, all values 100
>>> torch.full((3, 2), 100.0)
tensor([[100., 100.],
        [100., 100.],
        [100., 100.]])

# Generate a matrix tensor with 3 rows and 3 columns of random floating-point numbers in the range [0, 1)
>>> torch.rand(3, 3)
tensor([[0.4012, 0.2412, 0.1532],
        [0.1178, 0.2319, 0.4056],
        [0.7879, 0.8318, 0.7452]])

# Generate a matrix tensor with 3 rows and 3 columns of random integers in the range [1, 10]
>>> (torch.rand(3, 3) * 10 + 1).long()
tensor([[ 8,  1,  5],
        [ 8,  6,  5],
        [ 1,  6, 10]])

# Has the same effect as the expression above
>>> torch.randint(1, 11, (3, 3))
tensor([[7, 1, 3],
        [7, 9, 8],
        [4, 7, 3]])

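The operations covered here are only some of the common ones. For more of the operations that tensor objects support, see the following documentation: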

https://pytorch.org/docs/stable/tensors.html

The data structure pytorch uses to store tensors

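To reduce memory usage and speed up access, pytorch stores a tensor in one contiguous block of storage (whether in system memory or GPU memory), regardless of whether the tensor is a single value, a vector or a matrix.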

We can use storage to inspect the storage space a tensor object uses:

# The storage of a single value has length 1
>>> torch.tensor(1).storage()
 1
[torch.LongStorage of size 1]

# The storage of a vector has the same length as the vector
>>> torch.tensor([1, 2, 3], dtype=torch.float32).storage()
 1.0
 2.0
 3.0
[torch.FloatStorage of size 3]

# The storage of a matrix has length equal to the product of all dimensions; here 2 rows by 3 columns gives 6 elements
>>> torch.tensor([[1, 2, 3], [-1, -2, -3]], dtype=torch.float64).storage()
 1.0
 2.0
 3.0
 -1.0
 -2.0
 -3.0
[torch.DoubleStorage of size 6]

pytorch uses stride, together with shape, to map this flat storage onto the dimensions of a tensor object:

# The storage has 6 elements
>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]

# The first dimension is 2 and the second dimension is 3 (2 rows, 3 columns)
>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).shape
torch.Size([2, 3])

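# stride gives, for each dimension, the distance in storage between neighbouring elements along that dimension
# One step along the first dimension skips 3 elements (the 6 elements form 2 groups of 3); one step along the second dimension skips 1 element (3 elements per group)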
>>> torch.tensor([[1, 2, 3], [-1, -2, -3]])
tensor([[ 1,  2,  3],
        [-1, -2, -3]])
>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).stride()
(3, 1)

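One very powerful feature of pytorch is that the view function can change the dimensions of a tensor object (internally it changes the shape and stride) without allocating new storage or copying elements: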

# Create a matrix with 2 rows and 3 columns
>>> a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
>>> a
tensor([[ 1,  2,  3],
        [-1, -2, -3]])
>>> a.shape
torch.Size([2, 3])
>>> a.stride()
(3, 1)

# Change the dimensions to 3 rows and 2 columns
>>> b = a.view(3, 2)
>>> b
tensor([[ 1,  2],
        [ 3, -1],
        [-2, -3]])
>>> b.shape
torch.Size([3, 2])
>>> b.stride()
(2, 1)

# Convert to a vector
>>> c = b.view(6)
>>> c
tensor([ 1,  2,  3, -1, -2, -3])
>>> c.shape
torch.Size([6])
>>> c.stride()
(1,)

# They all use the same storage
>>> a.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]
>>> b.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]
>>> c.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]
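Because the three tensors above share one storage, a change made through one of them is visible through the others. The sketch below checks this; data_ptr is used only as an illustration of "same memory", and the value 100 is arbitrary.

```python
import torch

a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
b = a.view(3, 2)
c = b.view(6)

# Modify one element through a; b and c see the change because no copy was made
a[0][0] = 100
print(b[0][0].item())  # 100
print(c[0].item())     # 100

# data_ptr returns the memory address of the first element of each tensor
print(a.data_ptr() == b.data_ptr() == c.data_ptr())  # True
```

This also means a view should be treated as another window onto the same data rather than as an independent copy.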

Another benefit of determining dimensions through stride is that a transpose (Transpose) operation can share the same storage:

# Create a matrix with 2 rows and 3 columns
>>> a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
>>> a
tensor([[ 1,  2,  3],
        [-1, -2, -3]])
>>> a.shape
torch.Size([2, 3])
>>> a.stride()
(3, 1)

# Swap the dimensions with a transpose (rows become columns)
>>> b = a.transpose(0, 1)
>>> b
tensor([[ 1, -1],
        [ 2, -2],
        [ 3, -3]])
>>> b.shape
torch.Size([3, 2])
>>> b.stride()
(1, 3)

# They use the same storage
>>> a.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]
>>> b.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]

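Internally, a transpose simply swaps the stride values (and sizes) of the specified dimensions. Using the earlier description, you can work out how the elements are laid out in the transposed matrix.

Now think again: what happens if we use the view function to turn the transposed matrix into a vector? Will it become [1, -1, 2, -2, 3, -3]?

In fact, this operation raises an error😱: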

>>> b
tensor([[ 1, -1],
        [ 2, -2],
        [ 3, -3]])
>>> b.view(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

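This is because the natural order of the elements in the transposed matrix no longer matches their order in storage. We can check this with the is_contiguous function: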

>>> a.is_contiguous()
True
>>> b.is_contiguous()
False

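One fix is to call the contiguous function first, which copies the storage so that the order matches, and then change the dimensions with view. A more convenient option is the reshape function, which checks whether changing the dimensions requires a copy of the storage: if so it copies, and if not it behaves like view and only changes the internal stride.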

>>> b.contiguous().view(6)
tensor([ 1, -1,  2, -2,  3, -3])
>>> b.reshape(6)
tensor([ 1, -1,  2, -2,  3, -3])
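To see when reshape copies and when it does not, the following sketch compares memory addresses before and after the call. It is only an illustration of the behaviour described above; the names flat_a and flat_b are made up for this example.

```python
import torch

a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
b = a.transpose(0, 1)  # not contiguous, shares storage with a

# reshape on a contiguous tensor behaves like view: no copy is made
flat_a = a.reshape(6)
print(flat_a.data_ptr() == a.data_ptr())  # True

# reshape on the transposed tensor has to copy the elements into a new storage
flat_b = b.reshape(6)
print(flat_b.data_ptr() == b.data_ptr())  # False

# Because flat_b is a copy, changing it does not affect a
flat_b[0] = 100
print(a[0][0].item())  # 1
```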

pytorch also supports taking a slice of the storage as a new tensor object, based on the internal storage_offset and size attributes, again without copying:

# Slicing a vector
>>> a = torch.tensor([1, 2, 3, -1, -2, -3])
>>> b = a[1:3]
>>> b
tensor([2, 3])
>>> b.storage_offset()
1
>>> b.size()
torch.Size([2])
>>> b.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]

# Slicing a matrix
>>> a.view(3, 2)
tensor([[ 1,  2],
        [ 3, -1],
        [-2, -3]])
>>> c = a.view(3, 2)[1:] # the first dimension (rows) is sliced from 1 to the end, the second dimension is not sliced
>>> c
tensor([[ 3, -1],
        [-2, -3]])
>>> c.storage_offset()
2
>>> c.size()
torch.Size([2, 2])
>>> c.stride()
(2, 1)
>>> c.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]

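# Slicing a transposed matrix, which is a little more complex
>>> a.view(3, 2).transpose(0, 1)
tensor([[ 1,  3, -2],
        [ 2, -1, -3]])
>>> c = a.view(3, 2).transpose(0, 1)[:,1:] # the first dimension (rows) is not sliced, the second dimension (columns) is sliced from 1 to the end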
>>> c
tensor([[ 3, -2],
        [-1, -3]])
>>> c.storage_offset()
2
>>> c.size()
torch.Size([2, 2])
>>> c.stride()
(1, 2)
>>> c.storage()
 1
 2
 3
 -1
 -2
 -3
[torch.LongStorage of size 6]
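Putting storage_offset and stride together, the position of an element in storage is storage_offset plus the sum of index * stride over all dimensions. The sketch below applies this to the last example; storage_position is a hypothetical helper written for this illustration, not a pytorch function.

```python
import torch

def storage_position(tensor, *index):
    # Position in the underlying storage of the element at the given index
    position = tensor.storage_offset()
    for i, stride in zip(index, tensor.stride()):
        position += i * stride
    return position

a = torch.tensor([1, 2, 3, -1, -2, -3])
c = a.view(3, 2).transpose(0, 1)[:, 1:]  # the last slicing example above

for row in range(c.shape[0]):
    for col in range(c.shape[1]):
        position = storage_position(c, row, col)
        # a is the flat vector that owns the storage, so a[position] reads it directly
        print((row, col), position, a[position].item(), c[row][col].item())
```

Each printed line shows the index, the computed storage position, and the same value read in two ways.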

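That's it. After this section you should have a basic understanding of how pytorch stores tensor objects. To keep things easy to follow, this section used at most two-dimensional matrices as examples; you can try for yourself whether matrices with more dimensions can be handled the same way.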

Introduction to matrix multiplication

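Next, let's look at matrix multiplication (Matrix Multiplication), by far the most frequent operation in machine learning. If you learned it in high school and still remember it, treat this as a refresher.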

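As a simple example, multiplying a matrix with 2 rows and 3 columns by a matrix with 3 rows and 4 columns yields a matrix with 2 rows and 4 columns.

Matrix multiplication pairs every row of the first matrix with every column of the second matrix, multiplies the matching elements, and uses the sum of those products as one entry of the result: the entry at row i, column j comes from row i of the first matrix and column j of the second.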

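By this rule, multiplying a matrix with n rows and m columns by a matrix with m rows and p columns yields a matrix with n rows and p columns (the number of columns of the first matrix must equal the number of rows of the second).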
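The rule can be written out as three nested loops. The following is a plain python sketch for illustration only (naive_matmul is a made-up helper, and real frameworks use far faster implementations); the result is compared with pytorch's mm function.

```python
import torch

def naive_matmul(a, b):
    # a is a list of n rows with m values each, b is a list of m rows with p values each
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "columns of a must equal rows of b"
    result = [[0] * p for _ in range(n)]
    for i in range(n):          # each row of a
        for j in range(p):      # each column of b
            for k in range(m):  # multiply matching elements and add them up
                result[i][j] += a[i][k] * b[k][j]
    return result

a = [[1, 2, 3], [4, 5, 6]]                      # 2 rows, 3 columns
b = [[4, 3, 2, 1], [8, 7, 6, 5], [9, 9, 9, 9]]  # 3 rows, 4 columns
product = naive_matmul(a, b)
print(product)  # [[47, 44, 41, 38], [110, 101, 92, 83]]

# pytorch gives the same 2 x 4 result
print(torch.tensor(a).mm(torch.tensor(b)).tolist() == product)  # True
```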

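So what is the point of matrix multiplication? In machine learning it lets us merge the computation of many inputs, outputs or intermediate values into a single operation (it also greatly simplifies the formulas mathematically). The framework can then parallelize the computation internally: a high-end GPU has thousands of cores, and spreading the computation across them speeds it up dramatically. The examples that follow also show how matrix multiplication is used to implement batch training.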

Matrix multiplication with pytorch

In pytorch, matrix multiplication is available through the mm function:

>>> a = torch.tensor([[1,2,3],[4,5,6]])
>>> b = torch.tensor([[4,3,2,1],[8,7,6,5],[9,9,9,9]])
>>> a.mm(b)
tensor([[ 47,  44,  41,  38],
        [110, 101,  92,  83]])

# A size mismatch raises an error
>>> a = torch.tensor([[1,2,3],[4,5,6]])
>>> b = torch.tensor([[4,3,2,1],[8,7,6,5]])
>>> a.mm(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: size mismatch, m1: [2 x 3], m2: [2 x 4] at ../aten/src/TH/generic/THTensorMath.cpp:197

# The @ operator can be used in place of the mm function, with the same result
>>> a = torch.tensor([[1,2,3],[4,5,6]])
>>> b = torch.tensor([[4,3,2,1],[8,7,6,5],[9,9,9,9]])
>>> a @ b
tensor([[ 47,  44,  41,  38],
        [110, 101,  92,  83]])

For matrix multiplication with more dimensions, pytorch provides the matmul function:

# Multiplying an n x m matrix by a q x m x p matrix yields a q x n x p matrix
>>> a = torch.ones(2,3)
>>> b = torch.ones(5,3,4)
>>> a.matmul(b)
tensor([[[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.]]])
>>> a.matmul(b).shape
torch.Size([5, 2, 4])

Automatic differentiation in pytorch (autograd)

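pytorch can compute derivative values (the gradient of each parameter) by automatic differentiation. With this feature we no longer need to derive each parameter's derivative from mathematical formulas by hand, which lowers the barrier to machine learning considerably😄😄. Here is an example of this feature: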

# Define the parameters
# Setting requires_grad to True when creating a tensor object enables automatic differentiation for it
>>> w = torch.tensor(1.0, requires_grad=True)
>>> b = torch.tensor(0.0, requires_grad=True)

# Define the input and output tensors
>>> x = torch.tensor(2)
>>> y = torch.tensor(5)

# Compute the predicted output
>>> p = x * w + b
>>> p
tensor(2., grad_fn=<AddBackward0>)

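# Compute the loss
# Note that the loss must never be negative: adjusting parameters by the derivative values only tries to make the loss smaller, not to bring it close to 0
# Here abs turns the loss into an absolute value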
>>> l = (p - y).abs()
>>> l
tensor(3., grad_fn=<AbsBackward>)

# Differentiate the loss automatically to get the derivative values
>>> l.backward()

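# Inspect the derivative value of each parameter
# Note that grad is the value to subtract from the parameter in order to reduce the loss, so the negative values here mean the parameters will increase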
>>> w.grad
tensor(-2.)
>>> b.grad
tensor(-1.)

# Define the learning rate, the ratio by which parameters are adjusted according to the derivative values each time
>>> learning_rate = 0.01

# When adjusting parameters, torch.no_grad is needed to temporarily disable automatic differentiation
>>> with torch.no_grad():
...     w -= w.grad * learning_rate
...     b -= b.grad * learning_rate
...

# We can see that weight and bias increased by 0.02 and 0.01 respectively
>>> w
tensor(1.0200, requires_grad=True)
>>> b
tensor(0.0100, requires_grad=True)

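# Finally, we need to clear the grad values of the parameters; they are not reset automatically (because some models need to accumulate derivative values)
# Try computing the loss and calling backward once more without clearing: grad will hold the sum of the two values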
>>> w.grad.zero_()
>>> b.grad.zero_()
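The accumulation mentioned in the last comment can be checked directly. This is a small sketch using the same numbers as above, with the bias left out for brevity; note that the loss is recomputed before each backward call.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)
y = torch.tensor(5.0)

# First backward pass: grad holds the derivative value of this loss
l = (x * w - y).abs()
l.backward()
first = w.grad.item()
print(first)  # -2.0

# Second backward pass without clearing: the new derivative value is added to the old one
l = (x * w - y).abs()
l.backward()
second = w.grad.item()
print(second)  # -4.0

# After clearing, the next backward pass starts from 0 again
w.grad.zero_()
l = (x * w - y).abs()
l.backward()
third = w.grad.item()
print(third)  # -2.0
```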

Now let's try the approach mentioned in the previous article, where the loss is the square of the difference:

# Define the parameters
>>> w = torch.tensor(1.0, requires_grad=True)
>>> b = torch.tensor(0.0, requires_grad=True)

# Define the input and output tensors
>>> x = torch.tensor(2)
>>> y = torch.tensor(5)

# Compute the predicted output
>>> p = x * w + b
>>> p
tensor(2., grad_fn=<AddBackward0>)

# Compute the difference
>>> d = p - y
>>> d
tensor(-3., grad_fn=<SubBackward0>)

# Compute the loss (the square of the difference, always 0 or positive)
>>> l = d ** 2
>>> l
tensor(9., grad_fn=<PowBackward0>)

# Differentiate the loss automatically to get the derivative values
>>> l.backward()

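# Inspect the derivative value of each parameter; they match the values we derived with formulas in the previous article
# derivative value for w = 2 * d * x = 2 * -3 * 2 = -12
# derivative value for b = 2 * d = 2 * -3 = -6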
>>> w.grad
tensor(-12.)
>>> b.grad
tensor(-6.)

# After this, adjust the parameters as in the previous example

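Impressive, right😼? However complex the model, calling backward computes the derivative values for us automatically, so from now on we can throw the math textbooks away (just kidding: some problems still need math to be understood, but in most cases people with only basic math knowledge can get by).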
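One way to convince yourself that backward returns the right values without doing any calculus is to estimate the derivative numerically: nudge one parameter slightly and see how much the loss changes. The sketch below does this for the squared-difference example; loss_of and eps are names and values assumed for this illustration.

```python
import torch

def loss_of(w_value, b_value, x=2.0, y=5.0):
    # Same model and loss as above, written with plain python numbers
    return ((x * w_value + b_value) - y) ** 2

# Derivative values from autograd (float64 so they can be compared with python floats)
w = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
b = torch.tensor(0.0, dtype=torch.float64, requires_grad=True)
l = ((2.0 * w + b) - 5.0) ** 2
l.backward()

# Numerical estimates: change one parameter by a tiny amount in both directions
eps = 1e-6
numeric_w = (loss_of(1.0 + eps, 0.0) - loss_of(1.0 - eps, 0.0)) / (2 * eps)
numeric_b = (loss_of(1.0, 0.0 + eps) - loss_of(1.0, 0.0 - eps)) / (2 * eps)

print(w.grad.item(), numeric_w)  # both close to -12
print(b.grad.item(), numeric_b)  # both close to -6
```

The numerical estimate needs two loss evaluations per parameter, which is why it is only practical as a check and not as a replacement for autograd.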

Loss calculators provided by pytorch (loss function)

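pytorch provides wrappers for several common loss calculators. The one we saw first is also called L1 loss (L1 Loss): the mean of the absolute differences between all predicted outputs and correct outputs (some scenarios have multiple outputs). Here is an example using L1 loss: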

# Define the parameters
>>> w = torch.tensor(1.0, requires_grad=True)
>>> b = torch.tensor(0.0, requires_grad=True)

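# Define the input and output tensors
# Note that the loss calculators provided by pytorch require both the predicted output and the correct output to be floating-point, so the input and output must be defined as floating-point numbers too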
>>> x = torch.tensor(2.0)
>>> y = torch.tensor(5.0)

# Create the loss calculator
>>> loss_function = torch.nn.L1Loss()

# Compute the predicted output
>>> p = x * w + b
>>> p
tensor(2., grad_fn=<AddBackward0>)

# Compute the loss
# Equivalent to (p - y).abs().mean()
>>> l = loss_function(p, y)
>>> l
tensor(3., grad_fn=<L1LossBackward>)

Using the square of the difference as the loss is called MSE loss (Mean Squared Error), also known in some places as L2 loss. Here is an example using MSE loss:

# Define the parameters
>>> w = torch.tensor(1.0, requires_grad=True)
>>> b = torch.tensor(0.0, requires_grad=True)

# Define the input and output tensors
>>> x = torch.tensor(2.0)
>>> y = torch.tensor(5.0)

# Create the loss calculator
>>> loss_function = torch.nn.MSELoss()

# Compute the predicted output
>>> p = x * w + b
>>> p
tensor(2., grad_fn=<AddBackward0>)

# Compute the loss
# Equivalent to ((p - y) ** 2).mean()
>>> l = loss_function(p, y)
>>> l
tensor(9., grad_fn=<MseLossBackward>)
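With a single output the mean makes no difference, so here is a short sketch with two outputs to show what the two loss calculators average over. The values of p and y are made up for this illustration.

```python
import torch

p = torch.tensor([2.0, 4.0])  # predicted outputs
y = torch.tensor([5.0, 5.0])  # correct outputs

l1 = torch.nn.L1Loss()(p, y)    # (|2-5| + |4-5|) / 2
mse = torch.nn.MSELoss()(p, y)  # ((2-5)**2 + (4-5)**2) / 2
print(l1.item(), mse.item())  # 2.0 5.0

# The same values computed by hand
print((p - y).abs().mean().item(), ((p - y) ** 2).mean().item())  # 2.0 5.0
```

Note how the squared loss weights the larger error (3) much more heavily than the smaller one (1).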

Convenient, right🙂️? For more loss calculators, see the following address:

https://pytorch.org/docs/stable/nn.html#loss-functions

Parameter adjusters provided by pytorch (optimizer)

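pytorch also provides wrappers for the adjusters that update parameters based on derivative values. The method we have seen in these two articles (start from initial parameter values, then repeatedly adjust the parameters by derivative value * learning rate to reduce the loss) is gradient descent; when each adjustment is computed from a single sample or a small batch rather than the whole data set, as in our training loops, it is called Stochastic Gradient Descent. Here is an example using a ready-made adjuster: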

# Define the parameters
>>> w = torch.tensor(1.0, requires_grad=True)
>>> b = torch.tensor(0.0, requires_grad=True)

# Define the input and output tensors
>>> x = torch.tensor(2.0)
>>> y = torch.tensor(5.0)

# Create the loss calculator
>>> loss_function = torch.nn.MSELoss()

# Create the parameter adjuster
# It takes the list of parameters and the learning rate, which is 0.01 here
>>> optimizer = torch.optim.SGD([w, b], lr=0.01)

# Compute the predicted output
>>> p = x * w + b
>>> p
tensor(2., grad_fn=<AddBackward0>)

# Compute the loss
>>> l = loss_function(p, y)
>>> l
tensor(9., grad_fn=<MseLossBackward>)

# Differentiate the loss automatically to get the derivative values
>>> l.backward()

# Check the derivative values of the parameters
>>> w.grad
tensor(-12.)
>>> b.grad
tensor(-6.)

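# Adjust the parameters with the parameter adjuster
# Equivalent to:
# with torch.no_grad():
#     w -= w.grad * learning_rate
#     b -= b.grad * learning_rate
>>> optimizer.step()

# Clear the derivative values
# Equivalent to:
# w.grad.zero_()
# b.grad.zero_()
# (recent pytorch versions reset grad to None here by default instead of zeroing it)
>>> optimizer.zero_grad()

# Check the adjusted parameters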
>>> w
tensor(1.1200, requires_grad=True)
>>> b
tensor(0.0600, requires_grad=True)
>>> w.grad
tensor(0.)
>>> b.grad
tensor(0.)

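The SGD adjuster uses a fixed learning rate. If we want the learning rate to be adapted automatically during training, we can use other adjusters such as Adam. You can also enable the momentum option to speed up learning: with it enabled, each adjustment takes the direction (sign) of the previous adjustment into account, adjusting more if the direction is the same and less if it differs.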

If you are interested in how the Adam adjuster and momentum are implemented, see the following article (some math background required):

https://mlfromscratch.com/optimizers-explained

To see the other parameter adjusters pytorch provides, visit the following address:

https://pytorch.org/docs/stable/optim.html
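As a rough illustration of the momentum option, the sketch below runs a few steps of torch.optim.SGD with momentum=0.9 next to a hand-written version of the same rule as I understand it from the pytorch documentation: keep 90 % of the previous adjustment, add the new derivative value, then subtract the result times the learning rate. The names loss_of, w_auto, w_manual and velocity, and the values 0.9 and 0.01, are assumptions made for this example.

```python
import torch

def loss_of(w):
    # Same model and loss as the examples above, with the bias left out for brevity
    return (2.0 * w - 5.0) ** 2

# Parameter adjusted by pytorch's SGD with momentum enabled
w_auto = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w_auto], lr=0.01, momentum=0.9)

# Parameter adjusted by hand with the same rule
w_manual = torch.tensor(1.0, requires_grad=True)
velocity = torch.tensor(0.0)

for step in range(3):
    optimizer.zero_grad()
    loss_of(w_auto).backward()
    optimizer.step()

    if w_manual.grad is not None:
        w_manual.grad.zero_()
    loss_of(w_manual).backward()
    with torch.no_grad():
        # Keep 90 % of the previous adjustment and add the new derivative value
        velocity = 0.9 * velocity + w_manual.grad
        w_manual -= 0.01 * velocity

    print(step, w_auto.item(), w_manual.item())
```

Because the derivative value keeps the same sign in every step here, each adjustment is larger than it would be without momentum.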

Implementing the previous article's example with pytorch

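Good. By now we should have some understanding of pytorch's basic operations, so let's try implementing the example from the end of the previous article with pytorch.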

The example code from the end of the previous article is as follows:

# Define the parameters
weight = 1
bias = 0

# Define the learning rate
learning_rate = 0.01

# Prepare the training, validation and test sets
traning_set = [(2, 5), (5, 11), (6, 13), (7, 15), (8, 17)]
validating_set = [(12, 25), (1, 3)]
testing_set = [(9, 19), (13, 27)]

# Record the history of weight and bias
weight_history = [weight]
bias_history = [bias]

for epoch in range(1, 10000):
    print(f"epoch: {epoch}")

    # Train on the training set and update the parameters
    for x, y in traning_set:
        # Compute the predicted value
        predicted = x * weight + bias
        # Compute the loss
        diff = predicted - y
        loss = diff ** 2
        # Print debug information
        print(f"traning x: {x}, y: {y}, predicted: {predicted}, loss: {loss}, weight: {weight}, bias: {bias}")
        # Compute the derivative values
        derivative_weight = 2 * diff * x
        derivative_bias = 2 * diff
        # Update weight and bias to reduce the loss
        # A positive diff means predicted output > correct output, so weight and bias are decreased
        # A negative diff means predicted output < correct output, so weight and bias are increased
        weight -= derivative_weight * learning_rate
        bias -= derivative_bias * learning_rate
        # Record the history of weight and bias
        weight_history.append(weight)
        bias_history.append(bias)

    # Check the validation set
    validating_accuracy = 0
    for x, y in validating_set:
        predicted = x * weight + bias
        validating_accuracy += 1 - abs(y - predicted) / y
        print(f"validating x: {x}, y: {y}, predicted: {predicted}")
    validating_accuracy /= len(validating_set)

    # Stop training once the validation accuracy exceeds 99 %
    print(f"validating accuracy: {validating_accuracy}")
    if validating_accuracy > 0.99:
        break

# Check the test set
testing_accuracy = 0
for x, y in testing_set:
    predicted = x * weight + bias
    testing_accuracy += 1 - abs(y - predicted) / y
    print(f"testing x: {x}, y: {y}, predicted: {predicted}")
testing_accuracy /= len(testing_set)
print(f"testing accuracy: {testing_accuracy}")

# Plot how weight and bias changed
from matplotlib import pyplot
pyplot.plot(weight_history, label="weight")
pyplot.plot(bias_history, label="bias")
pyplot.legend()
pyplot.show()

Implemented with pytorch, the code looks like this:

# Import pytorch
import torch

# Define the parameters
weight = torch.tensor(1.0, requires_grad=True)
bias = torch.tensor(0.0, requires_grad=True)

# Create the loss calculator
loss_function = torch.nn.MSELoss()

# Create the parameter adjuster
optimizer = torch.optim.SGD([weight, bias], lr=0.01)

# Prepare the training, validation and test sets
traning_set = [
    (torch.tensor(2.0), torch.tensor(5.0)),
    (torch.tensor(5.0), torch.tensor(11.0)),
    (torch.tensor(6.0), torch.tensor(13.0)),
    (torch.tensor(7.0), torch.tensor(15.0)),
    (torch.tensor(8.0), torch.tensor(17.0))
]
validating_set = [
    (torch.tensor(12.0), torch.tensor(25.0)),
    (torch.tensor(1.0), torch.tensor(3.0))
]
testing_set = [
    (torch.tensor(9.0), torch.tensor(19.0)),
    (torch.tensor(13.0), torch.tensor(27.0))
]

# Record the history of weight and bias
weight_history = [weight.item()]
bias_history = [bias.item()]

for epoch in range(1, 10000):
    print(f"epoch: {epoch}")

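    # Train on the training set and update the parameters
    for x, y in traning_set:
        # Compute the predicted value
        predicted = x * weight + bias
        # Compute the loss
        loss = loss_function(predicted, y)
        # Print debug information
        print(f"traning x: {x}, y: {y}, predicted: {predicted}, loss: {loss}, weight: {weight}, bias: {bias}")
        # Differentiate the loss automatically to get the derivative values
        loss.backward()
        # Adjust the parameters with the parameter adjuster
        optimizer.step()
        # Clear the derivative values
        optimizer.zero_grad()
        # Record the history of weight and bias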
        weight_history.append(weight.item())
        bias_history.append(bias.item())

    # Check the validation set
    validating_accuracy = 0
    for x, y in validating_set:
        predicted = x * weight.item() + bias.item()
        validating_accuracy += 1 - abs(y - predicted) / y
        print(f"validating x: {x}, y: {y}, predicted: {predicted}")
    validating_accuracy /= len(validating_set)

    # Stop training once the validation accuracy exceeds 99 %
    print(f"validating accuracy: {validating_accuracy}")
    if validating_accuracy > 0.99:
        break

# Check the test set
testing_accuracy = 0
for x, y in testing_set:
    predicted = x * weight.item() + bias.item()
    testing_accuracy += 1 - abs(y - predicted) / y
    print(f"testing x: {x}, y: {y}, predicted: {predicted}")
testing_accuracy /= len(testing_set)
print(f"testing accuracy: {testing_accuracy}")

# Plot how weight and bias changed
from matplotlib import pyplot
pyplot.plot(weight_history, label="weight")
pyplot.plot(bias_history, label="bias")
pyplot.legend()
pyplot.show()

The output is as follows:

epoch: 1
traning x: 2.0, y: 5.0, predicted: 2.0, loss: 9.0, weight: 1.0, bias: 0.0
traning x: 5.0, y: 11.0, predicted: 5.659999847412109, loss: 28.515602111816406, weight: 1.1200000047683716, bias: 0.05999999865889549
traning x: 6.0, y: 13.0, predicted: 10.090799331665039, loss: 8.463448524475098, weight: 1.6540000438690186, bias: 0.16679999232292175
traning x: 7.0, y: 15.0, predicted: 14.246713638305664, loss: 0.5674403309822083, weight: 2.0031042098999023, bias: 0.22498400509357452
traning x: 8.0, y: 17.0, predicted: 17.108564376831055, loss: 0.011786224320530891, weight: 2.1085643768310547, bias: 0.24004973471164703
validating x: 12.0, y: 25.0, predicted: 25.33220863342285
validating x: 1.0, y: 3.0, predicted: 2.3290724754333496
validating accuracy: 0.8815345764160156
epoch: 2
traning x: 2.0, y: 5.0, predicted: 4.420266628265381, loss: 0.3360907733440399, weight: 2.0911941528320312, bias: 0.2378784418106079
traning x: 5.0, y: 11.0, predicted: 10.821391105651855, loss: 0.03190113604068756, weight: 2.1143834590911865, bias: 0.24947310984134674
traning x: 6.0, y: 13.0, predicted: 13.04651165008545, loss: 0.002163333585485816, weight: 2.132244348526001, bias: 0.25304529070854187
traning x: 7.0, y: 15.0, predicted: 15.138755798339844, loss: 0.019253171980381012, weight: 2.1266629695892334, bias: 0.25211507081985474
traning x: 8.0, y: 17.0, predicted: 17.107236862182617, loss: 0.011499744839966297, weight: 2.1072371006011963, bias: 0.24933995306491852
validating x: 12.0, y: 25.0, predicted: 25.32814598083496
validating x: 1.0, y: 3.0, predicted: 2.3372745513916016
validating accuracy: 0.8829828500747681
epoch: 3
traning x: 2.0, y: 5.0, predicted: 4.427353858947754, loss: 0.32792359590530396, weight: 2.0900793075561523, bias: 0.24719521403312683
traning x: 5.0, y: 11.0, predicted: 10.82357406616211, loss: 0.0311261098831892, weight: 2.112985134124756, bias: 0.2586481273174286
traning x: 6.0, y: 13.0, predicted: 13.045942306518555, loss: 0.002110695466399193, weight: 2.1306276321411133, bias: 0.26217663288116455
traning x: 7.0, y: 15.0, predicted: 15.137059211730957, loss: 0.018785227090120316, weight: 2.1251144409179688, bias: 0.2612577974796295
traning x: 8.0, y: 17.0, predicted: 17.105924606323242, loss: 0.011220022104680538, weight: 2.105926036834717, bias: 0.2585166096687317
validating x: 12.0, y: 25.0, predicted: 25.324134826660156
validating x: 1.0, y: 3.0, predicted: 2.3453762531280518
validating accuracy: 0.8844133615493774

(intermediate output omitted)

epoch: 202
traning x: 2.0, y: 5.0, predicted: 4.950470924377441, loss: 0.0024531292729079723, weight: 2.0077908039093018, bias: 0.9348894953727722
traning x: 5.0, y: 11.0, predicted: 10.984740257263184, loss: 0.00023285974748432636, weight: 2.0097720623016357, bias: 0.9358800649642944
traning x: 6.0, y: 13.0, predicted: 13.003972053527832, loss: 1.5777208318468183e-05, weight: 2.0112979412078857, bias: 0.9361852407455444
traning x: 7.0, y: 15.0, predicted: 15.011855125427246, loss: 0.00014054399798624218, weight: 2.0108213424682617, bias: 0.9361057877540588
traning x: 8.0, y: 17.0, predicted: 17.00916290283203, loss: 8.39587883092463e-05, weight: 2.0091617107391357, bias: 0.9358686804771423
validating x: 12.0, y: 25.0, predicted: 25.028034210205078
validating x: 1.0, y: 3.0, predicted: 2.9433810710906982
validating accuracy: 0.9900028705596924
testing x: 9.0, y: 19.0, predicted: 19.004947662353516
testing x: 13.0, y: 27.0, predicted: 27.035730361938477
testing accuracy: 0.9992080926895142

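The training succeeded just the same😼. You may notice that the output values differ slightly from those in the previous article. This is because pytorch computes with 32-bit floating-point numbers (float32) by default, while python uses 64-bit floating-point numbers (float64). If you change the parameter definitions to this: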

# Define the parameters
weight = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
bias = torch.tensor(0.0, dtype=torch.float64, requires_grad=True)

and change the loss computation to this, you get the same output as in the previous article:

# Compute the loss
loss = loss_function(predicted, y.double())

Batch training with matrix multiplication

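The previous example implemented training with pytorch, but it still computed one value at a time. We can use matrix multiplication to implement batch training and compute many values at once. Here is the modified code: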

# Import pytorch
import torch

# Define the parameters
weight = torch.tensor([[1.0]], requires_grad=True) # 1 row, 1 column
bias = torch.tensor(0.0, requires_grad=True)

# Create the loss calculator
loss_function = torch.nn.MSELoss()

# Create the parameter adjuster
optimizer = torch.optim.SGD([weight, bias], lr=0.01)

# Prepare the training, validation and test sets
traning_set_x = torch.tensor([[2.0], [5.0], [6.0], [7.0], [8.0]]) # 5 rows, 1 column: 5 samples with 1 input each
traning_set_y = torch.tensor([[5.0], [11.0], [13.0], [15.0], [17.0]]) # 5 rows, 1 column: 5 samples with 1 output each
validating_set_x = torch.tensor([[12.0], [1.0]]) # 2 rows, 1 column: 2 samples with 1 input each
validating_set_y = torch.tensor([[25.0], [3.0]]) # 2 rows, 1 column: 2 samples with 1 output each
testing_set_x = torch.tensor([[9.0], [13.0]]) # 2 rows, 1 column: 2 samples with 1 input each
testing_set_y = torch.tensor([[19.0], [27.0]]) # 2 rows, 1 column: 2 samples with 1 output each

# Record the history of weight and bias
weight_history = [weight[0][0].item()]
bias_history = [bias.item()]

for epoch in range(1, 10000):
    print(f"epoch: {epoch}")

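    # Train on the training set and update the parameters

    # Compute the predicted values
    # A matrix with 5 rows and 1 column multiplied by a matrix with 1 row and 1 column gives a matrix with 5 rows and 1 column
    predicted = traning_set_x.mm(weight) + bias
    # Compute the loss
    loss = loss_function(predicted, traning_set_y)
    # Print debug information
    print(f"traning x: {traning_set_x}, y: {traning_set_y}, predicted: {predicted}, loss: {loss}, weight: {weight}, bias: {bias}")
    # Differentiate the loss automatically to get the derivative values
    loss.backward()
    # Adjust the parameters with the parameter adjuster
    optimizer.step()
    # Clear the derivative values
    optimizer.zero_grad()
    # Record the history of weight and bias
    weight_history.append(weight[0][0].item())
    bias_history.append(bias.item())

    # Check the validation set
    with torch.no_grad(): # disable automatic differentiation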
        predicted = validating_set_x.mm(weight) + bias
        validating_accuracy = 1 - ((validating_set_y - predicted).abs() / validating_set_y).mean()
    print(f"validating x: {validating_set_x}, y: {validating_set_y}, predicted: {predicted}")

    # Stop training once the validation accuracy exceeds 99 %
    print(f"validating accuracy: {validating_accuracy}")
    if validating_accuracy > 0.99:
        break

# Check the test set
with torch.no_grad(): # disable automatic differentiation
    predicted = testing_set_x.mm(weight) + bias
    testing_accuracy = 1 - ((testing_set_y - predicted).abs() / testing_set_y).mean()
print(f"testing x: {testing_set_x}, y: {testing_set_y}, predicted: {predicted}")
print(f"testing accuracy: {testing_accuracy}")

# Plot how weight and bias changed
from matplotlib import pyplot
pyplot.plot(weight_history, label="weight")
pyplot.plot(bias_history, label="bias")
pyplot.legend()
pyplot.show()

The output is as follows:

epoch: 1
traning x: tensor([[2.],
        [5.],
        [6.],
        [7.],
        [8.]]), y: tensor([[ 5.],
        [11.],
        [13.],
        [15.],
        [17.]]), predicted: tensor([[2.],
        [5.],
        [6.],
        [7.],
        [8.]], grad_fn=<AddBackward0>), loss: 47.79999923706055, weight: tensor([[1.]], requires_grad=True), bias: 0.0
validating x: tensor([[12.],
        [ 1.]]), y: tensor([[25.],
        [ 3.]]), predicted: tensor([[22.0200],
        [ 1.9560]])
validating accuracy: 0.7663999795913696
epoch: 2
traning x: tensor([[2.],
        [5.],
        [6.],
        [7.],
        [8.]]), y: tensor([[ 5.],
        [11.],
        [13.],
        [15.],
        [17.]]), predicted: tensor([[ 3.7800],
        [ 9.2520],
        [11.0760],
        [12.9000],
        [14.7240]], grad_fn=<AddBackward0>), loss: 3.567171573638916, weight: tensor([[1.8240]], requires_grad=True), bias: 0.13199999928474426
validating x: tensor([[12.],
        [ 1.]]), y: tensor([[25.],
        [ 3.]]), predicted: tensor([[24.7274],
        [ 2.2156]])
validating accuracy: 0.8638148307800293

(intermediate output omitted)

epoch: 1103
traning x: tensor([[2.],
        [5.],
        [6.],
        [7.],
        [8.]]), y: tensor([[ 5.],
        [11.],
        [13.],
        [15.],
        [17.]]), predicted: tensor([[ 4.9567],
        [10.9867],
        [12.9966],
        [15.0066],
        [17.0166]], grad_fn=<AddBackward0>), loss: 0.0004764374461956322, weight: tensor([[2.0100]], requires_grad=True), bias: 0.936755359172821
validating x: tensor([[12.],
        [ 1.]]), y: tensor([[25.],
        [ 3.]]), predicted: tensor([[25.0564],
        [ 2.9469]])
validating accuracy: 0.99001544713974
testing x: tensor([[ 9.],
        [13.]]), y: tensor([[19.],
        [27.]]), predicted: tensor([[19.0265],
        [27.0664]])
testing accuracy: 0.998073160648346

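Hm? Why did it take 1103 epochs to train successfully this time? With a single batch per epoch the parameters are adjusted only once per epoch instead of five times, and because weight and bias are always adjusted in the same direction, training with only one batch turns out slower. In later articles we will train with more parameters (neurons), which can be adjusted in different directions, so the problem in this example will not arise. Of course, in real work you will sometimes find training very slow, or not converging at all, because all parameters are adjusted in the same direction; in that case you can switch to a different model or split the data into multiple batches.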
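As a sketch of the "split into multiple batches" option, the loop below trains on the same data in batches of 2 rows using the tensor split function. The batch size of 2 and the fixed 100 epochs are assumptions made for this illustration; the validation step is left out to keep it short.

```python
import torch

weight = torch.tensor([[1.0]], requires_grad=True)
bias = torch.tensor(0.0, requires_grad=True)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD([weight, bias], lr=0.01)

traning_set_x = torch.tensor([[2.0], [5.0], [6.0], [7.0], [8.0]])
traning_set_y = torch.tensor([[5.0], [11.0], [13.0], [15.0], [17.0]])

batch_size = 2  # assumed value for illustration
with torch.no_grad():
    initial_loss = loss_function(traning_set_x.mm(weight) + bias, traning_set_y).item()

for epoch in range(1, 101):
    # split cuts the 5 rows into batches of 2, 2 and 1 rows
    batches_x = traning_set_x.split(batch_size)
    batches_y = traning_set_y.split(batch_size)
    for batch_x, batch_y in zip(batches_x, batches_y):
        predicted = batch_x.mm(weight) + bias
        loss = loss_function(predicted, batch_y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

with torch.no_grad():
    final_loss = loss_function(traning_set_x.mm(weight) + bias, traning_set_y).item()
print(f"loss before: {initial_loss}, loss after: {final_loss}")
```

Each epoch now adjusts the parameters three times, once per batch, while each batch is still computed with a single matrix multiplication.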

Example of splitting the training, validation and test sets

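In the examples above, the training, validation and test sets were defined one tensor at a time. Tedious, isn't it? pytorch's tensor operations give us a more convenient way to split them: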

# Raw data set
>>> dataset = [(1, 3), (2, 5), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19), (12, 25), (13, 27)]

# Convert the raw data set to a tensor, with floating-point as the numeric type
>>> dataset_tensor = torch.tensor(dataset, dtype=torch.float32)
>>> dataset_tensor
tensor([[ 1.,  3.],
        [ 2.,  5.],
        [ 5., 11.],
        [ 6., 13.],
        [ 7., 15.],
        [ 8., 17.],
        [ 9., 19.],
        [12., 25.],
        [13., 27.]])

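# Seed the random number generator so that every run produces the same random numbers
# This makes the training process reproducible; you may choose not to do it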
>>> torch.random.manual_seed(0)
<torch._C.Generator object at 0x10cc03070>

# Generate random indices, used to shuffle the data so that it is not unevenly distributed
>>> dataset_tensor.shape
torch.Size([9, 2])
>>> random_indices = torch.randperm(dataset_tensor.shape[0])
>>> random_indices
tensor([8, 0, 2, 3, 7, 1, 4, 5, 6])

# Compute the index lists for the training, validation and test sets
# 60 % of the data goes to the training set, 20 % to the validation set and 20 % to the test set
>>> traning_indices = random_indices[:int(len(random_indices)*0.6)]
>>> traning_indices
tensor([8, 0, 2, 3, 7])
>>> validating_indices = random_indices[int(len(random_indices)*0.6):int(len(random_indices)*0.8):]
>>> validating_indices
tensor([1, 4])
>>> testing_indices = random_indices[int(len(random_indices)*0.8):]
>>> testing_indices
tensor([5, 6])

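# Split into training, validation and test sets
>>> traning_set_x = dataset_tensor[traning_indices][:,:1] # the first dimension is not sliced, the second dimension keeps elements with index less than 1
>>> traning_set_y = dataset_tensor[traning_indices][:,1:] # the first dimension is not sliced, the second dimension keeps elements with index greater than or equal to 1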
>>> traning_set_x
tensor([[13.],
        [ 1.],
        [ 5.],
        [ 6.],
        [12.]])
>>> traning_set_y
tensor([[27.],
        [ 3.],
        [11.],
        [13.],
        [25.]])
>>> validating_set_x = dataset_tensor[validating_indices][:,:1]
>>> validating_set_y = dataset_tensor[validating_indices][:,1:]
>>> validating_set_x
tensor([[2.],
        [7.]])
>>> validating_set_y
tensor([[ 5.],
        [15.]])
>>> testing_set_x = dataset_tensor[testing_indices][:,:1]
>>> testing_set_y = dataset_tensor[testing_indices][:,1:]
>>> testing_set_x
tensor([[8.],
        [9.]])
>>> testing_set_y
tensor([[17.],
        [19.]])

Written as a script:

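# Import pytorch
import torch

# Raw data set
dataset = [(1, 3), (2, 5), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19), (12, 25), (13, 27)]

# Convert the raw data set to a tensor
dataset_tensor = torch.tensor(dataset, dtype=torch.float32)

# Seed the random number generator so that every run produces the same random numbers
torch.random.manual_seed(0)

# Split into training, validation and test sets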
random_indices = torch.randperm(dataset_tensor.shape[0])
traning_indices = random_indices[:int(len(random_indices)*0.6)]
validating_indices = random_indices[int(len(random_indices)*0.6):int(len(random_indices)*0.8):]
testing_indices = random_indices[int(len(random_indices)*0.8):]
traning_set_x = dataset_tensor[traning_indices][:,:1]
traning_set_y = dataset_tensor[traning_indices][:,1:]
validating_set_x = dataset_tensor[validating_indices][:,:1]
validating_set_y = dataset_tensor[validating_indices][:,1:]
testing_set_x = dataset_tensor[testing_indices][:,:1]
testing_set_y = dataset_tensor[testing_indices][:,1:]

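Note that changing how the data is distributed across the sets can affect training speed; you can try how many epochs training needs with the split above before it succeeds (reaches 99% accuracy). That said, the more data there is and the more evenly it is spread, the less the split affects training speed.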
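
One way to try this out is to repeat the training with different seeds and count the epochs each split needs. The sketch below is an illustration under assumptions, not the article's own code: epochs_until_accurate is a hypothetical helper, and lr=0.005 is an assumption chosen conservatively so that the updates stay stable for any 5-sample training split of this dataset. The numbers you see depend on your pytorch version and seed, so none are listed here.

```python
import torch

def epochs_until_accurate(seed, lr=0.005, max_epochs=10000):
    """Train wx + b on one random split and return the epoch at which the
    validating accuracy first exceeds 99% (hypothetical helper)."""
    dataset = [(1, 3), (2, 5), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19), (12, 25), (13, 27)]
    data = torch.tensor(dataset, dtype=torch.float32)

    # A different seed gives a different shuffle, and so a different split
    torch.random.manual_seed(seed)
    indices = torch.randperm(data.shape[0])
    first_cut = int(len(indices) * 0.6)
    second_cut = int(len(indices) * 0.8)
    training = data[indices[:first_cut]]
    validating = data[indices[first_cut:second_cut]]

    weight = torch.tensor([[1.0]], requires_grad=True)
    bias = torch.tensor(0.0, requires_grad=True)
    loss_function = torch.nn.MSELoss()
    optimizer = torch.optim.SGD([weight, bias], lr=lr)

    for epoch in range(1, max_epochs + 1):
        # One training step on the whole training set
        predicted = training[:, :1].mm(weight) + bias
        loss = loss_function(predicted, training[:, 1:])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Same accuracy formula as in the article, without gradient tracking
        with torch.no_grad():
            predicted = validating[:, :1].mm(weight) + bias
            accuracy = 1 - ((validating[:, 1:] - predicted).abs() / validating[:, 1:]).mean()
        if accuracy > 0.99:
            return epoch
    # The accuracy target was not reached within max_epochs
    return max_epochs

results = {seed: epochs_until_accurate(seed) for seed in range(3)}
for seed, epochs in results.items():
    print(f"seed {seed}: {epochs} epochs")
```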

Defining a model class (torch.nn.Module)

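What should we do if we want to share a model we wrote with other people, or use a model that someone else wrote? pytorch provides torch.nn.Module, a base class for packaging models. The model from the example above can be rewritten as follows: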

# Import pytorch and matplotlib (used to display charts)
import torch
from matplotlib import pyplot

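# Define the model
# A model must define a forward function that receives the input and returns the predicted output
# add_history and show_history are custom functions that only help us understand how machine learning works; they are not needed in practice
class MyModle(torch.nn.Module):
    def __init__(self):
        # Initialize the base class
        super().__init__()
        # Define the parameters
        # They must be wrapped in torch.nn.Parameter; requires_grad does not need to be set (Parameter enables it by default)
        self.weight = torch.nn.Parameter(torch.tensor([[1.0]]))
        self.bias = torch.nn.Parameter(torch.tensor(0.0))
        # Record the history of weight and bias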
        self.weight_history = [self.weight[0][0].item()]
        self.bias_history = [self.bias.item()]

    def forward(self, x):
        # Compute the predicted values
        predicted = x.mm(self.weight) + self.bias
        return predicted

    def add_history(self):
        # Record the current values of weight and bias
        self.weight_history.append(self.weight[0][0].item())
        self.bias_history.append(self.bias.item())

    def show_history(self):
        # Plot how weight and bias changed over time
        pyplot.plot(self.weight_history, label="weight")
        pyplot.plot(self.bias_history, label="bias")
        pyplot.legend()
        pyplot.show()

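# Create the model instance
model = MyModle()

# Create the loss function
loss_function = torch.nn.MSELoss()

# Create the optimizer (the object that adjusts the parameters)
# parameters() collects the model's parameters recursively, so nested models are supported as well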
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Original dataset
dataset = [(1, 3), (2, 5), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19), (12, 25), (13, 27)]

# Convert the original dataset to a tensor
dataset_tensor = torch.tensor(dataset, dtype=torch.float32)

# Seed the random number generator so that every run produces the same random numbers
# This makes the training process reproducible; you can skip it if you prefer
torch.random.manual_seed(0)

# Split into training, validating and testing sets
random_indices = torch.randperm(dataset_tensor.shape[0])
traning_indices = random_indices[:int(len(random_indices)*0.6)]
validating_indices = random_indices[int(len(random_indices)*0.6):int(len(random_indices)*0.8)]
testing_indices = random_indices[int(len(random_indices)*0.8):]
traning_set_x = dataset_tensor[traning_indices][:,:1]
traning_set_y = dataset_tensor[traning_indices][:,1:]
validating_set_x = dataset_tensor[validating_indices][:,:1]
validating_set_y = dataset_tensor[validating_indices][:,1:]
testing_set_x = dataset_tensor[testing_indices][:,:1]
testing_set_y = dataset_tensor[testing_indices][:,1:]

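# Start the training process
for epoch in range(1, 10000):
    print(f"epoch: {epoch}")

    # Train on the training set and adjust the parameters
    # Switch the model to training mode; this changes the behavior of layers such as batch normalization (BatchNorm) and Dropout
    # (autograd is on by default and is not controlled by this call)
    model.train()

    # Compute the predicted values
    predicted = model(traning_set_x)
    # Compute the loss
    loss = loss_function(predicted, traning_set_y)
    # Print debugging information
    print(f"training x: {traning_set_x}, y: {traning_set_y}, predicted: {predicted}, loss: {loss}, weight: {model.weight}, bias: {model.bias}")
    # Differentiate the loss automatically to obtain the gradients
    loss.backward()
    # Adjust the parameters with the optimizer
    optimizer.step()
    # Clear the gradients
    optimizer.zero_grad()
    # Record the current values of weight and bias
    model.add_history()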

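    # Check the validating set
    # Switch the model to evaluation mode; this changes the behavior of layers such as batch normalization (BatchNorm) and Dropout
    # (it does not turn off autograd; torch.no_grad() does that)
    model.eval()
    predicted = model(validating_set_x)
    validating_accuracy = 1 - ((validating_set_y - predicted).abs() / validating_set_y).mean()
    print(f"validating x: {validating_set_x}, y: {validating_set_y}, predicted: {predicted}")

    # Stop training once the validating-set accuracy exceeds 99%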
    print(f"validating accuracy: {validating_accuracy}")
    if validating_accuracy > 0.99:
        break

# Check the testing set
predicted = model(testing_set_x)
testing_accuracy = 1 - ((testing_set_y - predicted).abs() / testing_set_y).mean()
print(f"testing x: {testing_set_x}, y: {testing_set_y}, predicted: {predicted}")
print(f"testing accuracy: {testing_accuracy}")

# Plot how weight and bias changed over time
model.show_history()
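
The accuracy used for the validating and testing sets in the script above is 1 minus the mean relative error. Below is a small worked sketch of that formula on made-up numbers; relative_accuracy is a hypothetical name, and the predicted values are invented purely for illustration. Since the formula divides by the expected value, it only makes sense when no expected value is zero.

```python
import torch

def relative_accuracy(expected, predicted):
    """1 minus the mean of |expected - predicted| / expected (hypothetical helper)."""
    return 1 - ((expected - predicted).abs() / expected).mean()

# Invented values: both predictions are off by 2% of the expected value
expected = torch.tensor([[5.0], [15.0]])
predicted = torch.tensor([[4.9], [15.3]])

accuracy = relative_accuracy(expected, predicted)
# Relative errors are 0.1 / 5 and 0.3 / 15, so the result is close to 0.98
print(accuracy.item())
```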

Keep the following points in mind when defining and using a model class:

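The constructor __init__ must call super().__init__() to initialize the base class (as derived classes in Python usually do)
A forward function must be defined that receives the input and returns the predicted output
Parameters defined in the model must be wrapped in torch.nn.Parameter; requires_grad does not need to be set (Parameter enables it by default)
Calling model.parameters() returns the parameters recursively (nested models are supported); this is what the optimizer needs when it is created
Call model.train() before training to switch layers such as BatchNorm and Dropout to their training behavior
Call model.eval() before validating or using a trained model to switch those layers to their evaluation behavior; neither call turns autograd on or off, which is what torch.no_grad() is for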
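
The points in this list can be checked with a short sketch. Inner and Outer are hypothetical classes made up for this illustration. The sketch shows that parameters of a nested model are picked up, that train() and eval() only flip the training flag on the model and its children, and that gradient tracking is switched off by torch.no_grad() rather than by eval().

```python
import torch

class Inner(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.tensor([[1.0]]))

    def forward(self, x):
        return x.mm(self.weight)

class Outer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Module to an attribute registers it as a child module
        self.inner = Inner()
        self.bias = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return self.inner(x) + self.bias

model = Outer()

# parameters() / named_parameters() also reach the nested Inner model
names = sorted(name for name, _ in model.named_parameters())
print(names)
all_trainable = all(p.requires_grad for p in model.parameters())
print(all_trainable)

# train() and eval() set the training flag recursively
model.eval()
eval_flags = (model.training, model.inner.training)
model.train()
train_flags = (model.training, model.inner.training)
print(eval_flags, train_flags)

# eval() alone keeps autograd on; torch.no_grad() turns it off
x = torch.tensor([[2.0]])
model.eval()
tracked = model(x).requires_grad
with torch.no_grad():
    untracked = model(x).requires_grad
print(tracked, untracked)
```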

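When we continue using pytorch for machine learning in later articles, the code will keep essentially the same structure as the example above; only the model and the parts that check the validating and testing sets will differ. Batch normalization, Dropout and similar features will be covered in later articles.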

That is all for this article. If you have read this far, you should have a grasp of the basic pattern of machine learning with pytorch 😼.

Closing remarks

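What this article covers is also very basic, but these basics are indispensable for applying machine learning. Many machine learning tutorials skip the content of these two articles and start straight from multi-layer linear models, which is why so many people complain that getting started is hard 😫. If you have read the official pytorch book "Deep Learning with PyTorch", you may notice that these two articles follow an order very close to the book's. That is no coincidence: I referred to the book while writing them, and I find this order the easiest to follow.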

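Starting with the next article, I will cover linear models, activation functions and multi-layer linear models, with examples closer to real-world use. It may take more time to write, so please be patient if you want to read it 🙁️.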
