In this demo, the quadratic function we use is
\(f(x) = (x - 4)(x - 8) + rand = x^2 - 12 x + 32 + rand\)
where \(rand\) denotes a random number drawn from the standard normal distribution \(N\left(0, 1\right)\) (mean 0, variance 1).
Manual Construction Based on Parameter
We can try to build a quadratic function in a similar way, starting from parameters (Parameter).
The basic form of a quadratic function is
\(f(x) = a x^2 + b x + c\)
which contains three parameters: \(a\), \(b\), and \(c\).
The code is as follows:
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch.autograd import Variable


class SquareRegression(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.a = nn.Parameter(torch.randn(1, 1), requires_grad=True)  # 1 x 1
        self.b = nn.Parameter(torch.randn(1, 1), requires_grad=True)  # 1 x 1
        self.c = nn.Parameter(torch.randn(1, 1), requires_grad=True)  # 1 x 1

    def forward(self, x_):
        p_ = (x_ ** 2).mm(self.a)  # n x 1
        q_ = x_.mm(self.b)  # n x 1
        t_ = self.c  # 1 x 1
        return p_ + q_ + t_.expand_as(p_)  # n x 1


if __name__ == "__main__":
    n = 100
    x = torch.linspace(-2, 12, n).resize_((n, 1))  # n x 1 tensor
    y = (x - 4) * (x - 8) + torch.randn(x.size())  # n x 1 tensor

    model = SquareRegression()
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=2e-5)

    num_epochs = 500000
    for epoch in range(num_epochs):
        inputs, targets = Variable(x), Variable(y)

        out = model(inputs)
        loss = criterion(out, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 100 == 0:
            print('Epoch[{}/{}], loss:{:.6f}'.format(epoch + 1, num_epochs, loss.item()))

    for name, param in model.named_parameters():
        print(name, param.data)

    predict = model(x)
    plt.plot(x.numpy(), y.numpy(), 'ro', label='Original Data')
    plt.plot(x.numpy(), predict.data.numpy(), label='Fitting Line')
    plt.legend()  # show the labels defined above
    plt.show()
The output is as follows:
Epoch[100/500000], loss:386.833191
Epoch[200/500000], loss:385.401001
Epoch[300/500000], loss:383.982391
Epoch[400/500000], loss:382.576904
Epoch[500/500000], loss:381.184418
Epoch[600/500000], loss:379.804657
Epoch[700/500000], loss:378.437256
Epoch[800/500000], loss:377.082153
Epoch[900/500000], loss:375.738983
Epoch[1000/500000], loss:374.407532
Epoch[1100/500000], loss:373.087585
Epoch[1200/500000], loss:371.778870
Epoch[1300/500000], loss:370.481293
Epoch[1400/500000], loss:369.194580
Epoch[1500/500000], loss:367.918549
... ... ... ...
Epoch[499400/500000], loss:0.985296
Epoch[499500/500000], loss:0.985296
Epoch[499600/500000], loss:0.985296
Epoch[499700/500000], loss:0.985296
Epoch[499800/500000], loss:0.985296
Epoch[499900/500000], loss:0.985296
Epoch[500000/500000], loss:0.985296
a tensor([[1.0048]])
b tensor([[-12.0305]])
c tensor([[31.8826]])
The generated plot is shown below:
[Figure: the original data points ('Original Data') with the fitted curve ('Fitting Line') overlaid]
As we can see, the fitted solution is
\(a \approx 1.0048,\quad b \approx -12.0305,\quad c \approx 31.8826\)
and the resulting function is
\(f(x) \approx 1.0048 x^2 - 12.0305 x + 31.8826\)
which is already very close to the original formula \(f(x) = x^2 - 12 x + 32\); the fitted curve in the plot also matches the expected result well.
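As a quick sanity check, here is a minimal sketch (assuming the coefficient values printed in the run above) that compares the fitted polynomial with the noise-free target function on the same interval:

import torch

# fitted coefficients taken from the run above
a, b, c = 1.0048, -12.0305, 31.8826

x = torch.linspace(-2, 12, 100)
fitted = a * x ** 2 + b * x + c
target = x ** 2 - 12 * x + 32                   # noise-free original function
print(torch.abs(fitted - target).max().item())  # worst-case deviation on [-2, 12]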
Construction by Stacking Linear Layers
Besides that, we can also build a quadratic function by stacking Linear layers.
The formula used this time is
\(f(x) = (w_1 x + b_1)(w_2 x + b_2)\)
which contains four parameters: \(w_1\), \(b_1\), \(w_2\), and \(b_2\).
The code is as follows:
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch.autograd import Variable


class SquareRegression(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.linear1 = nn.Linear(1, 1)
        self.linear2 = nn.Linear(1, 1)

    def forward(self, x_):
        return self.linear2(x_) * self.linear1(x_)  # n x 1


if __name__ == "__main__":
    n = 100
    x = torch.linspace(-2, 12, n).resize_((n, 1))  # n x 1 tensor
    y = (x - 4) * (x - 8) + torch.randn(x.size())  # n x 1 tensor

    model = SquareRegression()
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=2e-5)

    num_epochs = 20000
    for epoch in range(num_epochs):
        inputs, targets = Variable(x), Variable(y)

        out = model(inputs)
        loss = criterion(out, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 100 == 0:
            print('Epoch[{}/{}], loss:{:.6f}'.format(epoch + 1, num_epochs, loss.item()))

    for name, param in model.named_parameters():
        print(name, param.data)

    predict = model(x)
    plt.plot(x.numpy(), y.numpy(), 'ro', label='Original Data')
    plt.plot(x.numpy(), predict.data.numpy(), label='Fitting Line')
    plt.legend()  # show the labels defined above
    plt.show()
The output is as follows:
Epoch[100/20000], loss:448.569519
Epoch[200/20000], loss:444.616638
Epoch[300/20000], loss:441.241089
Epoch[400/20000], loss:438.343903
Epoch[500/20000], loss:435.845673
Epoch[600/20000], loss:433.682007
Epoch[700/20000], loss:431.799835
Epoch[800/20000], loss:430.154999
Epoch[900/20000], loss:428.709595
Epoch[1000/20000], loss:427.431366
Epoch[1100/20000], loss:426.291718
Epoch[1200/20000], loss:425.265594
Epoch[1300/20000], loss:424.330353
Epoch[1400/20000], loss:423.465973
Epoch[1500/20000], loss:422.654541
... ... ... ...
Epoch[19400/20000], loss:1.044041
Epoch[19500/20000], loss:1.044041
Epoch[19600/20000], loss:1.044041
Epoch[19700/20000], loss:1.044041
Epoch[19800/20000], loss:1.044041
Epoch[19900/20000], loss:1.044041
Epoch[20000/20000], loss:1.044041
linear1.weight tensor([[0.7155]])
linear1.bias tensor([-5.7249])
linear2.weight tensor([[1.4002]])
linear2.bias tensor([-5.6002])
The generated plot is shown below:
[Figure: the original data points ('Original Data') with the fitted curve ('Fitting Line') overlaid]
As we can see, the fitted solution is
\(w_1 \approx 0.7155,\quad b_1 \approx -5.7249,\quad w_2 \approx 1.4002,\quad b_2 \approx -5.6002\)
and the resulting function is
\(f(x) \approx (0.7155 x - 5.7249)(1.4002 x - 5.6002) \approx 1.0018 x^2 - 12.0229 x + 32.0606\)
which is already very close to the original formula \(f(x) = x^2 - 12 x + 32\); the fitted curve in the plot also matches the expected result well. Moreover, training became much faster: the previous approach took 500,000 epochs, whereas this time, with the same learning rate, only 20,000 epochs were needed to reach a similar result.
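For reference, a minimal sketch (using the parameter values printed in the run above) that expands the factored form \((w_1 x + b_1)(w_2 x + b_2)\) into standard quadratic coefficients:

# learned parameters taken from the run above
w1, b1 = 0.7155, -5.7249   # linear1.weight, linear1.bias
w2, b2 = 1.4002, -5.6002   # linear2.weight, linear2.bias

# (w1 * x + b1) * (w2 * x + b2) = A * x^2 + B * x + C
A = w1 * w2
B = w1 * b2 + w2 * b1
C = b1 * b2
print(A, B, C)   # roughly 1.0018, -12.0229, 32.0606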
Other Notes
About Linear
Reading the source code shows that Linear contains two key parameters: weight and bias. To a certain extent, the linear function it builds can be understood as
\(f(x) = w x + b\)
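For instance, a minimal sketch showing that an nn.Linear(1, 1) layer holds exactly these two parameters and computes \(w x + b\):

import torch
from torch import nn

lin = nn.Linear(1, 1)
print(lin.weight)   # w, a 1 x 1 tensor
print(lin.bias)     # b, a tensor with one element

x = torch.tensor([[2.0]])
print(lin(x))                        # forward pass
print(lin.weight * 2.0 + lin.bias)   # same value, computed by hand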
Extending this by analogy, with 2 inputs (in_features == 2) and 3 outputs (out_features == 3), the formula becomes:
\(\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ w_{31} & w_{32} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\)
where \(w\) and \(b\) are stored as Tensors in weight and bias; during the forward pass the matrix computation is carried out and the computational graph is built.
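A minimal sketch of the 2-input, 3-output case, verifying that Linear's forward pass matches the matrix formula above:

import torch
from torch import nn

lin = nn.Linear(2, 3)    # in_features == 2, out_features == 3
x = torch.randn(5, 2)    # a batch of 5 two-dimensional samples

manual = x.mm(lin.weight.t()) + lin.bias   # x * W^T + b
print(torch.allclose(lin(x), manual))      # True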