I. Concepts
1. NumPy has no notion of a Variable. If you have used TensorFlow you will recognize the idea: a Variable provides automatic differentiation.
2. A Variable is placed into a computation graph, which then performs the forward pass, the backward pass, and automatic differentiation.
3. A Variable has three attributes:
- data: the value of the Tensor held by the Variable
- grad: the gradient of the Variable obtained by back-propagation
- grad_fn: the operation that produced the Variable
II. Creating and Using a Variable
1. We first create an empty Variable:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)

The output is as follows:
You can see that the default type is Tensor.
2. If we want to assign a value to a Variable, it must be of type Tensor, for example:

b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)

The output is:
3. Section I listed the three attributes of a Variable; let us print them in turn:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)

The output is:
You can see that data is the content of the Tensor, while the other two attributes are still empty (None).
III. Scalar differentiation on the computation graph
1. For convenience, we can shorten torch.autograd.Variable to Variable:

from torch.autograd import Variable

2. Next we declare a variable x. Here requires_grad=True means that gradients will be computed for this variable; the default is False:

x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)

The output is:
3. We then declare two more variables, w and b:

w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)

4. We also define two variables, y1 and y2:

y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
5. Now we compute the gradients of each variable, starting from y1:

# Compute gradients
y1.backward()
print(x.grad)
print(w.grad)
print(b.grad)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
y1.backward()
print(x.grad)
print(w.grad)
print(b.grad)

The output is:
Here:
y1 = 3 * 2 + 4 = 10,
y2 = 3 * 2 + 4 * 2 = 14,
the gradient of x is 3, because y1 = 3 * x + 4,
the gradient of w is 2, because y1 = w * 2 + 4,
the gradient of b is 1, because y1 = 6 + b * 1 (the * 1 is usually omitted).
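To double-check these hand-computed values, here is a minimal sketch (not part of the original code) using torch.autograd.grad, which returns the gradients as a tuple instead of storing them in .grad:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
w = Variable(torch.Tensor([3]), requires_grad=True)
b = Variable(torch.Tensor([4]), requires_grad=True)
y1 = w * x + b

# returns dy1/dx, dy1/dw, dy1/db without touching x.grad, w.grad, b.grad
gx, gw, gb = torch.autograd.grad(y1, [x, w, b])
print(gx, gw, gb)   # expected values: 3, 2, 1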
6. Next comes y2. We comment out the y1 part so that its gradients are not mixed into the result (see the sketch after the explanation below for why):

y2.backward()
print(x.grad)
print(w.grad)
print(b.grad)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
y2.backward()
print(x.grad)
print(w.grad)
print(b.grad)

The output is:
Here:
the gradient of x is 7, because y2 = 3 * x + 4 * x,
the gradient of w is 2, because y2 = w * 2 + 8,
the gradient of b is 2, because y2 = 6 + b * 2.
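Why comment out the y1 part? Because PyTorch accumulates gradients: every backward() call adds to .grad instead of overwriting it. A minimal sketch (not from the original) of this behaviour, and of clearing an accumulated gradient:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)

y = x * 3
y.backward()
print(x.grad)         # 3

z = x * 4
z.backward()
print(x.grad)         # 7 = 3 + 4, the new gradient is added to the old one

x.grad.data.zero_()   # clear the accumulated gradient
print(x.grad)         # 0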
7. The backward() function can also take an argument; for example we pass in a variable a:

a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)

The output is:
You can see that the gradients of x, w and b are all multiplied by the value of a, which is 5, showing that the argument passed in acts as a coefficient on the gradients.
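In other words, the value passed to backward() plays the role of the upstream gradient in the chain rule, so every leaf gradient gets multiplied by it. A minimal sketch (not from the original) that checks the same numbers with torch.autograd.grad and its grad_outputs parameter:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
w = Variable(torch.Tensor([3]), requires_grad=True)
b = Variable(torch.Tensor([4]), requires_grad=True)
y2 = w * x + b * x

# grad_outputs corresponds to the argument passed to backward()
gx, gw, gb = torch.autograd.grad(y2, [x, w, b], grad_outputs=torch.Tensor([5]))
print(gx, gw, gb)   # expected values: 35, 10, 10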
IV. Matrix differentiation on the computation graph
1. For example:

# Matrix differentiation
c = torch.randn(3)
print(c)
c = Variable(c, requires_grad=True)
print(c)
y3 = c * 2
print(y3)
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)
# Matrix differentiation
c = torch.randn(3)
print(c)
c = Variable(c, requires_grad=True)
print(c)
y3 = c * 2
print(y3)
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)

The output is:
You can see that c is a one-row, three-column matrix (a vector of three components). Since y3 = c * 2, if the argument of backward() were:
torch.FloatTensor([1, 1, 1])
we would get exactly the gradient of each component (2 for each one); but what is actually passed in is:
torch.FloatTensor([1, 0.1, 0.01])
so the gradients of the components are multiplied by 1, 0.1 and 0.01 respectively.
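Because torch.randn(3) gives a different c on every run, here is a minimal sketch (not in the original) with a fixed c so the scaled gradients can be checked by hand:

import torch
from torch.autograd import Variable

c = Variable(torch.Tensor([1.0, 2.0, 3.0]), requires_grad=True)
y3 = c * 2
# dy3/dc is 2 for every component; the backward() argument scales each one
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)   # expected values: [2.0, 0.2, 0.02]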
V. Running a Variable on the GPU
1. This works just like a Tensor; the code is as follows:

# Put the Variable on the GPU
if torch.cuda.is_available():
    d = c.cuda()
    print(d)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)
# Matrix differentiation
c = torch.randn(3)
print(c)
c = Variable(c, requires_grad=True)
print(c)
y3 = c * 2
print(y3)
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)
# Put the Variable on the GPU
if torch.cuda.is_available():
    d = c.cuda()
    print(d)

2. Producing the result takes a moment longer, and the output now contains an extra device='cuda:0' and grad_fn=<CopyBackwards>.
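A related point (an addition, not from the original): a tensor living on the GPU cannot be handed to NumPy directly; it has to be copied back to the CPU first. A minimal sketch, continuing from the code above and assuming a CUDA device is available:

if torch.cuda.is_available():
    d = c.cuda()
    d_cpu = d.cpu()                  # copy the data back to host memory
    print(d_cpu.detach().numpy())    # detach() is needed because d requires grad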
VI. Converting Variable to NumPy and NumPy to Variable
1. Note that requires_grad of a Variable is normally set to False. Since it is True in this code:

# Variable to NumPy
e = Variable(torch.Tensor([4]), requires_grad=True)
f = e.numpy()
print(f)

the following error is raised:
Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
2. Solution 1: change requires_grad to False; the last line of the output then shows the NumPy array [4.].
3. Solution 2: change numpy() to detach().numpy(); the last line of the output again shows the NumPy array [4.]:

# Variable to NumPy
e = Variable(torch.Tensor([4]), requires_grad=True)
f = e.detach().numpy()
print(f)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)
# Matrix differentiation
c = torch.randn(3)
print(c)
c = Variable(c, requires_grad=True)
print(c)
y3 = c * 2
print(y3)
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)
# Put the Variable on the GPU
if torch.cuda.is_available():
    d = c.cuda()
    print(d)
# Variable to NumPy
e = Variable(torch.Tensor([4]), requires_grad=True)
f = e.detach().numpy()
print(f)

The output is:
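One caveat worth knowing (an addition, not from the original): detach() does not copy data, so the NumPy array f shares memory with e, and in-place changes show up on both sides. A minimal sketch, reusing e and f from above:

e = Variable(torch.Tensor([4]), requires_grad=True)
f = e.detach().numpy()
f[0] = 9.0    # modify the NumPy array in place
print(e)      # e now also holds 9., because the memory is shared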
4. Converting NumPy to Variable goes through Tensor first and then to Variable:

# Convert to Tensor
g = torch.from_numpy(f)
print(g)
# Convert to Variable
g = Variable(g, requires_grad=True)
print(g)

The code becomes:

import torch
# Create a Variable
a = torch.autograd.Variable()
print(a)
b = torch.autograd.Variable(torch.Tensor([[1, 2], [3, 4], [5, 6], [7, 8]]))
print(b)
print(b.data)
print(b.grad)
print(b.grad_fn)
# Build the computation graph
from torch.autograd import Variable
x = Variable(torch.Tensor([2]), requires_grad=True)
print(x)
w = Variable(torch.Tensor([3]), requires_grad=True)
print(w)
b = Variable(torch.Tensor([4]), requires_grad=True)
print(b)
y1 = w * x + b
print(y1)
y2 = w * x + b * x
print(y2)
# Compute gradients
#y1.backward()
#print(x.grad)
#print(w.grad)
#print(b.grad)
a = Variable(torch.Tensor([5]), requires_grad=True)
y2.backward(a)
print(x.grad)
print(w.grad)
print(b.grad)
# Matrix differentiation
c = torch.randn(3)
print(c)
c = Variable(c, requires_grad=True)
print(c)
y3 = c * 2
print(y3)
y3.backward(torch.FloatTensor([1, 0.1, 0.01]))
print(c.grad)
# Put the Variable on the GPU
if torch.cuda.is_available():
    d = c.cuda()
    print(d)
# Variable to NumPy
e = Variable(torch.Tensor([4]), requires_grad=True)
f = e.detach().numpy()
print(f)
# Convert to Tensor
g = torch.from_numpy(f)
print(g)
# Convert to Variable
g = Variable(g, requires_grad=True)
print(g)

The output is:
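Similarly (an addition, not from the original), torch.from_numpy keeps the array's dtype and shares its memory rather than copying it. A minimal sketch:

import numpy as np
import torch

f = np.array([4.0], dtype=np.float32)
g = torch.from_numpy(f)   # same memory as f, dtype stays float32
f[0] = 7.0                # changing the array changes the tensor too
print(g)                  # expected to show 7.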
VII. Summary of Variable
1. Variable and Tensor are essentially no different, except that a Variable is put into a computation graph, which then carries out forward propagation, backward propagation and automatic differentiation (a sketch of how this looks in current PyTorch follows after this list).
2. A Variable has three attributes: data is set when the Variable is constructed, while grad and grad_fn are filled in once the graph is built and gradients are computed.
3. Variable, Tensor and NumPy convert to each other conveniently, and their types are fairly compatible.
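Finally, a minimal sketch of the same example in current PyTorch (0.4 and later), where Variable has been merged into Tensor: a tensor created with requires_grad=True builds the computation graph by itself, and the Variable wrapper is kept only for backward compatibility:

import torch

x = torch.tensor([2.0], requires_grad=True)   # no Variable wrapper needed
w = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([4.0], requires_grad=True)

y1 = w * x + b
y1.backward()
print(x.grad, w.grad, b.grad)   # expected values: 3, 2, 1, as in Section III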