First, let's get familiar with how to implement a feedforward neural network in PyTorch. To keep things easy to follow, the example here uses a feedforward network with only a single hidden layer:
The source code of the feedforward network, with comments, is shown below. It is fairly simple, so not much explanation is needed here.
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # input layer
        self.relu = nn.ReLU()  # hidden-layer activation: ReLU sets every element of the input feature tensor that is less than zero to zero
        self.fc2 = nn.Linear(hidden_size, num_classes)  # output layer

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
Next, let's look at how to instantiate and use the feedforward network. To improve computational efficiency, the network is moved to the GPU when one is available. Note that input_size must match the (flattened) size of the training images, and hidden_size is the width of the hidden layer.
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NeuralNet(input_size, hidden_size, num_classes).to(device)
To train the network, a loss function is needed to describe how well the model solves the problem: the smaller the loss, the smaller the gap between the model's output and the ground truth. Here CrossEntropyLoss() is used to compute it. The optimizer is Adam, an algorithm for optimizing stochastic objective functions based on first-order gradients; its detailed concepts and derivation will be analyzed separately later.
criterion = nn.CrossEntropyLoss()  # for single-target classification; combines nn.LogSoftmax() and nn.NLLLoss() to compute the loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  # optimizer: sets the learning rate and the model parameters to update
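As a quick sanity check (a minimal sketch, not part of the original tutorial), the claim that CrossEntropyLoss combines LogSoftmax and NLLLoss can be verified on a small hypothetical batch of random scores:

import torch
import torch.nn as nn

# Hypothetical batch: 4 samples, 10 class scores each, with made-up target labels
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 7, 1])

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce.item(), nll.item())  # the two values agree up to floating-point precision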
Next comes training the model. This part is a little convoluted, so let's look at the code first and then go over the individual functions afterwards:
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()  # zero the gradients, i.e. reset the derivatives of the loss w.r.t. the weights to 0
        loss.backward()
        optimizer.step()
To train the model, each image is first reshaped into a flattened vector of 28*28 = 784 elements (matching input_size), and the tensors are then moved to the configured device.
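For illustration, here is a minimal sketch (with a made-up batch, assuming the usual MNIST shape of 1x28x28 per image) of what that reshape does:

import torch

images = torch.randn(100, 1, 28, 28)   # a batch of 100 grayscale 28x28 images
flat = images.reshape(-1, 28*28)       # each image becomes a 784-element vector
print(flat.shape)                      # torch.Size([100, 784]) -- matches input_size = 784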
Then comes the forward pass of the network:
outputs = model(images)
The outputs and the labels loaded earlier are then passed to the loss function to obtain the loss:
loss = criterion(outputs, labels)
After the loss is computed, it is backpropagated. Note that this only happens during training; at test time there is only the forward pass.
loss.backward()
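Also worth noting (a small illustrative sketch, not from the tutorial): gradients accumulate across backward() calls, which is exactly why optimizer.zero_grad() is called before each backward pass in the training loop above:

import torch

w = torch.ones(1, requires_grad=True)
(w * 2).sum().backward()
print(w.grad)        # tensor([2.])

(w * 2).sum().backward()
print(w.grad)        # tensor([4.]) -- the new gradient was added onto the old one

w.grad.zero_()       # this is what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor([0.])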
Backpropagating the loss computes the gradients, and the parameters then need to be updated according to these gradients; optimizer.step() is what performs that update. After optimizer.step(), the gradients and weight values of each layer can be inspected via optimizer.param_groups[0]['params'].
optimizer.step()
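For example, the following sketch (assuming the model and optimizer defined above, after at least one backward pass) prints the shape of each parameter tensor and of its gradient:

for p in optimizer.param_groups[0]['params']:
    # each entry is one of the model's weight/bias tensors; .grad holds its most recent gradient
    print(p.shape, p.grad.shape)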
To test the model, gradient computation is disabled, which greatly reduces memory usage and improves efficiency. The test code really only needs one key statement to make predictions: _, predicted = torch.max(outputs.data, 1).
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        print(labels.size(0))
        correct += (predicted == labels).sum().item()
One question remains: how is the trained model connected to prediction?
The outputs produced by the network are the raw scores of the final fully connected layer; to find which class the model predicts for each sample, torch.max is used. The first input of torch.max() is a tensor, which is why outputs.data is passed rather than outputs; the second argument, 1, is the dim, meaning the maximum is taken along each row, i.e. the index of the highest score, which is the familiar "most probable class". torch.max returns two values, the maximum scores and their indices; only the indices (predicted) are needed for classification.
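A tiny standalone sketch (with made-up scores) makes the behaviour of torch.max concrete:

import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],    # hypothetical scores: 2 samples, 3 classes
                        [1.2, 0.4, 3.0]])
values, predicted = torch.max(outputs, 1)   # dim=1: take the max over each row
print(values)     # tensor([2.5000, 3.0000]) -- the largest score per sample
print(predicted)  # tensor([1, 2])           -- the predicted class index per sample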
Complete source code:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
input_size = 784
hidden_size = 500
num_classes = 10
#input_size = 84
#hidden_size = 50
#num_classes = 2
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data',
                                          train=False,
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

# Fully connected neural network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

model = NeuralNet(input_size, hidden_size, num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        #print(predicted)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
Quote of the day: What others fear, one cannot but fear. (Laozi)