Dataset download:
Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw
Extraction code: 2xq4
Creating the dataset: https://www.cnblogs.com/xiximayou/p/12398285.html
Reading the dataset: https://www.cnblogs.com/xiximayou/p/12422827.html
Training: https://www.cnblogs.com/xiximayou/p/12448300.html
Saving the model and resuming training: https://www.cnblogs.com/xiximayou/p/12452624.html
Loading the saved model and testing: https://www.cnblogs.com/xiximayou/p/12459499.html
The relationship between epoch, batch size, and step: https://www.cnblogs.com/xiximayou/p/12405485.html
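The epoch/batch-size/step relationship from that last link shows up directly in this post's logs: one step processes one batch, so the number of steps per epoch is the dataset size divided by the batch size, rounded up (the last, smaller batch still counts). A quick sanity check against the sizes reported in the run below:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """One step = one batch; the final partial batch still counts as a step."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(18255, 128))  # training set -> 143 steps, matching the logs
print(steps_per_epoch(2027, 128))   # validation set -> 16 steps
```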
As a rule, a dataset is divided into three parts: a training set, a validation set, and a test set. The validation set is mainly used to monitor how the network is doing over the course of training and to guard against overfitting.
So far we have a training set of 20,250 images and a test set of 4,750 images. In this section we carve a validation set out of the training set.
The earlier splitting was done under Windows; this time we will work in Google Colab. Create a new file split.py under utils:
```python
import random
import os
import shutil
import glob

path = '/content/drive/My Drive/colab notebooks/data/dogcat'
train_path = path + '/train'
val_path = path + '/val'
test_path = path + '/test'

def split_train_test(fileDir, tarDir):
    if not os.path.exists(tarDir):
        os.makedirs(tarDir)
    pathDir = os.listdir(fileDir)        # filenames of the original images
    filenumber = len(pathDir)
    rate = 0.1                           # fraction to extract: 10 out of every 100
    picknumber = int(filenumber * rate)  # number of images to take at that rate
    sample = random.sample(pathDir, picknumber)  # randomly pick picknumber images
    print("========= start moving images =========")
    for name in sample:
        shutil.move(fileDir + name, tarDir + name)
    print("========= done moving images ==========")

split_train_test(train_path + '/dog/', val_path + '/dog/')
split_train_test(train_path + '/cat/', val_path + '/cat/')
print("val dogs: {} images".format(len(glob.glob(val_path + "/dog/*.jpg"))))
print("val cats: {} images".format(len(glob.glob(val_path + "/cat/*.jpg"))))
print("train dogs: {} images".format(len(glob.glob(train_path + "/dog/*.jpg"))))
print("train cats: {} images".format(len(glob.glob(train_path + "/cat/*.jpg"))))
print("test dogs: {} images".format(len(glob.glob(test_path + "/dog/*.jpg"))))
print("test cats: {} images".format(len(glob.glob(test_path + "/cat/*.jpg"))))
```
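The core of split.py is random.sample, which picks rate·N files without replacement; everything else is bookkeeping around shutil.move. A dry run on made-up filenames (no real files are touched) shows that the split is the expected size and that no image ends up in both splits:

```python
import random

random.seed(0)  # only so this demo is repeatable
filenames = ["dog.{}.jpg".format(i) for i in range(100)]  # stand-in for os.listdir(fileDir)
rate = 0.1
picknumber = int(len(filenames) * rate)
sample = random.sample(filenames, picknumber)             # files that would move to val
remaining = [f for f in filenames if f not in sample]     # files left behind in train

print(len(sample))                         # 10
print(len(remaining))                      # 90
print(len(set(sample) & set(remaining)))   # 0 -> the two splits are disjoint
```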
Output:

The test set counts are correct, but the training and validation sets are not quite what we expected. Google Colab may be a little unstable and some files were lost. We will let it go: this is the data we have now, and at least the counts should not change again from here on.
For convenience, we also create a main.py under the train directory to tie everything together:
main.py
```python
import sys
sys.path.append("/content/drive/My Drive/colab notebooks")
from utils import rdata
from model import resnet
import torch.nn as nn
import torch
import numpy as np
import torchvision
import train

# fix the random seeds so runs are repeatable
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = True

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

batch_size = 128
train_loader, val_loader, test_loader = rdata.load_dataset(batch_size)

model = torchvision.models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, 2, bias=False)
model.cuda()

# number of training epochs
num_epochs = 2
# learning rate
learning_rate = 0.01
# loss function
criterion = nn.CrossEntropyLoss()
# optimizer: for simplicity, plain SGD with momentum
optimizer = torch.optim.SGD(params=model.parameters(), lr=learning_rate,
                            momentum=0.9, weight_decay=1e-4)

print("training set size:", len(train_loader.dataset))
print("validation set size:", len(val_loader.dataset))

def main():
    trainer = train.Trainer(criterion, optimizer, model)
    trainer.loop(num_epochs, train_loader, val_loader)

main()
```
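main.py pins np.random.seed, torch.manual_seed, and torch.cuda.manual_seed_all so that weight initialization and data shuffling are repeatable across runs. The same idea, shown with Python's own random module as a stdlib stand-in (the torch calls behave analogously):

```python
import random

def three_draws(seed):
    random.seed(seed)
    return [random.random() for _ in range(3)]

# same seed -> identical sequence of "random" numbers
print(three_draws(0) == three_draws(0))   # True
# different seed -> a different sequence
print(three_draws(0) == three_draws(1))   # False
```

One caveat: torch.backends.cudnn.benchmark = True lets cuDNN pick algorithms adaptively and can reintroduce non-determinism, so it is usually set to False when exact reproducibility matters.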
Then train.py is modified accordingly:
```python
import torch

class Trainer:
    def __init__(self, criterion, optimizer, model):
        self.criterion = criterion
        self.optimizer = optimizer
        self.model = model

    def loop(self, num_epochs, train_loader, val_loader):
        for epoch in range(1, num_epochs + 1):
            self.train(train_loader, epoch, num_epochs)
            self.val(val_loader, epoch, num_epochs)

    def train(self, dataloader, epoch, num_epochs):
        self.model.train()
        with torch.enable_grad():
            self._iteration_train(dataloader, epoch, num_epochs)

    def val(self, dataloader, epoch, num_epochs):
        self.model.eval()
        with torch.no_grad():
            self._iteration_val(dataloader, epoch, num_epochs)

    def _iteration_train(self, dataloader, epoch, num_epochs):
        total_step = len(dataloader)
        tot_loss = 0.0
        correct = 0
        for i, (images, labels) in enumerate(dataloader):
            images = images.cuda()
            labels = labels.cuda()
            # forward pass
            outputs = self.model(images)
            _, preds = torch.max(outputs.data, 1)
            loss = self.criterion(outputs, labels)
            # backward pass and optimizer step
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            tot_loss += loss.data
            if (i + 1) % 2 == 0:
                print('Epoch: [{}/{}], Step: [{}/{}], Loss: {:.4f}'
                      .format(epoch, num_epochs, i + 1, total_step, loss.item()))
            correct += torch.sum(preds == labels.data).to(torch.float32)
        ### epoch info ###
        epoch_loss = tot_loss / len(dataloader.dataset)
        print('train loss: {:.4f}'.format(epoch_loss))
        epoch_acc = correct / len(dataloader.dataset)
        print('train acc: {:.4f}'.format(epoch_acc))
        # save a checkpoint every 2 epochs
        if epoch % 2 == 0:
            state = {
                'model': self.model.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'epoch': epoch,
                'train_loss': epoch_loss,
                'train_acc': epoch_acc,
            }
            save_path = "/content/drive/My Drive/colab notebooks/output/"
            torch.save(state, save_path + "epoch" + str(epoch) + "-resnet18-2" + ".t7")

    def _iteration_val(self, dataloader, epoch, num_epochs):
        total_step = len(dataloader)
        tot_loss = 0.0
        correct = 0
        for i, (images, labels) in enumerate(dataloader):
            images = images.cuda()
            labels = labels.cuda()
            # forward pass only (no backward pass during validation)
            outputs = self.model(images)
            _, preds = torch.max(outputs.data, 1)
            loss = self.criterion(outputs, labels)
            tot_loss += loss.data
            correct += torch.sum(preds == labels.data).to(torch.float32)
            if (i + 1) % 2 == 0:
                print('Epoch: [{}/{}], Step: [{}/{}], Loss: {:.4f}'
                      .format(epoch, num_epochs, i + 1, total_step, loss.item()))
        ### epoch info ###
        epoch_loss = tot_loss / len(dataloader.dataset)
        print('val loss: {:.4f}'.format(epoch_loss))
        epoch_acc = correct / len(dataloader.dataset)
        print('val acc: {:.4f}'.format(epoch_acc))
```
During training we call model.train() and run the code inside with torch.enable_grad(); during validation we call model.eval() and run the code inside with torch.no_grad(). By comparing the validation loss and accuracy against the training loss and accuracy, we can tune hyperparameters accordingly, mainly to avoid overfitting. We also save a checkpoint of the model every 2 epochs.
The data-loading code needs a matching change as well: rdata.py
```python
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import torch

def load_dataset(batch_size):
    # preprocessing: random crops for training, a plain resize for val/test
    train_transform = transforms.Compose([transforms.RandomResizedCrop(224), transforms.ToTensor()])
    val_transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    test_transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

    path = "/content/drive/My Drive/colab notebooks/data/dogcat"
    train_path = path + "/train"
    test_path = path + "/test"
    val_path = path + '/val'

    # read each split with torchvision.datasets.ImageFolder, pointing at its folder
    train_data = torchvision.datasets.ImageFolder(train_path, transform=train_transform)
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=6)
    val_data = torchvision.datasets.ImageFolder(val_path, transform=val_transform)
    val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True, num_workers=6)
    test_data = torchvision.datasets.ImageFolder(test_path, transform=test_transform)
    test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=6)
    """
    print(train_data.classes)       # class names, taken from the folder names
    print(train_data.class_to_idx)  # indices 0, 1, ... assigned to those classes in order
    print(train_data.imgs)          # (path, class) pairs for every image found
    print(test_data.classes)
    print(test_data.class_to_idx)
    print(test_data.imgs)
    """
    return train_loader, val_loader, test_loader
```
Here batch_size is passed in as a parameter, and num_workers is raised to 6 (no wonder the first epoch of training and testing was so slow before). For validation and testing, the augmentation must differ from training: to preserve the whole image, we do not take a random 224 crop but instead resize the image to 224×224. Finally we just return the three dataloaders, since the dataset size can be recovered via dataloader.dataset.
Final result:
To guard against any further data loss, we print the dataset sizes right at the start:
```
training set size: 18255
validation set size: 2027
Epoch: [1/2], Step: [2/143], Loss: 2.1346
Epoch: [1/2], Step: [4/143], Loss: 4.8034
Epoch: [1/2], Step: [6/143], Loss: 8.4806
Epoch: [1/2], Step: [8/143], Loss: 3.1965
Epoch: [1/2], Step: [10/143], Loss: 1.9405
Epoch: [1/2], Step: [12/143], Loss: 1.8245
Epoch: [1/2], Step: [14/143], Loss: 1.0050
Epoch: [1/2], Step: [16/143], Loss: 0.7030
Epoch: [1/2], Step: [18/143], Loss: 0.8176
Epoch: [1/2], Step: [20/143], Loss: 0.7163
Epoch: [1/2], Step: [22/143], Loss: 1.1955
Epoch: [1/2], Step: [24/143], Loss: 0.7395
Epoch: [1/2], Step: [26/143], Loss: 0.8374
Epoch: [1/2], Step: [28/143], Loss: 1.0237
Epoch: [1/2], Step: [30/143], Loss: 0.7225
Epoch: [1/2], Step: [32/143], Loss: 0.7724
Epoch: [1/2], Step: [34/143], Loss: 1.0290
Epoch: [1/2], Step: [36/143], Loss: 0.8630
Epoch: [1/2], Step: [38/143], Loss: 0.6931
Epoch: [1/2], Step: [40/143], Loss: 0.8261
Epoch: [1/2], Step: [42/143], Loss: 0.6834
Epoch: [1/2], Step: [44/143], Loss: 0.7619
Epoch: [1/2], Step: [46/143], Loss: 0.6832
Epoch: [1/2], Step: [48/143], Loss: 0.7108
Epoch: [1/2], Step: [50/143], Loss: 0.9719
Epoch: [1/2], Step: [52/143], Loss: 0.8093
Epoch: [1/2], Step: [54/143], Loss: 0.8441
Epoch: [1/2], Step: [56/143], Loss: 0.9111
Epoch: [1/2], Step: [58/143], Loss: 0.6936
Epoch: [1/2], Step: [60/143], Loss: 0.8592
Epoch: [1/2], Step: [62/143], Loss: 0.7161
Epoch: [1/2], Step: [64/143], Loss: 0.6975
Epoch: [1/2], Step: [66/143], Loss: 0.6932
Epoch: [1/2], Step: [68/143], Loss: 1.1292
Epoch: [1/2], Step: [70/143], Loss: 0.8269
Epoch: [1/2], Step: [72/143], Loss: 0.7343
Epoch: [1/2], Step: [74/143], Loss: 0.6779
Epoch: [1/2], Step: [76/143], Loss: 0.8384
Epoch: [1/2], Step: [78/143], Loss: 0.7054
Epoch: [1/2], Step: [80/143], Loss: 0.7532
Epoch: [1/2], Step: [82/143], Loss: 0.7620
Epoch: [1/2], Step: [84/143], Loss: 0.7220
Epoch: [1/2], Step: [86/143], Loss: 0.8249
Epoch: [1/2], Step: [88/143], Loss: 0.7050
Epoch: [1/2], Step: [90/143], Loss: 0.7757
Epoch: [1/2], Step: [92/143], Loss: 0.6918
Epoch: [1/2], Step: [94/143], Loss: 0.6893
Epoch: [1/2], Step: [96/143], Loss: 0.7105
Epoch: [1/2], Step: [98/143], Loss: 0.7681
Epoch: [1/2], Step: [100/143], Loss: 0.7826
Epoch: [1/2], Step: [102/143], Loss: 0.6986
Epoch: [1/2], Step: [104/143], Loss: 0.7252
Epoch: [1/2], Step: [106/143], Loss: 0.6829
Epoch: [1/2], Step: [108/143], Loss: 0.6872
Epoch: [1/2], Step: [110/143], Loss: 0.6776
Epoch: [1/2], Step: [112/143], Loss: 0.7574
Epoch: [1/2], Step: [114/143], Loss: 0.7412
Epoch: [1/2], Step: [116/143], Loss: 0.6889
Epoch: [1/2], Step: [118/143], Loss: 0.7476
Epoch: [1/2], Step: [120/143], Loss: 0.6999
Epoch: [1/2], Step: [122/143], Loss: 0.6735
Epoch: [1/2], Step: [124/143], Loss: 0.6929
Epoch: [1/2], Step: [126/143], Loss: 0.6859
Epoch: [1/2], Step: [128/143], Loss: 0.6791
Epoch: [1/2], Step: [130/143], Loss: 0.6922
Epoch: [1/2], Step: [132/143], Loss: 0.7641
Epoch: [1/2], Step: [134/143], Loss: 0.6894
Epoch: [1/2], Step: [136/143], Loss: 0.7030
Epoch: [1/2], Step: [138/143], Loss: 0.6968
Epoch: [1/2], Step: [140/143], Loss: 0.7000
Epoch: [1/2], Step: [142/143], Loss: 0.7290
train loss: 0.0087
train acc: 0.5054
Epoch: [1/1], Step: [2/16], Loss: 0.6934
Epoch: [1/1], Step: [4/16], Loss: 0.6854
Epoch: [1/1], Step: [6/16], Loss: 0.6950
Epoch: [1/1], Step: [8/16], Loss: 0.6894
Epoch: [1/1], Step: [10/16], Loss: 0.6976
Epoch: [1/1], Step: [12/16], Loss: 0.7385
Epoch: [1/1], Step: [14/16], Loss: 0.7118
Epoch: [1/1], Step: [16/16], Loss: 0.7297
val loss: 0.0056
val acc: 0.5067
Epoch: [2/2], Step: [2/143], Loss: 0.7109
Epoch: [2/2], Step: [4/143], Loss: 0.7193
Epoch: [2/2], Step: [6/143], Loss: 0.6891
Epoch: [2/2], Step: [8/143], Loss: 0.6872
Epoch: [2/2], Step: [10/143], Loss: 0.7610
Epoch: [2/2], Step: [12/143], Loss: 0.7611
Epoch: [2/2], Step: [14/143], Loss: 0.7238
Epoch: [2/2], Step: [16/143], Loss: 0.7438
Epoch: [2/2], Step: [18/143], Loss: 0.7789
Epoch: [2/2], Step: [20/143], Loss: 0.7210
Epoch: [2/2], Step: [22/143], Loss: 0.8573
Epoch: [2/2], Step: [24/143], Loss: 0.7694
Epoch: [2/2], Step: [26/143], Loss: 0.7205
Epoch: [2/2], Step: [28/143], Loss: 0.7020
Epoch: [2/2], Step: [30/143], Loss: 0.7191
Epoch: [2/2], Step: [32/143], Loss: 0.7582
Epoch: [2/2], Step: [34/143], Loss: 0.7804
Epoch: [2/2], Step: [36/143], Loss: 0.6864
Epoch: [2/2], Step: [38/143], Loss: 0.6800
Epoch: [2/2], Step: [40/143], Loss: 0.7184
Epoch: [2/2], Step: [42/143], Loss: 0.7476
Epoch: [2/2], Step: [44/143], Loss: 0.6939
Epoch: [2/2], Step: [46/143], Loss: 0.7176
Epoch: [2/2], Step: [48/143], Loss: 0.6927
Epoch: [2/2], Step: [50/143], Loss: 0.7282
Epoch: [2/2], Step: [52/143], Loss: 0.7118
Epoch: [2/2], Step: [54/143], Loss: 0.6974
Epoch: [2/2], Step: [56/143], Loss: 0.7058
Epoch: [2/2], Step: [58/143], Loss: 0.6776
Epoch: [2/2], Step: [60/143], Loss: 0.7171
Epoch: [2/2], Step: [62/143], Loss: 0.7013
Epoch: [2/2], Step: [64/143], Loss: 0.7390
Epoch: [2/2], Step: [66/143], Loss: 0.7126
Epoch: [2/2], Step: [68/143], Loss: 0.6957
Epoch: [2/2], Step: [70/143], Loss: 0.6995
Epoch: [2/2], Step: [72/143], Loss: 0.7181
Epoch: [2/2], Step: [74/143], Loss: 0.7340
Epoch: [2/2], Step: [76/143], Loss: 0.6885
Epoch: [2/2], Step: [78/143], Loss: 0.7061
Epoch: [2/2], Step: [80/143], Loss: 0.6859
Epoch: [2/2], Step: [82/143], Loss: 0.6821
Epoch: [2/2], Step: [84/143], Loss: 0.6963
Epoch: [2/2], Step: [86/143], Loss: 0.6836
Epoch: [2/2], Step: [88/143], Loss: 0.6870
Epoch: [2/2], Step: [90/143], Loss: 0.6957
Epoch: [2/2], Step: [92/143], Loss: 0.6804
Epoch: [2/2], Step: [94/143], Loss: 0.7612
Epoch: [2/2], Step: [96/143], Loss: 0.7005
Epoch: [2/2], Step: [98/143], Loss: 0.7481
Epoch: [2/2], Step: [100/143], Loss: 0.7385
Epoch: [2/2], Step: [102/143], Loss: 0.6914
Epoch: [2/2], Step: [104/143], Loss: 0.7161
Epoch: [2/2], Step: [106/143], Loss: 0.6914
Epoch: [2/2], Step: [108/143], Loss: 0.6862
Epoch: [2/2], Step: [110/143], Loss: 0.7161
Epoch: [2/2], Step: [112/143], Loss: 0.6887
Epoch: [2/2], Step: [114/143], Loss: 0.6848
Epoch: [2/2], Step: [116/143], Loss: 0.6850
Epoch: [2/2], Step: [118/143], Loss: 0.6952
Epoch: [2/2], Step: [120/143], Loss: 0.6888
Epoch: [2/2], Step: [122/143], Loss: 0.7002
Epoch: [2/2], Step: [124/143], Loss: 0.7047
Epoch: [2/2], Step: [126/143], Loss: 0.7086
Epoch: [2/2], Step: [128/143], Loss: 0.6939
Epoch: [2/2], Step: [130/143], Loss: 0.7021
Epoch: [2/2], Step: [132/143], Loss: 0.6865
Epoch: [2/2], Step: [134/143], Loss: 0.6872
Epoch: [2/2], Step: [136/143], Loss: 0.7039
Epoch: [2/2], Step: [138/143], Loss: 0.6865
Epoch: [2/2], Step: [140/143], Loss: 0.6881
Epoch: [2/2], Step: [142/143], Loss: 0.6984
train loss: 0.0056
train acc: 0.5085
Epoch: [1/1], Step: [2/16], Loss: 0.6790
Epoch: [1/1], Step: [4/16], Loss: 0.6794
Epoch: [1/1], Step: [6/16], Loss: 0.6861
Epoch: [1/1], Step: [8/16], Loss: 0.8617
Epoch: [1/1], Step: [10/16], Loss: 0.7011
Epoch: [1/1], Step: [12/16], Loss: 0.6915
Epoch: [1/1], Step: [14/16], Loss: 0.6909
Epoch: [1/1], Step: [16/16], Loss: 0.8612
val loss: 0.0059
val acc: 0.5032
```
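One thing worth noticing in these logs: the per-step losses hover around 0.7 (roughly ln 2, the chance-level loss for a two-class classifier), yet the printed epoch losses are 0.0087 and 0.0056. That is because train.py divides the sum of per-batch mean losses by the dataset size rather than by the number of batches. A quick check using this run's own numbers:

```python
import math

num_batches = 143
dataset_size = 18255
avg_step_loss = 0.7  # roughly what the step logs show

tot_loss = avg_step_loss * num_batches    # what _iteration_train accumulates
print(round(tot_loss / dataset_size, 4))  # train.py's normalization -> ~0.0055
print(round(tot_loss / num_batches, 4))   # true per-batch average   -> 0.7
print(round(math.log(2), 4))              # chance-level loss, 2 classes -> 0.6931
```

So the epoch figures are consistent with the step figures, just on a different scale; dividing by the number of batches (or accumulating loss.item() times the batch size before dividing by the dataset size) would make the two directly comparable.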
After training for only 2 epochs, the directory structure now looks like this:


Once the hyperparameters, mainly the learning rate and batch size, have been tuned on the validation set, we can train and test at the same time with the tuned values. The next section adds a learning-rate decay schedule plus train-while-test code; during testing we will save the model with the best accuracy.
