Dataset download:
Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw
Extraction code: 2xq4
Creating the dataset: https://www.cnblogs.com/xiximayou/p/12398285.html
Reading the dataset: https://www.cnblogs.com/xiximayou/p/12422827.html
Training: https://www.cnblogs.com/xiximayou/p/12448300.html
Saving the model and resuming training: https://www.cnblogs.com/xiximayou/p/12452624.html
Loading the saved model and testing: https://www.cnblogs.com/xiximayou/p/12459499.html
Splitting off a validation set and validating during training: https://www.cnblogs.com/xiximayou/p/12464738.html
Using a learning-rate decay strategy and testing during training: https://www.cnblogs.com/xiximayou/p/12468010.html
The relationship between epoch, batchsize, and step: https://www.cnblogs.com/xiximayou/p/12405485.html
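We can now use a learning-rate decay strategy, and we can train, validate, and test. The next thing we probably want is to visualize how the loss and accuracy change during training. We can use tensorboard for this. See:
Visualizing with tensorboard: https://www.cnblogs.com/xiximayou/p/12470678.html
Visualizing with tensorboardcolab: https://www.cnblogs.com/xiximayou/p/12470715.html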
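Before that, we should tidy up our training and testing code. In most cases we only care about the result of each epoch, so we can comment out the code that prints the result of every step. That creates a new problem, though: if only the per-epoch result is printed and one epoch takes a long time, we get no feedback at all while it runs. What can we do? Use the Python library tqdm. It shows a progress bar that tells us how far along the current epoch is, how long it has been running, and roughly how much time remains.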
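As a minimal sketch of the idea (the batches list and the run_epoch function are made up for illustration and are not part of the project code), wrapping any iterable in tqdm is enough to get a progress bar with elapsed and estimated remaining time:

```python
import time
from tqdm import tqdm


def run_epoch(batches):
    """Iterate over batches with a progress bar and return the sum of all values."""
    total = 0
    # ncols=80 fixes the bar width, as in the training code of this post
    for batch in tqdm(batches, ncols=80, desc="train"):
        time.sleep(0.01)  # stand-in for the real forward/backward pass
        total += sum(batch)
    return total


# Hypothetical data standing in for a DataLoader
batches = [[1, 2], [3, 4], [5, 6]]
result = run_epoch(batches)
print("sum over all batches:", result)
```

When the iterable has a length, as a DataLoader does, tqdm can show the total number of steps and estimate the remaining time.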
Next, let's go through the code and see what the result looks like after the change:
main.py
import sys
sys.path.append("/content/drive/My Drive/colab notebooks")
from utils import rdata
from model import resnet
import torch.nn as nn
import torch
import numpy as np
import torchvision
import train
import torch.optim as optim

# Fix the random seeds so that runs are repeatable
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
# cudnn.benchmark auto-tunes algorithms and can vary between runs,
# so it is turned off when deterministic behaviour is wanted
torch.backends.cudnn.benchmark = False

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

batch_size = 128
train_loader, val_loader, test_loader = rdata.load_dataset(batch_size)

model = torchvision.models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, 2, bias=False)
model.cuda()

# Number of training epochs
num_epochs = 100
# Initial learning rate
learning_rate = 0.1
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimizer: for simplicity, SGD with momentum
optimizer = torch.optim.SGD(params=model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=1e-4)
# Multiply the learning rate by 0.1 at milestones 40 and 80
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, [40, 80], 0.1)

print("Training set size:", len(train_loader.dataset))
#print("Validation set size:", len(val_loader.dataset))
print("Test set size:", len(test_loader.dataset))

def main():
    trainer = train.Trainer(criterion, optimizer, model)
    trainer.loop(num_epochs, train_loader, val_loader, test_loader, scheduler)

if __name__ == "__main__":
    main()
Nothing much changes here. The main changes are in train.py:
import torch
from tqdm import tqdm
from tensorflow import summary
import datetime

# One timestamped log directory per split, so every run gets its own curves
current_time = str(datetime.datetime.now().timestamp())
train_log_dir = '/content/drive/My Drive/colab notebooks/output/tsboardx/train/' + current_time
test_log_dir = '/content/drive/My Drive/colab notebooks/output/tsboardx/test/' + current_time
val_log_dir = '/content/drive/My Drive/colab notebooks/output/tsboardx/val/' + current_time
train_summary_writer = summary.create_file_writer(train_log_dir)
val_summary_writer = summary.create_file_writer(val_log_dir)
test_summary_writer = summary.create_file_writer(test_log_dir)

class Trainer:
    def __init__(self, criterion, optimizer, model):
        self.criterion = criterion
        self.optimizer = optimizer
        self.model = model

    def get_lr(self):
        # Return the learning rate of the first parameter group
        for param_group in self.optimizer.param_groups:
            return param_group['lr']

    def loop(self, num_epochs, train_loader, val_loader, test_loader, scheduler=None, acc1=0.0):
        self.acc1 = acc1
        for epoch in range(1, num_epochs + 1):
            lr = self.get_lr()
            print("epoch:{},lr:{:.6f}".format(epoch, lr))
            self.train(train_loader, epoch, num_epochs)
            #self.val(val_loader,epoch,num_epochs)
            self.test(test_loader, epoch, num_epochs)
            if scheduler is not None:
                scheduler.step()

    def train(self, dataloader, epoch, num_epochs):
        self.model.train()
        with torch.enable_grad():
            self._iteration_train(dataloader, epoch, num_epochs)

    def val(self, dataloader, epoch, num_epochs):
        self.model.eval()
        with torch.no_grad():
            self._iteration_val(dataloader, epoch, num_epochs)

    def test(self, dataloader, epoch, num_epochs):
        self.model.eval()
        with torch.no_grad():
            self._iteration_test(dataloader, epoch, num_epochs)

    def _iteration_train(self, dataloader, epoch, num_epochs):
        total_step = len(dataloader)
        tot_loss = 0.0
        correct = 0
        #for i ,(images, labels) in enumerate(dataloader):
        for images, labels in tqdm(dataloader, ncols=80):
            images = images.cuda()
            labels = labels.cuda()

            # Forward pass
            outputs = self.model(images)
            _, preds = torch.max(outputs.detach(), 1)
            loss = self.criterion(outputs, labels)

            # Backward pass and optimizer step
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            # The criterion returns the batch mean, so weight it by the batch
            # size before dividing by the dataset size below
            tot_loss += loss.detach() * images.size(0)
            """
            if (i+1) % 2 == 0:
                print('Epoch: [{}/{}], Step: [{}/{}], Loss: {:.4f}'
                      .format(epoch, num_epochs, i+1, total_step, loss.item()))
            """
            correct += torch.sum(preds == labels).to(torch.float32)
        ### Epoch info ####
        epoch_loss = tot_loss / len(dataloader.dataset)
        epoch_acc = correct / len(dataloader.dataset)
        print('train loss: {:.4f},train acc: {:.4f}'.format(epoch_loss, epoch_acc))
        with train_summary_writer.as_default():
            summary.scalar('loss', epoch_loss.item(), epoch)
            summary.scalar('accuracy', epoch_acc.item(), epoch)
        # Save the final model after the last epoch
        if epoch == num_epochs:
            state = {
                'model': self.model.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'epoch': epoch,
                'train_loss': epoch_loss,
                'train_acc': epoch_acc,
            }
            save_path = "/content/drive/My Drive/colab notebooks/output/"
            torch.save(state, save_path + "resnet18_final.t7")

    def _iteration_val(self, dataloader, epoch, num_epochs):
        total_step = len(dataloader)
        tot_loss = 0.0
        correct = 0
        #for i ,(images, labels) in enumerate(dataloader):
        for images, labels in tqdm(dataloader, ncols=80):
            images = images.cuda()
            labels = labels.cuda()

            # Forward pass
            outputs = self.model(images)
            _, preds = torch.max(outputs.detach(), 1)
            loss = self.criterion(outputs, labels)
            # Weight the batch-mean loss by the batch size
            tot_loss += loss.detach() * images.size(0)
            correct += torch.sum(preds == labels).to(torch.float32)
            """
            if (i+1) % 2 == 0:
                print('Epoch: [{}/{}], Step: [{}/{}], Loss: {:.4f}'
                      .format(1, 1, i+1, total_step, loss.item()))
            """
        ### Epoch info ####
        epoch_loss = tot_loss / len(dataloader.dataset)
        epoch_acc = correct / len(dataloader.dataset)
        print('val loss: {:.4f},val acc: {:.4f}'.format(epoch_loss, epoch_acc))
        with val_summary_writer.as_default():
            summary.scalar('loss', epoch_loss.item(), epoch)
            summary.scalar('accuracy', epoch_acc.item(), epoch)

    def _iteration_test(self, dataloader, epoch, num_epochs):
        total_step = len(dataloader)
        tot_loss = 0.0
        correct = 0
        #for i ,(images, labels) in enumerate(dataloader):
        for images, labels in tqdm(dataloader, ncols=80):
            images = images.cuda()
            labels = labels.cuda()

            # Forward pass
            outputs = self.model(images)
            _, preds = torch.max(outputs.detach(), 1)
            loss = self.criterion(outputs, labels)
            # Weight the batch-mean loss by the batch size
            tot_loss += loss.detach() * images.size(0)
            correct += torch.sum(preds == labels).to(torch.float32)
            """
            if (i+1) % 2 == 0:
                print('Epoch: [{}/{}], Step: [{}/{}], Loss: {:.4f}'
                      .format(1, 1, i+1, total_step, loss.item()))
            """
        ### Epoch info ####
        epoch_loss = tot_loss / len(dataloader.dataset)
        epoch_acc = correct / len(dataloader.dataset)
        print('test loss: {:.4f},test acc: {:.4f}'.format(epoch_loss, epoch_acc))
        with test_summary_writer.as_default():
            summary.scalar('loss', epoch_loss.item(), epoch)
            summary.scalar('accuracy', epoch_acc.item(), epoch)
        # Keep a checkpoint of the model with the best test accuracy so far
        if epoch_acc > self.acc1:
            state = {
                "model": self.model.state_dict(),
                "optimizer": self.optimizer.state_dict(),
                "epoch": epoch,
                "epoch_loss": epoch_loss,
                "epoch_acc": epoch_acc,
                "acc1": self.acc1,
            }
            save_path = "/content/drive/My Drive/colab notebooks/output/"
            print("Best test accuracy so far at epoch {}: {:.4f}".format(epoch, epoch_acc))
            torch.save(state, save_path + "resnet18_best.t7")
            self.acc1 = max(self.acc1, epoch_acc)
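A note on the epoch loss: with its default settings, nn.CrossEntropyLoss returns the mean loss over a batch, so a per-sample average over the whole epoch is obtained by weighting each batch loss by its batch size before dividing by the dataset size. A minimal sketch in plain Python, with made-up batch losses and sizes (the function name is hypothetical):

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average loss over an epoch from per-batch mean losses."""
    weighted_sum = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return weighted_sum / sum(batch_sizes)


# Made-up values: two full batches of 128 and a smaller last batch of 44
batch_means = [0.50, 0.40, 0.90]
batch_sizes = [128, 128, 44]

weighted = epoch_average(batch_means, batch_sizes)
# Summing the batch means and dividing by the dataset size gives a value
# that is too small by roughly a factor of the batch size
unweighted = sum(batch_means) / sum(batch_sizes)

print("weighted average:   {:.4f}".format(weighted))
print("unweighted average: {:.4f}".format(unweighted))
```

The weighting also handles the last batch correctly when the dataset size is not a multiple of the batch size.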
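First, look at summary.create_file_writer. Its argument is the directory where the files used for visualization are stored; here we have one each for train, val, and test. Then there are blocks like this one:
with test_summary_writer.as_default():
    summary.scalar('loss', epoch_loss.item(), epoch)
    summary.scalar('accuracy', epoch_acc.item(), epoch)
Here we pass the loss and accuracy we want to visualize, as they change from epoch to epoch, to summary.scalar. summary.scalar takes three arguments: the first is the name of the chart, the second is the value to plot (the vertical axis), and the third is the step (the horizontal axis), which here is the epoch.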
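A small self-contained sketch of the same calls, assuming TensorFlow 2.x is installed. The temporary log directory and the loss values are made up for illustration:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical log directory; the training code uses a folder on Google Drive
log_dir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(log_dir)

# Made-up per-epoch losses, logged with the epoch number as the step
fake_losses = [0.9, 0.6, 0.4]
with writer.as_default():
    for epoch, loss in enumerate(fake_losses, start=1):
        tf.summary.scalar("loss", loss, step=epoch)
writer.flush()

# TensorBoard reads the event files written into the log directory
event_files = [name for name in os.listdir(log_dir) if "tfevents" in name]
print(event_files)
```

Pointing tensorboard at log_dir (or at a parent directory that holds several such folders) then shows one curve per writer.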
After that, in test.ipynb, we go through the steps one by one:
First, change into the train directory:
cd /content/drive/My Drive/colab notebooks/train
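Then enter the magic command that loads the extension (recent TensorBoard releases use %load_ext tensorboard; the %load_ext tensorboard.notebook form comes from older releases):
%load_ext tensorboard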
Then we can start tensorboard:
%tensorboard --logdir "/content/drive/My Drive/colab notebooks/output/tsboardx/"
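Once started, the tensorboard interface appears below that code cell. Training has not started yet, so there is nothing to see for now.
Next, we can start training: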
!python main.py
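Only part of the output is shown here. We train for 100 epochs with a batch size of 128. Note that when using a large batch size it is common to raise the learning rate as well, so we set the initial learning rate to 0.1 and decay it at epochs 40 and 80, each time to 0.1 times its previous value. Also keep in mind that a bigger batch size is not always better: it can speed up training, but it may exhaust memory and can hurt the model's generalization.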
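To make the schedule concrete, here is a small sketch in plain Python of how the learning rate evolves. It follows the rule that MultiStepLR applies (multiply by gamma once for every milestone that has been reached), but it is an illustration with made-up helper names, not PyTorch's own code. Because scheduler.step() is called at the end of each epoch and epochs are numbered from 1, epoch e runs after e-1 scheduler steps:

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, steps_taken):
    """Learning rate after `steps_taken` scheduler steps."""
    # bisect_right counts how many milestones are <= steps_taken
    return base_lr * gamma ** bisect_right(sorted(milestones), steps_taken)


def lr_for_epoch(epoch, base_lr=0.1, milestones=(40, 80), gamma=0.1):
    """Learning rate used during a 1-indexed epoch of the training loop."""
    return multistep_lr(base_lr, milestones, gamma, epoch - 1)


for epoch in (1, 40, 41, 80, 81, 100):
    print("epoch {:3d}: lr = {:.6f}".format(epoch, lr_for_epoch(epoch)))
```

Under these assumptions, epoch 41 is the first epoch trained at the reduced rate and epoch 81 the first at the rate reduced twice.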
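As you can see, the display is fairly clean.
The last screenshot shows the result of the epoch with the highest test accuracy:
Before looking at tensorboard, let's check where the logged content is stored.
The visualization is generated from the contents of the files marked in red.
Finally, let's take a look at tensorboard:
The red line is the test curve and the blue line is the training curve.
At this point, training, testing, and visualization of the network are complete. Next, here is the overall directory structure:
In the next section, we will specify the required parameters, such as batchsize, on the command line.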